Add forcecells flag to ATAC, CITE, GEX, and MULTI pipelines
chenv3 committed Aug 5, 2024
1 parent 30b44ae commit 877dc34
Showing 9 changed files with 357 additions and 34 deletions.
76 changes: 69 additions & 7 deletions cell-seek
Expand Up @@ -314,7 +314,7 @@ def parsed_arguments(name, description):
[--aggregate {{mapped, none}}][--libraries LIBRARIES] \\
[--features FEATURES] [--cmo-reference CMOREFERENCE] \\
[--cmo-sample CMOSAMPLE] [--exclude-introns] [--filter FILTER] \\
[--create-bam] [--rename RENAMEFILE] \\
[--create-bam] [--rename RENAMEFILE] [--forcecells FORCECELLS]\\
--input INPUT [INPUT ...] \\
--output OUTPUT \\
--pipeline {{gex, ...}} \\
Expand Down Expand Up @@ -372,9 +372,11 @@ def parsed_arguments(name, description):
from higher depth samples until each library type has an
equal number of reads per cell that are confidently mapped.
None means to not normalize at all. If this flag is not
used then aggregate will not be run. To run Cell Ranger
aggregate, please select one of the following options:
mapped, none.
used then aggregate will not be run. Aggregate analysis
is generally not needed, but it can be used to generate a
Loupe Browser file for interactive exploration of the data.
To run Cell Ranger aggregate, please select one of the
following options: mapped, none.
Example: --aggregate mapped
--libraries LIBRARIES
Libraries file. A CSV file containing information about
Expand Down Expand Up @@ -556,16 +558,67 @@ def parsed_arguments(name, description):
Here is an example rename.csv file:
FASTQ,Name
original_name1,new_name1
original_name2,new_name1
original_name3,new_name2
original_name4,new_name3
original_name2,new_name2
original_name3,new_name3
original_name3-2,new_name3
original_name4,original_name4
where:
• FASTQ: The name that is used in the FASTQ file
• Name: Unique sample ID that is the sample name used for
Cell Ranger count.
In this example, new_name3 has FASTQ files with two different
names. With this input, both sets of FASTQ files will be used
when processing the sample as new_name3. original_name4 will not
be renamed. Any FASTQ file that does not have the name
original_name1, original_name2, original_name3, or original_name4
will not be run.
Example: --rename rename.csv
--forcecells FORCECELLS
Force cells file. A CSV file containing the name of the sample
(the Cell Ranger outputted name) and the number of cells to
force the sample to. This flag is applicable when using the GEX,
CITE, MULTI, and ATAC pipelines. It will generally be used if
the first analysis run appears to do a poor job at estimating
the number of cells, and a re-run is needed to adjust the number
of cells in the sample.
This file can be created in two different formats. The first one
can be used for the GEX, CITE, MULTI, and ATAC pipelines. It
will contain the name of the sample and the number of cells
to be forced to.
Here is an example forcecells.csv file:
Sample,Cells
Sample1,3000
Sample2,5000
where:
• Sample: The sample name used as the Cell Ranger output
• Cells: The number of cells the sample should be forced to
In this example, Sample1 and Sample2 will be run while being forced
to have 3000 and 5000 cells respectively. Any other samples that
are processed will be run without using the force cells flag and
will use the default cell calling algorithm.
The second format is only compatible with the MULTI pipeline and
would be used when hashtag multiplexing is used and the number of
cells needs to be forced for a specific hashtagged sample.
Here is an example forcecells.csv file:
Name,Sample,Cells
Library1,HTO_1,3000
Library1,HTO_2,5000
where:
• Name: The name of the library that is provided to Cell
Ranger when running multi analysis. This should match the
name that is given in the libraries.csv file.
• Sample: The sample ID used for the associated hashtag. This
will have to match the value used in the CMO sample file or
the CMO reference file that is provided as input. If only a
CMO reference file is provided, the pipeline by default assigns
each hashtag an ID of HTO_1, HTO_2, etc.
• Cells: The number of cells the sample should be forced to
In this example, the hashtags HTO_1 and HTO_2 in Library1 will
be run while being forced to 3000 and 5000 cells respectively.
Any other libraries or samples that are processed will be run
without using the force cells flag.
{3}{4}Orchestration options:{5}
--mode {{slurm,local}}
Expand Down Expand Up @@ -840,6 +893,15 @@ def parsed_arguments(name, description):
help = argparse.SUPPRESS
)

# Number of cells to force samples to when running Cell Ranger analysis
subparser_run.add_argument(
'--forcecells',
# Check if the file exists and if it is readable
type = lambda file: permissions(parser, file, os.R_OK),
required = False,
help = argparse.SUPPRESS
)

# Orchestration Options
# Execution Method, run locally
# on a compute node or submit to
Expand Down
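The `--forcecells` argument above validates its file through the repository's `permissions(parser, file, os.R_OK)` helper, whose body is not shown in this diff. A minimal sketch of that pattern, assuming a hypothetical stand-in named `readable_file` (not the project's actual helper) that reports problems through `parser.error`:

```python
import argparse
import os


def readable_file(parser, path):
    """Hypothetical stand-in for the repository's permissions() helper."""
    # Reject paths that do not point to an existing regular file.
    if not os.path.isfile(path):
        parser.error(f"'{path}' does not exist or is not a file")
    # Reject files the current user cannot read.
    if not os.access(path, os.R_OK):
        parser.error(f"'{path}' is not readable")
    return os.path.abspath(path)


def build_parser():
    # "cell-seek-sketch" is an illustrative program name only.
    parser = argparse.ArgumentParser(prog="cell-seek-sketch")
    parser.add_argument(
        "--forcecells",
        # The lambda closes over the parser so the check can call parser.error.
        type=lambda path: readable_file(parser, path),
        required=False,
        default=None,
    )
    return parser
```

Because the check runs inside `type=`, a missing or unreadable file stops the run during argument parsing, before any pipeline work starts.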
123 changes: 117 additions & 6 deletions docs/usage/run.md
Expand Up @@ -94,7 +94,7 @@ Each of the following arguments are optional, and do not need to be provided.
> **Cell Ranger aggregate normalization.**
> *type: string*
>
> This option defines the normalization mode that should be used. Mapped is what Cell Ranger would run by default, which subsamples reads from higher depth samples until each library type has an equal number of reads per cell that are confidently mapped. None means to not normalize at all. If this flag is not used then aggregate will not be run. To run Cell Ranger aggregate, please select one of the following options: mapped, none.
> This option defines the normalization mode that should be used. Mapped is what Cell Ranger would run by default, which subsamples reads from higher depth samples until each library type has an equal number of reads per cell that are confidently mapped. None means to not normalize at all. If this flag is not used then aggregate will not be run. Aggregate analysis is generally not needed, but it can be used to generate a Loupe Browser file for interactive exploration of the data. To run Cell Ranger aggregate, please select one of the following options: mapped, none.
>
> ***Example:*** `--aggregate mapped`
Expand Down Expand Up @@ -164,7 +164,7 @@ Each of the following arguments are optional, and do not need to be provided.
---
`--rename RENAME`
> **Rename sample file.**
> **Rename sample file.**
> *type: file*
>
> Rename sample file. A CSV file containing the name of the FASTQ file and the new name of the sample. Only the samples listed in the CSV files will be run.
Expand All @@ -183,11 +183,35 @@ Each of the following arguments are optional, and do not need to be provided.
>
> - *FASTQ:* The name that is used in the FASTQ file
> - *Name:* Unique sample ID that is the sample name used for Cell Ranger count.
>
>
> In this example, new_name3 has FASTQ files with two different names. With this input, both sets of FASTQ files will be used when processing the sample as new_name3. original_name4 will not be renamed. Any FASTQ file that does not have the name original_name1, original_name2, original_name3, or original_name4 will not be run.
>
> ***Example:*** `--rename rename.csv`
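The many-to-one behavior described above (two FASTQ names feeding `new_name3`, and `original_name4` keeping its name) can be sketched as a grouping step. This is only an illustration of the documented semantics, not the pipeline's own parser; the function name `read_rename_map` is hypothetical, and the CSV content is the example from the updated help text.

```python
import csv
import io
from collections import defaultdict


def read_rename_map(handle):
    """Map each sample name to the list of FASTQ names assigned to it."""
    samples = defaultdict(list)
    for row in csv.DictReader(handle):
        samples[row["Name"].strip()].append(row["FASTQ"].strip())
    return dict(samples)


EXAMPLE = """FASTQ,Name
original_name1,new_name1
original_name2,new_name2
original_name3,new_name3
original_name3-2,new_name3
original_name4,original_name4
"""

rename_map = read_rename_map(io.StringIO(EXAMPLE))
# new_name3 collects both FASTQ names; original_name4 maps to itself.
# Any FASTQ name absent from the file is simply not in the mapping.
```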
---
`--forcecells FORCECELLS`
> **Force cells file.**
> *type: file*
>
> Force cells file. A CSV file containing the name of the sample (the Cell Ranger outputted name) and the number of cells to force the sample to. It will generally be used if the first analysis run appears to do a poor job at estimating the number of cells, and a re-run is needed to adjust the number of cells in the sample.
>
> *Here is an example forcecells.csv file:*
> ```
> Sample,Cells
> Sample1,3000
> Sample2,5000
> ```
>
> *Where:*
>
> - *Sample:* The sample name used as the Cell Ranger output
> - *Cells:* The number of cells the sample should be forced to
>
> In this example, Sample1 and Sample2 will be run while being forced to have 3000 and 5000 cells respectively. Any other samples that are processed will be run without using the force cells flag and will use the default cell calling algorithm.
>
> ***Example:*** `--forcecells forcecells.csv`
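A minimal sketch of how the two-column file could be consumed, assuming the header row is exactly `Sample,Cells` as in the example. The function names are hypothetical, and the assumption that the value ends up as Cell Ranger count's `--force-cells` option is ours rather than something this diff shows.

```python
import csv
import io


def read_forcecells(handle):
    """Return {sample name: forced cell count} from a Sample,Cells CSV."""
    cells = {}
    for row in csv.DictReader(handle):
        count = int(row["Cells"])
        if count <= 0:
            raise ValueError(f"Cells must be positive for {row['Sample']}")
        cells[row["Sample"].strip()] = count
    return cells


def force_cells_args(sample, cells):
    """Extra command-line arguments for one sample; empty means default calling."""
    if sample not in cells:
        return []
    return ["--force-cells", str(cells[sample])]


EXAMPLE = """Sample,Cells
Sample1,3000
Sample2,5000
"""

cells = read_forcecells(io.StringIO(EXAMPLE))
```

Samples missing from the file get an empty argument list, which corresponds to the documented fallback to the default cell calling algorithm.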
### 2.2 VDJ
#### 2.2.1 Required Arguments
Expand Down Expand Up @@ -245,7 +269,7 @@ Each of the following arguments are required. Failure to provide a required argu
#### 2.2.2 Analysis Options
`--rename RENAME`
> **Rename sample file.**
> **Rename sample file.**
> *type: file*
>
> Rename sample file. A CSV file containing the name of the FASTQ file and the new name of the sample. Only the samples listed in the CSV files will be run.
Expand Down Expand Up @@ -403,6 +427,29 @@ Each of the following arguments are required. Failure to provide a required argu
>
> ***Example:*** `--create-bam`
---
`--forcecells FORCECELLS`
> **Force cells file.**
> *type: file*
>
> Force cells file. A CSV file containing the name of the sample (the Cell Ranger outputted name) and the number of cells to force the sample to. It will generally be used if the first analysis run appears to do a poor job at estimating the number of cells, and a re-run is needed to adjust the number of cells in the sample.
>
> *Here is an example forcecells.csv file:*
> ```
> Sample,Cells
> Sample1,3000
> Sample2,5000
> ```
>
> *Where:*
>
> - *Sample:* The sample name used as the Cell Ranger output
> - *Cells:* The number of cells the sample should be forced to
>
> In this example, Sample1 and Sample2 will be run while being forced to have 3000 and 5000 cells respectively. Any other samples that are processed will be run without using the force cells flag and will use the default cell calling algorithm.
>
> ***Example:*** `--forcecells forcecells.csv`
### 2.4 MULTI
There are multiple different combinations of library types that may result in the use of Cell Ranger `multi` analysis. Any combination that combines GEX and VDJ data for cell calls, or the use of HTO with the Cell Ranger hashtag caller would need `multi` analysis.
Expand Down Expand Up @@ -540,7 +587,7 @@ Each of the following arguments are optional, and do not need to be provided.
> - *id:* Unique ID for this feature. Must not contain whitespace, quote or comma characters. Each ID must be unique and must not collide with a gene identifier from the transcriptome.
> - *name:* Human-readable name for this feature. Must not contain whitespace.
> - *sequence:* Nucleotide barcode sequence associated with this hashtag
> - *feature_type: Type of the feature. This should always be multiplexing capture.
> - *feature_type:* Type of the feature. This should always be multiplexing capture.
> - *read:* Specifies which RNA sequencing read contains the Feature Barcode sequence. Must be R1 or R2, but in most cases R2 is the correct read.
> - *pattern:* Specifies how to extract the sequence of the feature barcode from the read.
>
Expand Down Expand Up @@ -586,6 +633,47 @@ Each of the following arguments are optional, and do not need to be provided.
>
> ***Example:*** `--create-bam`
---
`--forcecells FORCECELLS`
> **Force cells file.**
> *type: file*
>
> Force cells file. A CSV file containing the name of the sample (the Cell Ranger outputted name) and the number of cells to force the sample to. It will generally be used if the first analysis run appears to do a poor job at estimating the number of cells, and a re-run is needed to adjust the number of cells in the sample.
>
> This file can be created in two different formats. The first one will contain the name of the sample and the number of cells to be forced to.
>
> *Here is an example forcecells.csv file:*
> ```
> Sample,Cells
> Sample1,3000
> Sample2,5000
> ```
>
> *Where:*
>
> - *Sample:* The sample name used as the Cell Ranger output
> - *Cells:* The number of cells the sample should be forced to
>
> In this example, Sample1 and Sample2 will be run while being forced to have 3000 and 5000 cells respectively. Any other samples that are processed will be run without using the force cells flag and will use the default cell calling algorithm.
>
> The second format applies only when hashtag multiplexing is used and the number of cells needs to be forced for a specific hashtagged sample.
>
> *Here is an example forcecells.csv file:*
> ```
> Name,Sample,Cells
> Library1,HTO_1,3000
> Library1,HTO_2,5000
> ```
>
> *Where:*
>
> - *Name:* The name of the library that is provided to Cell Ranger when running multi analysis. This should match the name that is given in the libraries.csv file.
> - *Sample:* The sample ID used for the associated hashtag. This will have to match the value used in the CMO sample file or the CMO reference file that is provided as input. If only a CMO reference file is provided, the pipeline by default assigns each hashtag an ID of HTO_1, HTO_2, etc.
> - *Cells:* The number of cells the sample should be forced to
>
> In this example, the hashtags HTO_1 and HTO_2 in Library1 will be run while being forced to 3000 and 5000 cells respectively. Any other libraries or samples that are processed will be run without using the force cells flag.
>
> ***Example:*** `--forcecells forcecells.csv`
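Since the MULTI pipeline accepts either layout, a reader has to tell the two formats apart. One possible approach, sketched here under the assumption that the header rows are exactly `Sample,Cells` and `Name,Sample,Cells` as in the examples, is to branch on the column names. The function name and the returned labels are hypothetical and not taken from the pipeline.

```python
import csv
import io


def read_forcecells_any(handle):
    """Detect the force cells layout from the header and parse it."""
    reader = csv.DictReader(handle)
    header = set(reader.fieldnames or [])
    if {"Name", "Sample", "Cells"} <= header:
        # Three-column layout: {library name: {hashtag sample ID: cells}}
        per_library = {}
        for row in reader:
            per_library.setdefault(row["Name"], {})[row["Sample"]] = int(row["Cells"])
        return "per_hashtag", per_library
    if {"Sample", "Cells"} <= header:
        # Two-column layout: {sample name: cells}
        return "per_sample", {row["Sample"]: int(row["Cells"]) for row in reader}
    raise ValueError(f"Unrecognized force cells header: {sorted(header)}")


HASHTAG_EXAMPLE = """Name,Sample,Cells
Library1,HTO_1,3000
Library1,HTO_2,5000
"""

kind, forced = read_forcecells_any(io.StringIO(HASHTAG_EXAMPLE))
```

The three-column check comes first because that header also contains `Sample` and `Cells`, so testing the two-column case first would misclassify it.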
### 2.5 ATAC
Expand Down Expand Up @@ -634,7 +722,7 @@ Each of the following arguments are required. Failure to provide a required argu
#### 2.5.2 Analysis Options
`--rename RENAME`
> **Rename sample file.**
> *type: file*
>
Expand All @@ -659,6 +747,29 @@ Each of the following arguments are required. Failure to provide a required argu
>
> ***Example:*** `--rename rename.csv`
---
`--forcecells FORCECELLS`
> **Force cells file.**
> *type: file*
>
> Force cells file. A CSV file containing the name of the sample (the Cell Ranger outputted name) and the number of cells to force the sample to. It will generally be used if the first analysis run appears to do a poor job at estimating the number of cells, and a re-run is needed to adjust the number of cells in the sample.
>
> *Here is an example forcecells.csv file:*
> ```
> Sample,Cells
> Sample1,3000
> Sample2,5000
> ```
>
> *Where:*
>
> - *Sample:* The sample name used as the Cell Ranger output
> - *Cells:* The number of cells the sample should be forced to
>
> In this example, Sample1 and Sample2 will be run while being forced to have 3000 and 5000 cells respectively. Any other samples that are processed will be run without using the force cells flag and will use the default cell calling algorithm.
>
> ***Example:*** `--forcecells forcecells.csv`
### 2.6 Multiome
#### 2.6.1 Required Arguments
Expand Down