Category

Gene Expression Analysis


Usage

docker run <bind_mounts> cibersortx/hires --username <email> --token <token> [options] --mixture <file> --sigmatrix <file>


Manual

Brief introduction

CIBERSORTxHiRes imputes sample-level gene expression variation of distinct cell types from a collection of bulk tissue transcriptomes. Unlike CIBERSORTxGEP, the output is an expression matrix for each cell type rather than a single transcriptome profile. CIBERSORTxHiRes is useful for exploring cell-type-specific expression variation without prior knowledge of biological or functional groupings (e.g., relating cell-type-specific gene expression to survival).

Docker or Singularity is required to run this tool. You can run

docker pull cibersortx/hires

to obtain a copy of this tool. You also need a token that you will provide every time you run the CIBERSORTx executables. You can obtain the token from the CIBERSORTx website.

Required arguments

  • --username string: Email used for login to cibersortx.stanford.edu
  • --token string: Token associated with current IP address (generated on website)
  • --mixture file_name: Gene expression profile (GEP) matrix for the mixtures (bulk RNA-seq samples). Formatting requirements:
    • Tab-delimited tabular input format (.txt or .tsv) with no double quotations and no missing entries.
    • Genes in column 1; Mixture labels (sample names) in row 1
    • Given the significant difference between counts (e.g., CPM) and gene length-normalized expression data (e.g., TPM) we recommend that the signature matrix and mixture files be represented in the same normalization space whenever possible.
    • Data should be in non-log space. Note: if the maximum expression value is less than 50, CIBERSORTx will assume that the data are in log space and will anti-log all expression values by $2^x$.
  • --sigmatrix file_name: Signature matrix. You can use an empirically determined signature matrix, or use CIBERSORTx to generate one if you have a single-cell RNA-seq reference available.
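
The formatting requirements for the mixture file can be checked before a run. The following is a minimal sketch in Python, not part of CIBERSORTx: the function names check_mixture, assumed_log_space and anti_log are hypothetical, and the threshold of 50 and the $2^x$ transform only restate the note above.

```python
def check_mixture(text):
    """Return a list of formatting problems found in a mixture matrix given as text."""
    problems = []
    lines = text.rstrip("\n").split("\n")
    header = lines[0].split("\t")
    if len(header) < 2:
        problems.append("header: expected a gene column plus tab-delimited sample labels")
    for lineno, line in enumerate(lines, start=1):
        if '"' in line:
            problems.append(f"line {lineno}: contains a double quotation mark")
        fields = line.split("\t")
        if len(fields) != len(header):
            problems.append(f"line {lineno}: {len(fields)} fields, expected {len(header)}")
        if any(field.strip() == "" for field in fields):
            problems.append(f"line {lineno}: missing entry")
    return problems


def assumed_log_space(values):
    """True when the maximum value is below 50, the condition stated in the note."""
    return max(values) < 50


def anti_log(values):
    """Apply 2^x to every value, mirroring the described anti-log step."""
    return [2.0 ** v for v in values]
```

An empty list from check_mixture means that none of the checked rules were violated; it does not guarantee that CIBERSORTx will accept the file.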

Options

  • --classes file_name: Cell type groupings
  • --cibresults file_name: Previous CIBERSORTx cell fractions [default: run CIBERSORT]
  • --filtered file_name: Filtered GEPs from CIBERSORTxGEP
  • --label char: Sample label
  • --rmbatchBmode bool: Run B-mode batch correction [default: FALSE]
  • --rmbatchSmode bool: Run S-mode batch correction [default: FALSE]
  • --sourceGEPs file_name: Signature matrix GEPs for batch correction [default: sigmatrix]
  • --groundtruth file_name: Ground truth GEPs [same labels as classes] [default: none]
  • --threads int: Number of parallel processes [default: $\text{No. cores} - 1$]
  • --QN bool: Run quantile normalization [default: FALSE]
  • --nsampling int: Number of subsamples for most important NNLS calls [default: 100]
  • --nsampling2 int: Number of subsamples for general NNLS calls [default: 10]
  • --degclasses file_name: Run on two classes, specified by 1, 2, 0=skip [default: none]
  • --window int: Window size for deconvolution [default: $\text{No. of cell types} \times 4$]
  • --heatmap bool: Write heat map of cell type-specific GEPs to disk [default: TRUE]
  • --cluster bool: Group genes in heat map by hierarchical clustering [default: FALSE]
  • --subsetgenes file_name: Run analysis on a specific gene list [default: none]
  • --useadjustedmixtures bool: If doing B-mode batch correction, use adjusted mixtures for GEP imputation [default: FALSE]
  • --variableonly bool: Write only genes that show variation in expression to the final output [default: FALSE]
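
Two of the defaults above are given as formulas. As a small worked sketch in Python (the function names are hypothetical, and the code only evaluates the stated formulas; how the tool itself behaves on a single-core machine is not covered here):

```python
import os


def default_window(n_cell_types):
    """Default --window: number of cell types times 4."""
    return 4 * n_cell_types


def default_threads(n_cores=None):
    """Default --threads: number of cores minus 1."""
    if n_cores is None:
        # os.cpu_count() may return None; fall back to 1 in that case.
        n_cores = os.cpu_count() or 1
    return n_cores - 1
```

For example, a signature matrix with 4 cell types gives a default window of 16, and a machine with 8 cores gives 7 threads.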

Outputs

The main result of CIBERSORTxHiRes is a set of expression matrix .txt files and heatmaps for each individual cell type, showing the cell-type specific expression of individual genes at the sample level.

The name of each file contains the name of the cell type, followed by the window size used for the job (e.g. CIBERSORTxHiRes_job1_Mastcells_Window40.txt).
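
When collecting results from several runs, the cell type and window size can be recovered from the file name. The sketch below is in Python and assumes that every file follows the pattern of the single example above (CIBERSORTxHiRes_<job>_<cell type>_Window<size>.txt); the pattern and the function name parse_output_name are inferred for illustration, not taken from the tool.

```python
import re

# Pattern inferred from the example file name; treat it as an assumption.
_OUTPUT_NAME = re.compile(
    r"^CIBERSORTxHiRes_(?P<job>.+?)_(?P<cell_type>.+)_Window(?P<window>\d+)\.txt$"
)


def parse_output_name(filename):
    """Return (job, cell_type, window) or None if the name does not match."""
    match = _OUTPUT_NAME.match(filename)
    if match is None:
        return None
    return match.group("job"), match.group("cell_type"), int(match.group("window"))
```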

In the expression matrix .txt files, a value of "1" marks a gene with insufficient evidence of expression (the gene is either not expressed or has inadequate statistical power to be imputed). NA values mark genes that have inadequate statistical power to be imputed.
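
Downstream analyses usually need to separate imputed genes from these placeholder values. The following Python sketch labels each gene in an output matrix; it assumes a tab-delimited layout with a header row, gene names in the first column, and placeholder values that fill the whole row. These layout details, and the names parse_value, classify_gene and summarize, are assumptions for illustration.

```python
import csv
import io


def parse_value(field):
    """Convert a cell to float, keeping NA as None."""
    return None if field == "NA" else float(field)


def classify_gene(values):
    """Label one gene from its per-sample values."""
    if all(v is None for v in values):
        return "not_imputed_na"
    if all(v is None or v == 1.0 for v in values):
        return "insufficient_evidence"
    return "imputed"


def summarize(text):
    """Map each gene name to its label for a matrix given as text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row of sample labels
    return {
        row[0]: classify_gene([parse_value(f) for f in row[1:]])
        for row in reader
        if row
    }
```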

If you have provided a gene subset list via --subsetgenes, these files will have all the genes in the list that are found in the original mixture file given as input. If some genes are still missing, this could be due to different annotations or gene symbols between the gene subset list and the mixture file.
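
Genes that will be dropped can be identified before the run by comparing the subset list with the gene column of the mixture file. A minimal Python sketch, with a hypothetical function name and exact, case-sensitive matching of gene symbols as an assumption:

```python
def missing_subset_genes(subset_genes, mixture_genes):
    """Return subset genes that are absent from the mixture, in input order."""
    available = set(mixture_genes)
    return [gene for gene in subset_genes if gene not in available]
```

Any gene returned here is a candidate for an annotation or gene symbol mismatch between the two files.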

CIBERSORTxHiRes runs both CIBERSORTxFractions and CIBERSORTxGEP, and a set of output files is generated from each analysis.

Examples

Group Level GEPs - FL (Fig. 3b-f)

This example imputes cell-type-specific gene expression profiles from bulk follicular lymphoma samples profiled on microarray, using the signature matrix LM22 collapsed to 4 major cell types. In addition, the results are compared to ground truth reference profiles obtained from FACS-sorted cell subsets.

docker run -v absolute/path/to/input/dir:/src/data -v absolute/path/to/output/dir:/src/outdir cibersortx/hires \
    --username email_address_registered_on_CIBERSORTx_website \
    --token token_obtained_from_CIBERSORTx_website \
    --mixture Fig4a-arrays-SimulatedMixtures.MAS5.txt \
    --sigmatrix Fig4a-LM4.txt \
    --classes Fig4a-LM4-mergedclasses.txt \
    --window 20 --QN TRUE

Group Level GEPs - NSCLC (Fig. 3g)

This example imputes cell-type-specific gene expression profiles from bulk NSCLC samples profiled by RNA-Seq and compares the results to ground truth reference profiles obtained from FACS-sorted cell subsets.

docker run -v absolute/path/to/input/dir:/src/data -v absolute/path/to/output/dir:/src/outdir cibersortx/hires \
    --username email_address_registered_on_CIBERSORTx_website \
    --token token_obtained_from_CIBERSORTx_website \
    --mixture SuppFig11-DLBCL_CHOP_Lenz-arrays-bulktumors.MAS5.txt \
    --sigmatrix LM22.txt --classes SuppFig11-LM22_10_merged_classes.txt \
    --subsetgenes SuppFig11-DLBCL-GCBABC-genes --QN TRUE
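
When the same invocation is repeated for many inputs, the command line can be assembled programmatically. The sketch below is in Python and only rearranges the flags and bind mounts shown in the examples above; the function name build_hires_command is hypothetical, and rendering booleans as TRUE or FALSE follows the option defaults listed earlier.

```python
def build_hires_command(input_dir, output_dir, username, token,
                        mixture, sigmatrix, **options):
    """Return the docker argument list for one CIBERSORTxHiRes run."""
    argv = [
        "docker", "run",
        "-v", f"{input_dir}:/src/data",
        "-v", f"{output_dir}:/src/outdir",
        "cibersortx/hires",
        "--username", username,
        "--token", token,
        "--mixture", mixture,
        "--sigmatrix", sigmatrix,
    ]
    # Optional flags are passed through by name, e.g. window=20 or QN=True.
    for name, value in options.items():
        if isinstance(value, bool):
            value = "TRUE" if value else "FALSE"
        argv += [f"--{name}", str(value)]
    return argv
```

The returned list can be passed to subprocess.run(argv, check=True); the function itself does not check that the option names are valid for the tool.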

File formats this tool works with
TSV, TXT
