Category

Plot


Usage

computeGCBias -b file.bam --effectiveGenomeSize 2150570000 -g mm9.2bit -l 200 --GCbiasFrequenciesFile freq.txt [options]


Manual

computeGCBias is a tool from the deepTools suite. The information on this page is based on deepTools version 3.5.1.

Required arguments

  • --bamfile bam file, -b bam file: Sorted BAM file.
  • --effectiveGenomeSize EFFECTIVEGENOMESIZE: The effective genome size is the portion of the genome that is mappable. Large fractions of the genome are stretches of NNNN that should be discarded. Also, if repetitive regions were not included in the mapping of reads, the effective genome size needs to be adjusted accordingly. A table of values is available in the deepTools documentation.
  • --genome 2bit FILE, -g 2bit FILE: Genome in 2bit format. Most genomes can be found here: https://hgdownload.cse.ucsc.edu/gbdb/. For example, the 2bit file for the human reference genome hg38 is available at https://hgdownload.cse.ucsc.edu/gbdb/hg38/hg38.2bit. You can also convert FASTA files to 2bit format using the UCSC program faToTwoBit.
  • --GCbiasFrequenciesFile FILE, -freq FILE, -o FILE: Path to save the file containing the observed and expected read frequencies per %GC-content. This file is needed to run the correctGCBias tool. This is a text file.
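
The "stretches of NNNN" part of the effective genome size can be approximated by counting the non-N bases in a FASTA file. The following is a minimal Python sketch under that assumption. The function name count_non_n_bases is hypothetical and not part of deepTools, and the count ignores mappability and excluded repeats, so it should be treated only as a rough upper bound; the table in the deepTools documentation remains the reference.

```python
import io


def count_non_n_bases(fasta_lines):
    """Count bases other than N/n in an iterable of FASTA lines.

    Hypothetical helper for illustration; not part of deepTools.
    """
    total = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and sequence headers such as ">chr1"
        if not line or line.startswith(">"):
            continue
        # Count every base on the line except N (case-insensitive)
        total += len(line) - line.upper().count("N")
    return total


# Tiny in-memory example; for a real genome, pass an open file handle instead
example = io.StringIO(">chr1\nACGTNNNN\nacgtnnGG\n>chr2\nNNNNACGT\n")
non_n = count_non_n_bases(example)
print(non_n)
```

For a real genome, the same function can be called as count_non_n_bases(open("genome.fa")), where genome.fa is a placeholder file name.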

Options

  • --fragmentLength FRAGMENTLENGTH, -l FRAGMENTLENGTH: Fragment length used for the sequencing. If paired-end reads are used, the fragment length is computed from the BAM file.
  • --sampleSize SAMPLESIZE: Number of sampling points to be considered. (Default: 50000000.0)
  • --extraSampling BED file: BED file containing genomic regions for which extra sampling is required because they are underrepresented in the genome.
  • --region CHR:START:END, -r CHR:START:END: Region of the genome to limit the operation to. This is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000.
  • --blackListFileName BED file [BED file ...], -bl BED file [BED file ...]: A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant.
  • --numberOfProcessors INT, -p INT: Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (Default: 1)
  • --biasPlot FILE NAME: If given, a diagnostic image summarizing the GC-bias will be saved.
  • --plotFileFormat STR: Image format type. If given, this option overrides the image format based on the --biasPlot file ending. The available options are: "png", "eps", "pdf", "plotly" and "svg".
  • --regionSize INT: To plot the reads per %GC over a region, the size of the region is required. By default, the bin size is set to 300 bases, which is close to the standard fragment size for Illumina machines. However, if the depth of sequencing is low, a larger bin size will be required; otherwise many bins will not overlap with any read. (Default: 300)
  • --help, -h: show this help message and exit
  • --verbose, -v: Set to see processing messages.
  • --version: show program's version number and exit
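
The options above can be combined into a single call. The following is a minimal Python sketch that assembles the argument list using only flags listed on this page. The function name build_compute_gc_bias_cmd and the output name gc_bias.png are hypothetical. The region check follows only the chr or chr:start:end format described above and may be stricter than the tool itself.

```python
import shlex


def build_compute_gc_bias_cmd(bam, genome_2bit, effective_genome_size, freq_file,
                              fragment_length=None, region=None,
                              processors=None, bias_plot=None):
    """Return an argument list for computeGCBias (hypothetical helper)."""
    cmd = [
        "computeGCBias",
        "-b", bam,
        "--effectiveGenomeSize", str(effective_genome_size),
        "-g", genome_2bit,
        "--GCbiasFrequenciesFile", freq_file,
    ]
    if fragment_length is not None:
        cmd += ["-l", str(fragment_length)]
    if region is not None:
        # Accept "chr" or "chr:start:end", as described on this page
        parts = region.split(":")
        is_chrom_only = len(parts) == 1 and parts[0] != ""
        is_full = (len(parts) == 3 and parts[0] != ""
                   and parts[1].isdigit() and parts[2].isdigit())
        if not (is_chrom_only or is_full):
            raise ValueError(f"expected chr or chr:start:end, got {region!r}")
        cmd += ["--region", region]
    if processors is not None:
        # An integer, "max/2" or "max"
        cmd += ["-p", str(processors)]
    if bias_plot is not None:
        cmd += ["--biasPlot", bias_plot]
    return cmd


cmd = build_compute_gc_bias_cmd(
    "file.bam", "mm9.2bit", 2150570000, "freq.txt",
    fragment_length=200, region="chr10:456700:891000",
    processors="max/2", bias_plot="gc_bias.png",
)
# Print a shell-ready command line for inspection before running it
print(shlex.join(cmd))
```

If deepTools is installed and the input files exist, the list can be passed to subprocess.run(cmd, check=True). Building the command as a list avoids shell quoting problems with file names.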

 

