Genome Variant Analysis

java -jar GenomeAnalysisTK.jar
Function: Create plots to visualize base recalibration results
Usage: java -jar GenomeAnalysisTK.jar -T AnalyzeCovariates -R myreference.fasta -BQSR myrecal.table -plots BQSR.pdf
java -jar GenomeAnalysisTK.jar
Function: Write out sequence read data (for filtering, merging, subsetting etc)
Usage: java -jar GenomeAnalysisTK.jar -T PrintReads -R reference.fasta -I input1.bam -I input2.bam -o output.bam --read_filter MappingQualityZero
// Prints the first 2000 reads in the BAM file
java -jar GenomeAnalysisTK.jar -T PrintReads -R reference.fasta -I input.bam -o output.bam -n 2000
// Downsamples BAM file to 25%
java -jar GenomeAnalysisTK.jar -T PrintReads -R reference.fasta -I input.bam -o output.bam -dfrac 0.25
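To make the idea of fractional downsampling concrete, here is a minimal Python sketch that keeps each read independently with a fixed probability. It only illustrates the concept behind a 25% downsample; it is not GATK's implementation, and the function name `downsample`, the `seed` parameter, and the read names are all hypothetical.

```python
import random


def downsample(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`.

    Illustrative only: the name and the seeding scheme are assumptions,
    not taken from GATK.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    return [read for read in reads if rng.random() < fraction]


# Hypothetical read names standing in for BAM records.
reads = [f"read{i}" for i in range(10000)]
kept = downsample(reads, 0.25)
print(len(kept))  # close to 2500, but not exactly, because selection is random
```

Because each read is chosen independently, the output size only approximates the requested fraction.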
java -jar GenomeAnalysisTK.jar
Function: Left-align indels in a variant callset
Usage: java -jar GenomeAnalysisTK.jar -T LeftAlignAndTrimVariants -R reference.fasta --variant input.vcf -o output.vcf --dontTrimAlleles
java -jar GenomeAnalysisTK.jar
Function: Count the number of bases in a set of reads
Usage: java -jar GenomeAnalysisTK.jar -R reference.fasta -T CountBases -I input.bam [-L input.intervals]
java -jar GenomeAnalysisTK.jar
Function: Compute the read error rate per position
Usage: java -jar GenomeAnalysisTK.jar -T ErrorRatePerCycle -R reference.fasta -I my_sequence_reads.bam -o error_rates.gatkreport.txt
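As a rough illustration of what a per-position error rate means, the Python sketch below counts mismatches against the reference at each read position (cycle). It assumes ungapped alignments supplied as pairs of equal-length strings in sequencing order, and it ignores indels, clipping, and base or mapping quality filters, so it should not be read as a description of how ErrorRatePerCycle computes its report. The function name and input layout are hypothetical.

```python
def error_rate_per_cycle(alignments):
    """Return the mismatch rate at each read position (cycle).

    `alignments` is an iterable of (read_bases, reference_bases) pairs of
    equal length. This input layout is an assumption made for illustration.
    """
    mismatches = []
    totals = []
    for read, ref in alignments:
        if len(read) != len(ref):
            raise ValueError("read and reference must have the same length")
        for cycle, (read_base, ref_base) in enumerate(zip(read, ref)):
            if cycle == len(totals):  # first time this cycle is seen
                totals.append(0)
                mismatches.append(0)
            totals[cycle] += 1
            if read_base != ref_base:
                mismatches[cycle] += 1
    return [m / t for m, t in zip(mismatches, totals)]


# Toy data: three reads, the last one shorter than the others.
alignments = [("ACGT", "ACGT"), ("ACGA", "ACGT"), ("TC", "AC")]
rates = error_rate_per_cycle(alignments)
print(rates)
```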
java -jar GenomeAnalysisTK.jar
Function: Select a subset of variants from a larger callset
Usage: java -jar GenomeAnalysisTK.jar -R ref.fasta -T SelectVariants --variant input.vcf --maxFilteredGenotypes 5 --minFilteredGenotypes 2 --maxFractionFilteredGenotypes 0.60 --minFractionFilteredGenotypes 0.10
java -jar GenomeAnalysisTK.jar
Function: Randomly select variant records according to specified options
Usage: java -jar GenomeAnalysisTK.jar -T ValidationSiteSelectorWalker -R reference.fasta -V:foo input1.vcf -V:bar input2.vcf --numValidationSites 200 -sf samples.txt -o output.vcf -sampleMode POLY_BASED_ON_GT -freqMode UNIFORM -selectType INDEL
java -jar GenomeAnalysisTK.jar
Function: Left-align indels in a variant callset
Usage: java -jar GenomeAnalysisTK.jar -T LeftAlignAndTrimVariants -R reference.fasta --variant input.vcf -o output.vcf --splitMultiallelics --dontTrimAlleles --keepOriginalAC
GEMINI autosomal_dominant
Function: Find variants meeting an autosomal dominant model.
Usage: gemini autosomal_dominant test.auto_dom.db --columns "chrom,start,end,gene"
java -jar GenomeAnalysisTK.jar
Function: Annotate variant calls with context information
Usage: java -jar GenomeAnalysisTK.jar -R reference.fasta -T VariantAnnotator -V input.vcf -o output.vcf --resource:foo resource.vcf --expression foo.AF --expression foo.FILTER
VarScan
Function: Call variants and identify their somatic status (Germline/LOH/Somatic) using pileup files from a matched tumor-normal pair.
Usage: java -jar VarScan.jar somatic [normal_pileup] [tumor_pileup] [output] OPTIONS
java -jar GenomeAnalysisTK.jar
Function: Calculate the GC content of the reference sequence for each interval
Usage: java -jar GenomeAnalysisTK.jar -T GCContentByInterval -R reference.fasta -o output.txt -L input.intervals
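GC content for an interval is the fraction of bases in that interval that are G or C. The Python sketch below computes it for 0-based, half-open intervals on an in-memory sequence. The function name, the interval convention, and the choice to count N bases in the denominator are assumptions made for illustration; GCContentByInterval may handle ambiguous bases differently.

```python
def gc_content(sequence, start, end):
    """Return the fraction of G/C bases in sequence[start:end].

    Uses 0-based, half-open coordinates. Ambiguous bases such as N count
    toward the interval length here, which is an assumption of this sketch.
    """
    window = sequence[start:end].upper()
    if not window:
        raise ValueError("interval is empty")
    gc_bases = sum(base in "GC" for base in window)
    return gc_bases / len(window)


# Toy reference and intervals (hypothetical data).
reference = "ATGCGCATNN"
for start, end in [(0, 4), (2, 6), (6, 10)]:
    print(start, end, gc_content(reference, start, end))
```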
java -jar GenomeAnalysisTK.jar
Function: Validate a VCF file with an extra strict set of criteria
Usage: java -jar GenomeAnalysisTK.jar -T ValidateVariants -R reference.fasta -V input.vcf --dbsnp dbsnp.vcf
vt
Function: Normalize VCF output for comparison purposes. This is especially useful for more complex graphs, which can produce large variant blocks that contain many reference bases. (Note: requires [vt](http://genome.sph.umich.edu/wiki/Vt).)
Usage: vt decompose_blocksub -a calls.vcf | vt normalize -r FASTA_FILE - > calls.clean.vcf
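Normalizing a variant generally means left-aligning it and trimming redundant bases so that the same change always gets the same representation. The Python sketch below shows the commonly described left-align-and-trim procedure for a single biallelic record. It is a simplified illustration, not vt's code: the function name `normalize_variant` and its arguments are hypothetical, and multiallelic records are not handled.

```python
def normalize_variant(ref_seq, pos, ref, alt):
    """Left-align and trim one biallelic variant.

    `pos` is the 1-based position of the first base of `ref` on `ref_seq`.
    Returns (pos, ref, alt). Illustrative sketch only.
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        # At pos == 1 there is nothing to extend with, so the loop stops.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# A CA deletion inside a CA repeat, reported at the right end of the repeat.
reference = "GGGCACACACAGGG"
print(normalize_variant(reference, 8, "CAC", "C"))  # (3, 'GCA', 'G')
```

The same deletion reported anywhere within the repeat collapses to the single left-most representation, which is what makes normalized callsets directly comparable.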
read_NVC.py
Function: Check for nucleotide composition bias. Due to random priming, certain patterns are overrepresented at the beginning (5' end) of reads. This bias is easy to examine with an NVC (nucleotide versus cycle) plot, which is generated by overlaying all reads and calculating the nucleotide composition at each read position (that is, each sequencing cycle). Under ideal conditions (the genome is random and RNA-seq reads are randomly sampled from it), we expect A% = C% = G% = T% = 25% at each read position.
Usage: read_NVC.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output
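The per-cycle composition described above can be sketched in a few lines of Python: stack the reads, count bases at each position, and convert the counts to fractions. This is an illustration of the calculation only, not read_NVC.py's code; the function name and the toy reads are hypothetical, and bases other than A, C, G, and T (such as N) are counted in the denominator by assumption.

```python
from collections import Counter


def nucleotide_vs_cycle(reads):
    """Return, for each read position, the fraction of A, C, G, and T.

    Reads may differ in length; each position is normalized by the number
    of reads that reach it. Illustrative sketch only.
    """
    counts = []
    for read in reads:
        for cycle, base in enumerate(read.upper()):
            if cycle == len(counts):  # first read to reach this position
                counts.append(Counter())
            counts[cycle][base] += 1
    return [
        {base: c[base] / sum(c.values()) for base in "ACGT"}
        for c in counts
    ]


# Toy reads (hypothetical data).
reads = ["ACGT", "AAGT", "ACG"]
profile = nucleotide_vs_cycle(reads)
for cycle, fractions in enumerate(profile):
    print(cycle, fractions)
```

Plotting these fractions against the cycle index gives the NVC curve; a strong departure from 25% per base in the first few cycles is the priming bias the tool is meant to reveal.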