SAM/BAM Manipulation

java -jar picard.jar ViewSam
Function: Prints a SAM or BAM file to the screen.
Usage: java -jar picard.jar ViewSam I=input.bam
read_duplication.py
Function: Estimates the read duplication rate using two strategies. Sequence based: reads with identical sequences are regarded as duplicates. Mapping based: reads mapped to exactly the same genomic location are regarded as duplicates; for spliced reads, reads that start at the same position and are spliced the same way are regarded as duplicates.
Usage: read_duplication.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output
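As a rough illustration of the two strategies, the Python sketch below tallies how many distinct sequences (or mapping positions) occur once, twice, and so on. The `Read` record and its fields are hypothetical stand-ins for parsed BAM records; this is not RSeQC's implementation.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical minimal aligned read; not RSeQC's internal representation."""
    seq: str
    chrom: str
    start: int                            # 0-based leftmost aligned position
    blocks: Tuple[Tuple[int, int], ...]   # aligned blocks; more than one means spliced


def occurrence_histogram(keys: Iterable[Hashable]) -> Dict[int, int]:
    """Map 'times a key was seen' -> 'number of distinct keys seen that often'."""
    per_key = Counter(keys)
    return dict(Counter(per_key.values()))


def sequence_based(reads: Iterable[Read]) -> Dict[int, int]:
    # Reads with identical sequences count as duplicates of each other.
    return occurrence_histogram(r.seq for r in reads)


def mapping_based(reads: Iterable[Read]) -> Dict[int, int]:
    # Reads with the same chromosome, start, and block layout count as duplicates,
    # so spliced reads must also splice the same way to be grouped together.
    return occurrence_histogram((r.chrom, r.start, r.blocks) for r in reads)
```

For example, two unspliced reads and one spliced read that share the sequence ACGT form one group of size 3 under the sequence-based rule, but split into groups of size 2 and 1 under the mapping-based rule.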
java -jar picard.jar CollectIlluminaLaneMetrics
Function: Collects Illumina lane metrics for the given BaseCalling analysis directory. This tool produces quality control metrics on cluster density for each lane of an Illumina flowcell. It takes Illumina TileMetrics data and writes lane-level and phasing-level metrics to the output directory. In this context, phasing refers to the fraction of molecules that fall behind during a read cycle, and prephasing to the fraction that jump ahead.
Usage: java -jar picard.jar CollectIlluminaLaneMetrics RUN_DIR=test_run OUTPUT_DIRECTORY=Lane_output_metrics OUTPUT_PREFIX=experiment1 READ_STRUCTURE=25T8B25T
java -jar picard.jar IntervalListTools
Function: Manipulates interval lists. This tool offers multiple interval list manipulation capabilities, including sorting, merging, subtracting, padding, customizing, and other set-theoretic operations. If given one or more inputs, the default operation is to merge and sort them; other operations, such as interval subtraction, are selected through the arguments. Intervals are listed with respect to a reference sequence. Both interval_list and VCF files are accepted as input. The interval_list format is relatively simple and partly mirrors the SAM alignment format: a SAM-style header must list the sequence records against which the intervals are described, and each following line holds one record with these tab-separated values: sequence name (SN), start position (1-based), end position (1-based, end inclusive), strand (either + or -), and interval name (ideally unique).
Usage: java -jar picard.jar IntervalListTools I=input.interval_list I=input_2.interval_list O=merged.interval_list
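A minimal Python sketch of reading the interval_list layout described above and sorting and merging the records. The function names, the choice to merge adjacent as well as overlapping intervals, and the way merged names are combined are assumptions for illustration, not Picard's exact behavior.

```python
from typing import Dict, List, NamedTuple, Tuple


class Interval(NamedTuple):
    contig: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: str   # '+' or '-'
    name: str


def parse_interval_list(text: str) -> Tuple[List[str], List[Interval]]:
    """Return (contig order from @SQ header lines, interval records)."""
    contigs: List[str] = []
    intervals: List[Interval] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
                contigs.append(tags["SN"])
            continue
        contig, start, end, strand, name = line.split("\t")
        intervals.append(Interval(contig, int(start), int(end), strand, name))
    return contigs, intervals


def sort_and_merge(contigs: List[str], intervals: List[Interval]) -> List[Interval]:
    """Sort by header contig order, then merge overlapping or adjacent intervals.

    Keeping the first strand and joining names with '|' are choices made for
    this sketch only.
    """
    rank: Dict[str, int] = {c: i for i, c in enumerate(contigs)}
    merged: List[Interval] = []
    for iv in sorted(intervals, key=lambda iv: (rank[iv.contig], iv.start, iv.end)):
        last = merged[-1] if merged else None
        if last is not None and last.contig == iv.contig and iv.start <= last.end + 1:
            merged[-1] = last._replace(end=max(last.end, iv.end),
                                       name=f"{last.name}|{iv.name}")
        else:
            merged.append(iv)
    return merged
```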
java -jar picard.jar MeanQualityByCycle
Function: Collect mean quality by cycle. This tool generates a data table and chart of mean quality by cycle from a BAM file. It is intended to be used on a single lane or a read group's worth of data, but can be applied to merged BAMs if needed. This metric gives an overall snapshot of sequencing machine performance. For most types of sequencing data, the output is expected to show a slight reduction in overall base quality scores towards the end of each read. Spikes in quality within reads are not expected and may indicate that technical problems occurred during sequencing.
Usage: java -jar picard.jar MeanQualityByCycle I=input.bam O=mean_qual_by_cycle.txt CHART=mean_qual_by_cycle.pdf
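The per-cycle averaging can be sketched in Python as follows, assuming Phred+33 quality strings that are already in sequencing order. SAM/BAM stores the qualities of reverse-strand reads in reference orientation, so a real tool would reverse those strings first; this sketch leaves that step out, and the function name is hypothetical.

```python
from typing import Iterable, List


def mean_quality_by_cycle(qualities: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each cycle (0-based read position) across all reads.

    Reads may have different lengths; each cycle is averaged only over the
    reads long enough to reach it.
    """
    totals: List[int] = []
    counts: List[int] = []
    for qual in qualities:
        for cycle, char in enumerate(qual):
            if cycle == len(totals):      # first read to reach this cycle
                totals.append(0)
                counts.append(0)
            totals[cycle] += ord(char) - offset
            counts[cycle] += 1
    return [total / count for total, count in zip(totals, counts)]
```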
bamtools
Function: Prints coverage data for a single BAM file.
Usage: bamtools coverage -in <BAM file>
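Per-position coverage is commonly computed with a difference array: add 1 where each aligned block starts, subtract 1 just past where it ends, and take a running sum. The Python sketch below assumes the aligned blocks (0-based, half-open) have already been extracted from the reads' CIGAR strings; it illustrates the idea and does not reproduce bamtools' output format.

```python
from typing import Iterable, List, Tuple


def depth_from_blocks(blocks: Iterable[Tuple[int, int]], contig_length: int) -> List[int]:
    """Per-position depth on one contig from aligned blocks.

    Each block is a 0-based, half-open (start, end) span of reference bases
    covered by an aligned read segment.
    """
    diff = [0] * (contig_length + 1)
    for start, end in blocks:
        diff[start] += 1    # coverage rises where a block begins
        diff[end] -= 1      # and falls just past where it ends
    depth: List[int] = []
    running = 0
    for pos in range(contig_length):
        running += diff[pos]
        depth.append(running)
    return depth
```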
java -jar picard.jar CollectJumpingLibraryMetrics
Function: Collect jumping library metrics.
Usage: java -jar picard.jar CollectJumpingLibraryMetrics I=input.bam O=jumping_metrics.txt
samtools fasta
Function: Converts a SAM, BAM, or CRAM file into FASTA or FASTQ format, depending on whether samtools fasta or samtools fastq is invoked.
Usage: samtools fasta [options] in.bam
java -jar picard.jar NormalizeFasta
Function: Normalizes lines of sequence in a FASTA file to be of the same length. This tool takes any FASTA-formatted file and reformats the sequence so that all of the sequence lines in each record have the same length (except the last line). The default is 100 bases per line, but a custom line length can be specified. In addition, record names can be truncated at the first whitespace character to ensure downstream compatibility.
Usage: java -jar picard.jar NormalizeFasta I=input_sequence.fasta O=normalized_sequence.fasta
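A small Python sketch of the rewrapping and optional name truncation described above. The function and parameter names are hypothetical; it assumes the input begins with a `>` header line and holds the whole file in memory.

```python
import re
from typing import List, Tuple


def normalize_fasta(text: str, line_length: int = 100,
                    truncate_names_at_whitespace: bool = False) -> str:
    """Rewrap every record's sequence to line_length bases per line.

    Assumes the input starts with a '>' header line.
    """
    records: List[Tuple[str, List[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            if truncate_names_at_whitespace:
                name = re.split(r"\s", name, maxsplit=1)[0]
            records.append((name, []))
        else:
            records[-1][1].append(line)

    out: List[str] = []
    for name, chunks in records:
        seq = "".join(chunks)
        out.append(">" + name)
        # Every line is full length except possibly the last one of each record.
        out.extend(seq[i:i + line_length] for i in range(0, len(seq), line_length))
    return "\n".join(out) + "\n"
```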
java -jar picard.jar CalculateReadGroupChecksum
Function: Creates a hash code based on the read groups (RG). This tool creates a hash code from the identifying information in the read groups (RG) of a BAM or SAM file header. Adding or removing RGs changes the hash code, so users can quickly determine whether the read group information has changed.
Usage: java -jar picard.jar CalculateReadGroupChecksum I=input.bam
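The idea of a read-group checksum can be sketched in Python by hashing a canonical form of the `@RG` header lines. The canonicalization (sorting lines and tags) and the SHA-256 hash are assumptions for illustration; Picard's actual checksum algorithm may differ, so values from this sketch are not comparable with the tool's output.

```python
import hashlib
from typing import Iterable


def read_group_checksum(header_lines: Iterable[str]) -> str:
    """Hex SHA-256 digest over the @RG header lines only.

    Tags within each line and the lines themselves are sorted so that
    reordering does not change the result (an assumption of this sketch).
    """
    canonical = sorted(
        "\t".join(sorted(line.rstrip("\n").split("\t")[1:]))
        for line in header_lines
        if line.startswith("@RG")
    )
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()
```

Adding or removing an `@RG` line changes the digest, while edits to other header lines (such as `@PG`) do not.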
java -jar picard.jar SplitVcfs
Function: Splits SNPs and INDELs into separate files. This tool reads in a VCF or BCF file and writes out the SNPs and INDELs it contains to separate files. The headers of the two output files will be identical and index files will be created for both outputs. If records other than SNPs or INDELs are present, set STRICT=false; otherwise, the tool will raise an exception and quit.
Usage: java -jar picard.jar SplitVcfs I=input.vcf SNP_OUTPUT=snp.vcf INDEL_OUTPUT=indel.vcf STRICT=false
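The SNP/INDEL split can be sketched in Python by comparing REF and ALT allele lengths. This is a simplified classifier written for illustration: symbolic alleles, multi-base substitutions (MNPs), and records mixing SNP and INDEL alleles are all treated as "other", which only approximates how htsjdk types variants. With `strict=False` this sketch drops such records.

```python
from typing import List, Tuple

BASES = set("ACGTN")


def variant_type(ref: str, alts: List[str]) -> str:
    """Classify a record as 'SNP', 'INDEL', or 'OTHER' from its REF/ALT alleles."""
    ref = ref.upper()
    kinds = set()
    for alt in (a.upper() for a in alts):
        if not alt or not set(alt) <= BASES or not set(ref) <= BASES:
            kinds.add("OTHER")        # symbolic (<DEL>), '*', breakends, missing '.'
        elif len(ref) == 1 and len(alt) == 1:
            kinds.add("SNP")
        elif len(ref) != len(alt):
            kinds.add("INDEL")
        else:
            kinds.add("OTHER")        # equal-length multi-base change (MNP)
    # Records whose ALT alleles disagree (e.g. one SNP, one INDEL) are 'OTHER'.
    return kinds.pop() if len(kinds) == 1 else "OTHER"


def split_vcf(lines: List[str], strict: bool = True) -> Tuple[List[str], List[str]]:
    """Return (snp_lines, indel_lines); both start with the same header lines."""
    header: List[str] = []
    snps: List[str] = []
    indels: List[str] = []
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        kind = variant_type(fields[3], fields[4].split(","))
        if kind == "SNP":
            snps.append(line)
        elif kind == "INDEL":
            indels.append(line)
        elif strict:
            raise ValueError(f"record is neither SNP nor INDEL: {line!r}")
        # With strict=False, other records are silently dropped in this sketch.
    return header + snps, header + indels
```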
java -jar picard.jar MergeSamFiles
Function: Merges multiple SAM and/or BAM files into a single file. This tool is used for combining SAM and/or BAM files from different runs or read groups, similarly to the "merge" function of Samtools (http://www.htslib.org/doc/samtools.html). Note that to prevent errors in downstream processing, it is critical to identify/label read groups appropriately. If different samples contain identical read group IDs, this tool will avoid collisions by modifying the read group IDs to be unique. For more information about read groups, see the GATK Dictionary entry.
Usage: java -jar picard.jar MergeSamFiles I=input_1.bam I=input_2.bam O=merged_files.bam
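Read-group collision handling can be sketched as below. The `.N` suffix scheme, and treating read groups that share an ID and have identical attributes as one group, are assumptions for this sketch; Picard's own renaming scheme may differ. The returned per-input ID maps are what you would use to rewrite each read's RG tag.

```python
from typing import Dict, List, Tuple

ReadGroup = Dict[str, str]   # SAM @RG tags, e.g. {"ID": "rg1", "SM": "sampleA"}


def _attributes(rg: ReadGroup) -> ReadGroup:
    """All tags except ID, used to decide whether two read groups are the same."""
    return {k: v for k, v in rg.items() if k != "ID"}


def merge_read_groups(
    inputs: List[List[ReadGroup]],
) -> Tuple[List[ReadGroup], List[Dict[str, str]]]:
    """Merge read groups from several headers, renaming colliding IDs.

    Returns (merged read groups, one old-ID -> new-ID map per input).
    """
    merged: Dict[str, ReadGroup] = {}
    id_maps: List[Dict[str, str]] = []
    for groups in inputs:
        mapping: Dict[str, str] = {}
        for rg in groups:
            new_id, n = rg["ID"], 1
            # Same ID but different attributes: try rg1.1, rg1.2, ... until the ID
            # is free or belongs to an identical read group.
            while new_id in merged and _attributes(merged[new_id]) != _attributes(rg):
                new_id = f"{rg['ID']}.{n}"
                n += 1
            merged.setdefault(new_id, {**rg, "ID": new_id})
            mapping[rg["ID"]] = new_id
        id_maps.append(mapping)
    return list(merged.values()), id_maps
```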
java -jar picard.jar SetNmMdAndUqTags
Function: Fixes the NM, MD, and UQ tags in a SAM or BAM file. This tool takes a coordinate-sorted SAM or BAM file and calculates the NM, MD, and UQ tags by comparing each read with the reference. This may be needed when MergeBamAlignment was run with a SORT_ORDER other than 'coordinate' and therefore could not fix these tags itself.
Usage: java -jar picard.jar SetNmMdAndUqTags R=reference_sequence.fasta I=sorted.bam O=fixed.bam
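For intuition, here is a Python sketch of how the three tags relate to an alignment: NM counts mismatches plus inserted and deleted bases; MD encodes matching runs, mismatched reference bases, and deletions (`^` followed by the deleted reference bases); and UQ, the Phred likelihood of the read given a correct mapping, is approximated by the sum of base qualities at mismatching positions. The function `nm_md_uq` is hypothetical and simplified: it uses plain case-insensitive base equality and ignores IUPAC ambiguity codes and other edge cases a real implementation handles.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def nm_md_uq(read_seq: str, read_qual: str, ref: str, pos: int, cigar: str,
             offset: int = 33) -> Tuple[int, str, int]:
    """Compute (NM, MD, UQ) for one alignment.

    pos is the 0-based reference index of the first aligned base; read_qual is
    a Phred+33 string. Base comparison is a simple case-insensitive equality.
    """
    nm = uq = 0
    md_parts: List[str] = []
    run = 0                 # matching bases since the last MD event
    r, q = pos, 0           # current reference and read indices
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "M=X":                         # consumes read and reference
            for _ in range(length):
                if read_seq[q].upper() == ref[r].upper():
                    run += 1
                else:
                    md_parts.append(f"{run}{ref[r].upper()}")
                    run = 0
                    nm += 1
                    uq += ord(read_qual[q]) - offset
                q += 1
                r += 1
        elif op == "I":                         # inserted read bases count toward NM
            nm += length
            q += length
        elif op == "D":                         # deleted reference bases: '^' in MD
            md_parts.append(f"{run}^{ref[r:r + length].upper()}")
            run = 0
            nm += length
            r += length
        elif op == "S":                         # soft clip: read only
            q += length
        elif op == "N":                         # skipped region: reference only
            r += length
        # H and P consume neither read nor reference.
    md_parts.append(str(run))
    return nm, "".join(md_parts), uq
```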
java -jar picard.jar CollectWgsMetricsWithNonZeroCoverage
Function: Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments. This tool collects metrics about the percentages of reads that pass base- and mapping-quality filters, as well as coverage (read-depth) levels. The minimum base and mapping quality values and the maximum read depth (coverage cap) are user defined. It extends CollectWgsMetrics by also reporting metrics computed only over sites with non-zero (>0) coverage.
Usage: java -jar picard.jar CollectWgsMetricsWithNonZeroCoverage I=input.bam O=collect_wgs_metrics.txt CHART=collect_wgs_metrics.pdf R=reference_sequence.fasta
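The difference between whole-territory and non-zero-coverage statistics can be sketched in Python from a list of per-site depths. The depths are assumed to already exclude reads and bases that fail the mapping- and base-quality filters; the function and the key names in the returned dictionary are made up for this sketch and are not Picard's metric column names.

```python
from typing import Dict, List


def coverage_summary(depths: List[int], coverage_cap: int) -> Dict[str, float]:
    """Summarize per-site depths over all sites and over non-zero sites only.

    Assumes a non-empty list of depths that already reflect quality filtering.
    """
    capped = [min(d, coverage_cap) for d in depths]   # depths above the cap count as the cap
    nonzero = [d for d in capped if d > 0]
    return {
        "mean_all_sites": sum(capped) / len(capped),
        "mean_nonzero_sites": sum(nonzero) / len(nonzero) if nonzero else 0.0,
        "fraction_nonzero": len(nonzero) / len(capped),
    }
```

Sites with zero depth pull the whole-territory mean down but are excluded from the non-zero mean, which is why the two can differ sharply for targeted or low-coverage data.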
samtools idxstats
Function: Retrieve and print stats in the index file corresponding to the input file. Before calling idxstats, the input BAM file must be indexed by samtools index.
Usage: samtools idxstats aln.sorted.bam
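idxstats writes one tab-separated line per reference sequence: name, sequence length, number of mapped read segments, and number of unmapped read segments, with a final `*` line counting unmapped reads that have no coordinate. A minimal Python parser for that text (names are illustrative):

```python
from typing import Dict, NamedTuple


class IdxStat(NamedTuple):
    length: int
    mapped: int
    unmapped: int


def parse_idxstats(text: str) -> Dict[str, IdxStat]:
    """Parse `samtools idxstats` output into {reference name: IdxStat}."""
    stats: Dict[str, IdxStat] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split("\t")
        stats[name] = IdxStat(int(length), int(mapped), int(unmapped))
    return stats
```

The text could be obtained with `subprocess.run(["samtools", "idxstats", "aln.sorted.bam"], capture_output=True, text=True, check=True).stdout`. Summing `mapped` across entries gives the total number of mapped read segments, which can count a read more than once if it has multiple alignments.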