Category

SAM/BAM Manipulation


Usage

samtools ampliconclip -b <regions.bed> -o <output.bam> <input.bam>


Manual

Clips the ends of read alignments if they intersect with regions defined in a BED file. Although this tool was originally written to clip read alignment positions that correspond to amplicon primer locations, it can also be used in other contexts. By default, reads are soft clipped and clipping is done only from the 5' end. This command has been available since samtools version 1.11.

Some things to be aware of: although the ordering of the input is not significant, adjustments to the leftmost mapping position (POS) mean that coordinate-sorted files will need re-sorting. In such cases the sort order in the header is set to unknown. Clipping reads leaves the template length (TLEN) incorrect; this can be corrected with samtools fixmate. Any MD and NM aux tags will also be incorrect, which can be fixed with samtools calmd. By default MD and NM tags are removed, though if the output is in CRAM format these tags are automatically regenerated.
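The clean-up steps described above can be chained in one pipeline. The following is a minimal sketch, not an official recipe: the function name clip_and_repair and its four arguments are hypothetical, and it assumes paired-end data, a reference FASTA for calmd, and samtools on the PATH. fixmate expects name-sorted input, hence the two sort steps.

```shell
# Hypothetical helper: clip primers, then repair TLEN and the MD/NM tags.
# Arguments: BED file, input BAM, reference FASTA, output BAM.
clip_and_repair() {
    if [ "$#" -ne 4 ]; then
        echo "usage: clip_and_repair BED IN.bam REF.fa OUT.bam" >&2
        return 2
    fi
    # 1. clip (writes to stdout by default)
    # 2. name-sort so that fixmate sees mates next to each other
    # 3. fixmate recomputes mate fields, including TLEN
    # 4. coordinate-sort again, since POS may have moved
    # 5. calmd regenerates MD and NM against the reference
    samtools ampliconclip -b "$1" "$2" |
        samtools sort -n - |
        samtools fixmate - - |
        samtools sort - |
        samtools calmd -b - "$3" > "$4"
}
```

A call might look like `clip_and_repair primers.bed input.bam ref.fa clipped.fixed.bam`, where all four file names are placeholders.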

Required arguments

  • -b, --bed-file FILE: BED file of regions (e.g., amplicon primers) to be removed. BED file entries used are chrom, chromStart, chromEnd and, optionally, strand. There is a default tolerance of 5 bases (see --tolerance) when matching chromStart and chromEnd to alignments.
  • -o, --output FILE: Output file name (default stdout).
  • input.bam FILE: Input BAM file.
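
As an illustration of the required arguments, the sketch below writes a small BED file and runs the basic command. The reference name ref1, the coordinates and the primer names are made up for the example, and the six-column layout (with strand in the sixth column) follows the usual BED convention. The samtools call runs only if samtools and a file called input.bam are actually present.

```shell
# Hypothetical primer regions on a hypothetical reference "ref1".
# Tab-separated columns: chrom, chromStart, chromEnd, name, score, strand.
printf 'ref1\t0\t20\tamp1_LEFT\t0\t+\n'      >  primers.bed
printf 'ref1\t180\t200\tamp1_RIGHT\t0\t-\n'  >> primers.bed

# Basic invocation: soft clip from the 5' end (the defaults).
if command -v samtools >/dev/null 2>&1 && [ -f input.bam ]; then
    samtools ampliconclip -b primers.bed -o clipped.bam input.bam
else
    echo "skipping: samtools or input.bam not found" >&2
fi
```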

Options

  • -f, --stats-file FILE: Write stats to file name (default stderr).
  • -u: Output uncompressed data.
  • --soft-clip: Soft clip amplicon primers from reads (default).
  • --hard-clip: Hard clip amplicon primers from reads.
  • --both-ends: Clip on both 5' and 3' ends.
  • --strand: Use strand data from BED file to match read direction.
  • --clipped: Only output clipped reads.
  • --fail: Mark unclipped, mapped reads as QCFAIL.
  • --filter-len INT: Do not output reads INT size or shorter.
  • --fail-len INT: Mark as QCFAIL reads INT size or shorter.
  • --unmap-len INT: Unmap reads INT size or shorter, default 0.
  • --no-excluded: Do not write excluded reads (unmapped or QCFAIL).
  • --rejects-file FILE: File to write filtered reads.
  • --original: For clipped entries, add an OA tag with original data.
  • --keep-tag: For clipped entries, keep the old NM and MD tags.
  • --tolerance INT: Match region within this number of bases, default 5.
  • --no-PG: Do not add an @PG line.
  • --input-fmt-option OPT[=VAL]: Specify a single input file format option in the form of OPTION or OPTION=VALUE.
  • -O, --output-fmt FORMAT[,OPT[=VAL]]...: Specify output format (SAM, BAM, CRAM).
  • --output-fmt-option OPT[=VAL]: Specify a single output file format option in the form of OPTION or OPTION=VALUE.
  • --reference FILE: Reference sequence FASTA FILE [null].
  • -@, --threads INT: Number of additional threads to use [0].
  • --verbosity INT: Set level of verbosity.
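
Several of the options above are often combined. The sketch below is one possible combination, with arbitrary example values (a tolerance of 3 and a length cut-off of 30) and placeholder file names. It hard clips on both ends, matches read direction against the BED strand, writes the stats to a file, and sends reads filtered by --filter-len to a separate rejects file. As before, it runs only if samtools and the input files exist.

```shell
# Placeholder file names; replace with real paths.
bed=primers.bed
bam_in=input.bam

if command -v samtools >/dev/null 2>&1 && [ -f "$bam_in" ] && [ -f "$bed" ]; then
    samtools ampliconclip \
        --hard-clip --both-ends --strand \
        --tolerance 3 \
        --filter-len 30 --rejects-file rejects.bam \
        -f clip_stats.txt \
        -b "$bed" -o clipped.bam "$bam_in"
else
    echo "skipping: samtools, $bam_in or $bed not found" >&2
fi
```

Without -f, the stats go to stderr, so writing them to a file keeps them apart from any warnings samtools prints.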

About: Soft clips read alignments where they match BED file defined regions. Default clipping is only on the 5' end.

File formats this tool works with
BAM
