Category

Genomic Interval Manipulation


Usage

bedtools cluster [OPTIONS] -i <BED/GFF/VCF>


Manual

This tool is part of the bedtools suite and is also known as clusterBed.

Similar to bedtools merge (aka mergeBed), bedtools cluster reports each set of overlapping or book-ended features in an interval file.

In contrast to bedtools merge, cluster does not flatten the cluster of intervals into a new meta-interval; instead, it assigns a unique cluster ID to each record in each cluster. The cluster ID is appended as a new column at the end of each output line.

This is useful when you need fine control over how sets of overlapping intervals in a single interval file are combined.
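
As a rough illustration of the difference, the Python sketch below applies both behaviours to the same sorted records. It is not the bedtools implementation: the function names merge_intervals and cluster_intervals are hypothetical, and the sketch assumes BED-style coordinates, input already sorted by chromosome and start, and a maximum distance of 0.

```python
def merge_intervals(records):
    """Flatten each run of overlapping or book-ended records into one interval."""
    merged = []
    for chrom, start, end in records:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same run: extend the current meta-interval if needed.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


def cluster_intervals(records):
    """Keep every record and append the ID of the run it belongs to."""
    labelled = []
    cluster_id = 0
    cur_chrom, cur_end = None, None
    for chrom, start, end in records:
        if chrom == cur_chrom and start <= cur_end:
            # Same run: only the running end of the cluster changes.
            cur_end = max(cur_end, end)
        else:
            cluster_id += 1
            cur_chrom, cur_end = chrom, end
        labelled.append((chrom, start, end, cluster_id))
    return labelled


records = [("chr1", 100, 200), ("chr1", 180, 250),
           ("chr1", 250, 500), ("chr1", 501, 1000)]

print(merge_intervals(records))    # two flattened intervals
print(cluster_intervals(records))  # four records, each with a cluster ID
```

The merge-style function loses the individual records, while the cluster-style function keeps all of them and leaves the decision about how to combine each group to a later step.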

bedtools cluster requires that you presort your data by chromosome and then by start position (e.g., sort -k1,1 -k2,2n in.bed > in.sorted.bed for BED files).
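
For pipelines that prepare the input in Python rather than with sort, the same ordering can be sketched as follows. The helper sort_bed_lines is hypothetical; it assumes tab-delimited BED data lines with no track, browser, or comment lines, and it compares chromosome names as plain text, so chr10 sorts before chr2.

```python
def sort_bed_lines(lines):
    """Sort BED data lines by chromosome (as text), then by start (as integer)."""
    # Skip blank lines and split each record into its tab-delimited fields.
    records = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    records.sort(key=lambda fields: (fields[0], int(fields[1])))
    return ["\t".join(fields) for fields in records]


unsorted = [
    "chr2\t5\t10",
    "chr1\t200\t300",
    "chr1\t100\t150",
    "chr10\t1\t2",
]

for line in sort_bed_lines(unsorted):
    print(line)
```

Converting the start column to an integer matters: comparing starts as text would place 1000 before 200.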

Required arguments

  • -i <bed/gff/vcf>: Path to the input file containing intervals to be clustered.

Options

  • -s: Force strandedness. That is, only cluster features that are on the same strand. By default, clustering is done without respect to strand.
  • -d INTEGER: Maximum distance allowed between features for them to be placed in the same cluster. The default is 0, meaning that only overlapping and book-ended features are clustered.
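
The effect of the two options can be approximated with the Python sketch below. It is a hedged model, not bedtools code: the Interval type, the assign_clusters function, and its max_dist and stranded parameters are hypothetical stand-ins for -d and -s. It assumes sorted input and measures distance as the next start minus the running end of the current cluster in BED coordinates. The cluster IDs and output order produced by the real tool may differ; the sketch only shows which records end up grouped together.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str = "."


def assign_clusters(intervals, max_dist=0, stranded=False):
    """Return one cluster ID per interval (input sorted by chrom, then start)."""
    state = {}    # grouping key -> (cluster ID, running end of that cluster)
    next_id = 0
    ids = []
    for iv in intervals:
        # With stranded=True, each strand of a chromosome is tracked separately.
        key = (iv.chrom, iv.strand) if stranded else (iv.chrom,)
        current = state.get(key)
        if current is not None and iv.start - current[1] <= max_dist:
            cid, end = current[0], max(current[1], iv.end)
        else:
            next_id += 1
            cid, end = next_id, iv.end
        state[key] = (cid, end)
        ids.append(cid)
    return ids


gapped = [Interval("chr1", 100, 200), Interval("chr1", 180, 250),
          Interval("chr1", 250, 500), Interval("chr1", 501, 1000)]
print(assign_clusters(gapped))              # [1, 1, 1, 2]
print(assign_clusters(gapped, max_dist=1))  # [1, 1, 1, 1]

mixed = [Interval("chr1", 100, 200, "+"), Interval("chr1", 180, 250, "-"),
         Interval("chr1", 190, 300, "+"), Interval("chr1", 240, 400, "-")]
print(assign_clusters(mixed))                 # [1, 1, 1, 1]
print(assign_clusters(mixed, stranded=True))  # [1, 2, 1, 2]
```

Under these assumptions, raising the distance to 1 closes the 1 bp gap before the last record, and enabling strandedness splits overlapping records on opposite strands into separate clusters.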

Example

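By default, bedtools cluster collects overlapping (by at least 1 bp) and/or book-ended intervals into distinct clusters. In the example below, the 4th column of the output is the cluster ID. The last interval starts at 501, leaving a 1 bp gap after the previous interval (which ends at 500), so it is neither overlapping nor book-ended and is assigned to a new cluster.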

$ cat A.bed
chr1	100	200
chr1	180	250
chr1	250	500
chr1	501	1000

$ bedtools cluster -i A.bed
chr1	100	200	1
chr1	180	250	1
chr1	250	500	1
chr1	501	1000	2

File formats this tool works with
BED, GFF, GTF, VCF
