Calculate Jaccard statistic between two feature files. Jaccard is the length of the intersection over the union. Values range from 0 (no intersection) to 1 (self intersection).
bedtools jaccard [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
This tool is part of the bedtools suite.
bedtools jaccard calculates the Jaccard statistic between two sets of genomic intervals: the length of their intersection divided by the length of their union. Also known as the Jaccard index or Jaccard similarity coefficient, it quantifies the degree of overlap between two sets. In genomics, it is commonly used to assess how much two sets of genomic intervals overlap with each other.
If -f 0.90 and -r are used, B must overlap 90% of A and A must also overlap 90% of B.
If -e is used with -f 0.90 and -F 0.10, either 90% of A must be covered OR 10% of B must be covered. Without -e, both fractions would have to be satisfied.
By default, bedtools jaccard reports the length of the intersection, the length of the union (minus the intersection), the final Jaccard statistic reflecting the similarity of the two sets, and the number of intersections.
$ cat a.bed
chr1  10  20
chr1  30  40

$ cat b.bed
chr1  15  20

$ bedtools jaccard -a a.bed -b b.bed
intersection  union  jaccard  n_intersections
5             20     0.25     1
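The arithmetic behind this output can be reproduced by hand. The following Python sketch is not how bedtools is implemented; it assumes the intervals within each file do not overlap one another and use 0-based, half-open BED coordinates, and the helper names (`total_length`, `jaccard`) are hypothetical.

```python
# Illustrative sketch only, not bedtools' implementation.
# Assumes intervals within each set are non-overlapping and use
# 0-based, half-open BED coordinates: (chrom, start, end).

def total_length(intervals):
    """Sum of interval lengths in one set."""
    return sum(end - start for _, start, end in intervals)


def jaccard(a, b):
    """Return (intersection, union, jaccard, n_intersections)."""
    intersection = 0
    n_intersections = 0
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            overlap = min(end_a, end_b) - max(start_a, start_b)
            if overlap > 0:
                intersection += overlap
                n_intersections += 1
    # Bases shared by both sets are counted once in the union.
    union = total_length(a) + total_length(b) - intersection
    stat = intersection / union if union else 0.0
    return intersection, union, stat, n_intersections


a = [("chr1", 10, 20), ("chr1", 30, 40)]
b = [("chr1", 15, 20)]
print(jaccard(a, b))  # (5, 20, 0.25, 1)
```

Here A covers 20 bases, B covers 5, and the 5 shared bases give a union of 20 + 5 - 5 = 20, so the statistic is 5 / 20 = 0.25.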
One can also control which intersections are included in the statistic by requiring a minimum fraction of overlap with respect to the features in A (via the -f parameter), and optionally by requiring that the fraction of overlap be reciprocal (-r) in A and B.
$ cat a.bed
chr1  10  20
chr1  30  40

$ cat b.bed
chr1  15  20
Require 10% overlap with respect to the intervals in A:
$ bedtools jaccard -a a.bed -b b.bed -f 0.1
intersection  union  jaccard  n_intersections
5             20     0.25     1
Require 60% overlap with respect to the intervals in A:
$ bedtools jaccard -a a.bed -b b.bed -f 0.6
intersection  union  jaccard  n_intersections
0             25     0        0
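The effect of -f and -r in the two runs above can be sketched as a filter applied to each overlapping pair before the lengths are summed. This is a simplified model under stated assumptions, not bedtools' actual code: intervals within each file are assumed not to overlap one another, and the function and parameter names (`jaccard_with_fraction`, `min_frac_a`, `reciprocal`) are hypothetical stand-ins for -f and -r.

```python
# Simplified model of -f / -r filtering; not bedtools' implementation.
# Intervals are (chrom, start, end), 0-based half-open, and assumed
# non-overlapping within each set.

def jaccard_with_fraction(a, b, min_frac_a=1e-9, reciprocal=False):
    """Return (intersection, union, jaccard, n_intersections).

    min_frac_a stands in for -f: the overlap must cover at least this
    fraction of the A interval. reciprocal stands in for -r: the same
    fraction must also be covered in the B interval.
    """
    intersection = 0
    n_intersections = 0
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            overlap = min(end_a, end_b) - max(start_a, start_b)
            if overlap <= 0:
                continue
            if overlap / (end_a - start_a) < min_frac_a:
                continue  # fails the A-side fraction
            if reciprocal and overlap / (end_b - start_b) < min_frac_a:
                continue  # fails the B-side fraction
            intersection += overlap
            n_intersections += 1
    len_a = sum(end - start for _, start, end in a)
    len_b = sum(end - start for _, start, end in b)
    union = len_a + len_b - intersection
    stat = intersection / union if union else 0.0
    return intersection, union, stat, n_intersections


a = [("chr1", 10, 20), ("chr1", 30, 40)]
b = [("chr1", 15, 20)]
print(jaccard_with_fraction(a, b, min_frac_a=0.1))  # (5, 20, 0.25, 1)
print(jaccard_with_fraction(a, b, min_frac_a=0.6))  # (0, 25, 0.0, 0)
```

With a 60% threshold, the only overlap covers 5 of the 10 bases of chr1 10 20 (50%), so it is discarded: the intersection drops to 0, the union becomes 20 + 5 = 25, and the statistic is 0.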