DRAGEN functional equivalence germline SNP and indel discovery in human whole genome sequencing data.
The Whole Genome Germline Single Sample pipeline implements data pre-processing and initial variant calling according to the GATK Best Practices for germline SNP and indel discovery in human whole genome sequencing data. When the pipeline runs in DRAGEN-GATK mode, it produces outputs that are functionally equivalent to those of the DRAGEN pipeline.
This workflow is maintained by the Broad Institute and is written in Workflow Description Language (WDL). Further documentation can be found here.
Workflow Inputs
The workflow requires sample and reference information. To choose whether the pipeline runs in DRAGEN functional equivalence mode, set the value of the dragen_functional_equivalence_mode input.
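As a minimal sketch of setting this input, the Python snippet below adds the flag to an inputs JSON dictionary. It assumes the WDL workflow is named WholeGenomeGermlineSingleSample and that inputs are keyed as workflow name, a dot, then the input name, which is the usual convention for WDL inputs files. The helper name set_dragen_mode is hypothetical and not part of the pipeline.

```python
import json

# Assumption: the workflow name used as the key prefix in the inputs JSON.
WORKFLOW_NAME = "WholeGenomeGermlineSingleSample"


def set_dragen_mode(inputs: dict, enabled: bool, workflow: str = WORKFLOW_NAME) -> dict:
    """Return a copy of `inputs` with the DRAGEN functional equivalence flag set."""
    updated = dict(inputs)  # shallow copy so the caller's dictionary is not modified
    updated[f"{workflow}.dragen_functional_equivalence_mode"] = enabled
    return updated


if __name__ == "__main__":
    base_inputs = {}  # in practice, load an existing inputs JSON with json.load
    print(json.dumps(set_dragen_mode(base_inputs, True), indent=2))
```

The function returns a new dictionary instead of editing in place, so the same base inputs can be reused to prepare both a DRAGEN-mode run and a standard run.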
The Broad Institute provides various test inputs hosted in GCP that can be used to run the pipeline.
Input | Description
---|---
sample_and_unmapped_bams | Information and files associated with the sample.
references | Data associated with the reference genome.
dragmap_reference | Files used by the DRAGMAP aligner.
scatter_settings | Information for variant calling scatter settings.
papi_settings | Information regarding the number of preemptions allowed.
wgs_coverage_interval_list | Interval list for the CollectWgsMetrics tool.
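The inputs listed above can be checked for presence in an inputs JSON before a run is submitted. The sketch below is illustrative only: the workflow-name prefix is an assumption, the helper missing_inputs is hypothetical, and some of these inputs may be optional in the actual WDL, so an absent key is not necessarily an error.

```python
# Input names taken from the table above.
DOCUMENTED_INPUTS = [
    "sample_and_unmapped_bams",
    "references",
    "dragmap_reference",
    "scatter_settings",
    "papi_settings",
    "wgs_coverage_interval_list",
]


def missing_inputs(inputs: dict, workflow: str = "WholeGenomeGermlineSingleSample") -> list:
    """List documented inputs that have no '<workflow>.<name>' key in `inputs`."""
    return [name for name in DOCUMENTED_INPUTS if f"{workflow}.{name}" not in inputs]


if __name__ == "__main__":
    # Hypothetical, partially filled inputs dictionary for demonstration.
    example = {"WholeGenomeGermlineSingleSample.references": {}}
    for name in missing_inputs(example):
        print(f"not set: {name}")
```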
Workflow Outputs
The pipeline outputs variant calls, aligned reads, and various metrics files.
Output | Description
---|---
UnmappedBamToAlignedBam | Quality control metrics and files output during alignment.
AggregatedBamQC | Outputs from aggregating the aligned recalibrated BAM and calculating quality control metrics.
CollectWgsMetrics | WGS metrics collected using stringent thresholds.
CollectRawWgsMetrics | WGS metrics collected using less stringent thresholds.
BamToGvcf | HaplotypeCaller variant calling outputs.
BamToCram | Files associated with converting the aggregated recalibrated BAM to CRAM.
References
Reference data hosted in GCP may be found here.
Containers
Containers used by the pipeline are hosted in the Broad Institute’s public container registry and in the public biocontainers registry on quay.io.