DRAGEN functional equivalence discovery and filtering of SNPs and indels.
This workflow uses the GATK HaplotypeCaller for SNP and indel discovery according to the GATK Best Practices. When the workflow runs in DRAGEN mode, it produces a Dragstr model that is used during variant calling, and it performs hard filtering.
This workflow is maintained by the Broad Institute and is written in Workflow Description Language (WDL). Further documentation can be found here.
Workflow Inputs
The workflow requires sample and reference information. To choose whether the pipeline runs in DRAGEN functional equivalence mode, set the value of the run_dragen_mode_variant_calling input.
The Broad Institute provides various test inputs hosted in GCP that can be used to run the pipeline.
Input | Description
---|---
calling_interval_list | Interval list used for variant calling
evaluation_interval_list | File containing the target set of genomic intervals
haplotype_scatter_count | Scatter count used for variant calling
break_bands_at_multiples_of | Breaks reference bands up at genomic positions that are multiples of this number; used to reduce (g)VCF file size
input_bam | Input BAM
input_bam_index | Input BAM index
ref_fasta | Reference FASTA
ref_fasta_index | Reference FASTA index
ref_dict | Reference dictionary
dbsnp_vcf | dbSNP VCF file
dbsnp_vcf_index | dbSNP VCF file index
base_file_name | String used for output files; can be set to a read group ID
final_vcf_base_name | Base name for the output (g)VCF file; can be set to a read group ID
agg_preemptible_tries | Number of preemptible machine tries
Optional inputs | Set
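As an illustration of how the inputs above fit together, the following minimal Python sketch assembles an inputs JSON of the kind WDL runners such as Cromwell accept. The workflow name `VariantCalling`, the `gs://example-bucket/...` paths, and the numeric values are all placeholders assumed for this example; they are not taken from this document. Check the WDL file and the Broad-hosted test inputs for the real values.

```python
import json

# Assumption: keys in a WDL inputs JSON are prefixed with the workflow name
# declared in the WDL file. "VariantCalling" is a placeholder here.
WORKFLOW_NAME = "VariantCalling"

# Required inputs, as listed in the table above.
REQUIRED_INPUTS = [
    "calling_interval_list",
    "evaluation_interval_list",
    "haplotype_scatter_count",
    "break_bands_at_multiples_of",
    "input_bam",
    "input_bam_index",
    "ref_fasta",
    "ref_fasta_index",
    "ref_dict",
    "dbsnp_vcf",
    "dbsnp_vcf_index",
    "base_file_name",
    "final_vcf_base_name",
    "agg_preemptible_tries",
]


def build_inputs(values, run_dragen_mode=False, workflow=WORKFLOW_NAME):
    """Return a dict ready to be dumped as a WDL inputs JSON.

    Raises ValueError if any required input is missing from `values`.
    """
    missing = [name for name in REQUIRED_INPUTS if name not in values]
    if missing:
        raise ValueError("missing required inputs: " + ", ".join(missing))
    inputs = {f"{workflow}.{name}": values[name] for name in REQUIRED_INPUTS}
    # Toggle DRAGEN functional equivalence mode.
    inputs[f"{workflow}.run_dragen_mode_variant_calling"] = run_dragen_mode
    return inputs


# Placeholder values only: every path and number below is hypothetical.
example_values = {
    "calling_interval_list": "gs://example-bucket/calling.interval_list",
    "evaluation_interval_list": "gs://example-bucket/evaluation.interval_list",
    "haplotype_scatter_count": 10,
    "break_bands_at_multiples_of": 100000,
    "input_bam": "gs://example-bucket/sample.bam",
    "input_bam_index": "gs://example-bucket/sample.bai",
    "ref_fasta": "gs://example-bucket/ref.fasta",
    "ref_fasta_index": "gs://example-bucket/ref.fasta.fai",
    "ref_dict": "gs://example-bucket/ref.dict",
    "dbsnp_vcf": "gs://example-bucket/dbsnp.vcf",
    "dbsnp_vcf_index": "gs://example-bucket/dbsnp.vcf.idx",
    "base_file_name": "sample",
    "final_vcf_base_name": "sample",
    "agg_preemptible_tries": 3,
}

if __name__ == "__main__":
    print(json.dumps(build_inputs(example_values, run_dragen_mode=True), indent=2))
```

Saving the printed JSON to a file and passing it to the workflow runner alongside the WDL is one way to launch a run. The upfront check for missing keys surfaces an incomplete configuration before any job is submitted.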
Workflow Outputs
The key outputs of the workflow are variant calls in either VCF or gVCF format. Various metrics and associated files are also produced.
Output | Description
---|---
CollectVariantCallingMetrics | Outputs from calling variants with HaplotypeCaller.
MergeBamOuts | Output from sorting and merging the BAM files, then correcting the merged BAM file.
References
Reference data hosted in GCP may be found here.
Containers
Containers used by the pipeline are hosted in the Broad Institute’s public container registry and in the public BioContainers registry on quay.io.