Call small and structural variants using Oxford Nanopore data.
Workflow for calling small and structural variants using Oxford Nanopore long reads. The workflow merges the input alignments, computes alignment metrics, and calls variants. It can also optionally call small variants in the mitochondrial genome.
This workflow is maintained by the Broad Institute and is written in Workflow Description Language (WDL). Further documentation can be found here.
Workflow Inputs
The workflow is run once per sample. Each sample may have multiple associated aligned BAMs.
Input | Description |
---|---|
aligned_bams | Path to aligned BAM files |
aligned_bais | Path to aligned BAM file indices |
participant_name | Name of the participant from whom these samples were obtained |
ref_map_file | Table listing the locations of the reference sequence and its auxiliary files |
gcs_out_root_dir | GCS bucket path under which the output reads, variants, and metrics files are stored |
bams_suspected_to_contain_dup_record | Boolean parameter indicating whether the input BAM files are suspected to contain duplicate records |
ref_scatter_interval_list_locator | A file holding paths to interval_list files; needed only when running DVPepper |
ref_scatter_interval_list_ids | A file that assigns short IDs to the interval_list files; needed only when running DVPepper |
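The Python sketch below shows how these inputs fit together for a single sample. It builds a Cromwell-style inputs JSON and is not part of the workflow itself. Several details are assumptions:

- The workflow name `ONTVariantWorkflow` is a hypothetical placeholder. Use the name declared in the WDL file.
- Each index is assumed to sit next to its BAM as `<name>.bam.bai`.
- The helper names are illustrative.
- The optional DVPepper interval-list inputs are omitted.

```python
import json
from typing import Dict, List

# Hypothetical placeholder; substitute the workflow name declared in the WDL file.
WORKFLOW_NAME = "ONTVariantWorkflow"


def index_path(bam: str) -> str:
    """Assumed naming convention: the index sits next to its BAM as <name>.bam.bai."""
    return bam + ".bai"


def build_inputs(
    bams: List[str],
    participant_name: str,
    ref_map_file: str,
    gcs_out_root_dir: str,
    suspected_dups: bool = False,
) -> Dict[str, object]:
    """Assemble a Cromwell-style inputs dict for one sample (hypothetical helper)."""
    if not bams:
        raise ValueError("at least one aligned BAM is required")
    # Derive each index from its BAM so both arrays share the same order.
    bais = [index_path(b) for b in bams]
    fields = {
        "aligned_bams": bams,
        "aligned_bais": bais,
        "participant_name": participant_name,
        "ref_map_file": ref_map_file,
        "gcs_out_root_dir": gcs_out_root_dir,
        "bams_suspected_to_contain_dup_record": suspected_dups,
    }
    # Cromwell inputs files use fully qualified keys: "<workflow>.<input>".
    return {f"{WORKFLOW_NAME}.{key}": value for key, value in fields.items()}


if __name__ == "__main__":
    inputs = build_inputs(
        bams=[
            "gs://my-bucket/sample1/flowcell_A.bam",
            "gs://my-bucket/sample1/flowcell_B.bam",
        ],
        participant_name="participant_01",
        ref_map_file="gs://my-bucket/ref/ref_map.tsv",
        gcs_out_root_dir="gs://my-bucket/outputs",
    )
    print(json.dumps(inputs, indent=2))
```

Deriving each index path from its BAM keeps `aligned_bams` and `aligned_bais` in the same order. This matters if the workflow pairs the two arrays by position, which is assumed here rather than confirmed by this page.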
Workflow Outputs
The workflow produces alignment statistics for the merged BAM, along with variant calls from several small and structural variant callers.
Output | Description |
---|---|
merged_bam | Merged BAM file combining all input `aligned_bams` |
merged_bai | Merged BAM file index |
aligned_num_reads | Number of aligned reads |
aligned_num_bases | Number of aligned bases |
aligned_frac_bases | Fraction of bases that are aligned |
aligned_est_fold_cov | Estimated fold coverage from aligned reads |
aligned_read_length_mean | Mean aligned read length |
aligned_read_length_median | Median aligned read length |
aligned_read_length_stdev | Aligned read length standard deviation |
aligned_read_length_N50 | Aligned read length N50 value |
average_identity | Average identity value obtained from NanoPlot |
median_identity | Median identity value obtained from NanoPlot |
pbsv_vcf, pbsv_tbi | VCF file and index output by the PacBio structural variant caller (pbsv) |
sniffles_vcf, sniffles_tbi | VCF file and index output by the Sniffles structural variant caller |
clair_vcf, clair_tbi | VCF file and index output by Clair, a deep-neural-network-based variant caller |
clair_gvcf, clair_gtbi | gVCF file and index output by Clair |
dvp_vcf, dvp_tbi | VCF file and index output by DeepVariant Pepper (DVPepper) |
dvp_g_vcf, dvp_g_tbi | gVCF file and index output by DVPepper |
dvp_phased_vcf, dvp_phased_tbi | Phased VCF file and index output by DVPepper |
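The read-length outputs are standard summary statistics. The minimal Python sketch below computes them from a list of aligned read lengths. It is not the workflow's own code. Several details are assumptions:

- The genome size used for the fold-coverage estimate is supplied by the caller.
- The sample (not population) standard deviation is used.
- The function names are hypothetical.

```python
import statistics
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length of the shortest read among the longest reads that together hold >= half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids rounding issues when total is odd.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def read_length_stats(lengths: List[int], genome_size: int) -> Dict[str, float]:
    """Summary statistics over aligned read lengths (hypothetical helper).

    genome_size is an assumed input used only for the fold-coverage estimate.
    statistics.stdev is the sample standard deviation and needs >= 2 reads.
    """
    num_bases = sum(lengths)
    return {
        "aligned_num_reads": len(lengths),
        "aligned_num_bases": num_bases,
        "aligned_est_fold_cov": num_bases / genome_size,
        "aligned_read_length_mean": statistics.mean(lengths),
        "aligned_read_length_median": statistics.median(lengths),
        "aligned_read_length_stdev": statistics.stdev(lengths),
        "aligned_read_length_N50": n50(lengths),
    }


if __name__ == "__main__":
    print(read_length_stats([2, 3, 4, 5, 6, 10], genome_size=10))
```

For the toy lengths above, the 30 total bases are half covered once the two longest reads (10 + 6 = 16) are counted, so N50 is 6. The workflow's metrics tooling may differ in details, such as whether secondary or supplementary alignments contribute to these counts.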
Containers
Containers used by the pipeline are hosted in the Broad Institute’s public container registry and in the public BioContainers registry on quay.io.