Align reads to the reference genome, call variants, and calculate quality metrics using Sentieon.
This workflow implements the Sentieon® Genomics software, a set of software tools that perform highly accurate and computationally efficient analysis of genomic data. This workflow performs read alignment, duplicate marking, base quality score recalibration (BQSR), and variant calling steps. The workflow is designed for use with a variety of reference genomes, which are downloaded as part of workflow execution. The workflow also computes quality metrics on the deduplicated alignments and produces various plots which can be used to quickly inspect sample quality.
The workflow can optionally output a gVCF rather than a VCF file, which can be combined with other sample gVCFs for use in joint genotyping.
This workflow was developed by the Sentieon development team and is written in Workflow Description Language (WDL). Further documentation can be found here.
Workflow Inputs
The workflow can be run using either paired FASTQ or aligned BAM/CRAM files. If using the FASTQ entrypoint, r1_fastq, r2_fastq, and read_groups must be defined. If using the BAM/CRAM entrypoint, input_aln and input_aln_idx must be defined.
Input | Description |
---|---|
r1_fastq | R1 fastq files |
r2_fastq | R2 fastq files |
read_groups | Sample read groups |
input_aln | Input alignment (BAM/CRAM) files |
input_aln_idx | Input alignment (BAM/CRAM) index files |
reference_name | The name of the human reference genome build. (‘hg38_alt’, ‘hg38_gatk’, ‘hg38’, ‘hg38_noalt’, ‘hs38’, ‘b37_gatk’, ‘b37’, ‘hs37d5’, ‘hg19’, ‘ucsc_hg19’) |
run_dedup_and_qc | If |
output_gvcf | Output variant calls in the gVCF format instead of VCF [false] |
dnascope_model | The Sentieon DNAscope variant calling model |
canonical_user_id | Your account’s AWS canonical user ID. Used to acquire a Sentieon license |
sentieon_docker | Sentieon docker image |
is_pcr_free | Set to |
n_threads | Number of vCPUs to allocate for the task [32] |
memory | Memory to allocate for the task [64 GiB] |
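The entrypoint rule described above can be checked before submitting a run. The following is a minimal sketch, not part of the workflow itself: the `detect_entrypoint` helper is hypothetical, the example file paths and read group string are placeholders, and treating inputs as lists, as well as rejecting a run that defines both entrypoints, are assumptions rather than documented behavior. Only the input names and the `hg38` reference name come from the table above.

```python
import json

# Input names below come from the workflow's input table.
FASTQ_INPUTS = ("r1_fastq", "r2_fastq", "read_groups")
ALN_INPUTS = ("input_aln", "input_aln_idx")


def detect_entrypoint(inputs):
    """Return "fastq" or "aln" for a dict of workflow inputs.

    Raises ValueError when neither entrypoint is fully defined, or when
    both are (treating that case as an error is an assumption of this sketch).
    """
    has_fastq = all(inputs.get(name) for name in FASTQ_INPUTS)
    has_aln = all(inputs.get(name) for name in ALN_INPUTS)
    if has_fastq and has_aln:
        raise ValueError("define the FASTQ inputs or the BAM/CRAM inputs, not both")
    if has_fastq:
        return "fastq"
    if has_aln:
        return "aln"
    raise ValueError(
        "define r1_fastq, r2_fastq and read_groups, "
        "or input_aln and input_aln_idx"
    )


# Hypothetical example values: the bucket, file names, and read group
# string are placeholders, not values taken from the workflow documentation.
example_inputs = {
    "r1_fastq": ["s3://example-bucket/sample_R1.fastq.gz"],
    "r2_fastq": ["s3://example-bucket/sample_R2.fastq.gz"],
    "read_groups": ["<read group for this FASTQ pair>"],
    "reference_name": "hg38",
    "output_gvcf": False,
}

if __name__ == "__main__":
    print("entrypoint:", detect_entrypoint(example_inputs))
    print(json.dumps(example_inputs, indent=2))
```

The exact layout of the inputs file (for example, whether keys carry a workflow-name prefix) depends on the execution engine, so the dictionary here is only an illustration of which inputs belong together.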
Workflow Outputs
The workflow produces variant calls in either VCF or gVCF format. Other outputs depend on the options selected.
Output | Description |
---|---|
calls_vcf | Variant calls in VCF or gVCF format |
calls_vcf_tbi | Variant calls index |
Metrics and reads files | Metrics and reads files are produced if |
bqsr_table | Sample recal table output by running BQSR. Base quality score recalibration will run if no custom |
Containers
The latest version of the Sentieon Docker image can be run by following the instructions listed here.