Compare tumor/normal samples and call somatic variants using Sentieon.
This workflow uses the Sentieon® Genomics software, a set of tools that perform highly accurate and computationally efficient analysis of genomic data. For both tumor and normal samples, the workflow performs read alignment, deduplication, and base quality score recalibration (BQSR). The workflow then calls somatic variants on each sample, identifying potential sites where the cancer genome data displays somatic variations relative to the normal genome, and calculates genotypes at those sites. Finally, the variants are filtered.
The workflow is designed for use with a variety of reference genomes, which are downloaded as part of workflow execution.
This workflow was developed by the Sentieon development team and is written in Workflow Description Language (WDL). Further documentation can be found here. TNseq®-specific documentation may be found here.
Workflow Inputs
The workflow takes tumor sample reads and, optionally, normal sample reads as input. Each sample’s inputs are aligned and corrected separately, then combined during somatic variant calling and filtering.
Input | Description |
---|---|
r1_fastq | R1 fastq files for the tumor sample |
r2_fastq | R2 fastq files for the tumor sample |
read_groups | Read groups for the tumor sample |
normal_r1_fastq | R1 fastq files for the normal sample |
normal_r2_fastq | R2 fastq files for the normal sample |
normal_read_groups | Read groups for the normal sample |
reference_name | The name of the human reference genome build. (‘hg38_alt’, ‘hg38_gatk’, ‘hg38’, ‘hg38_noalt’, ‘hs38’, ‘b37_gatk’, ‘b37’, ‘hs37d5’, ‘hg19’, ‘ucsc_hg19’) |
pon_vcf | The panel of normals VCF file |
pon_vcf_tbi | The panel of normals VCF index file |
germline_vcf | The germline VCF file |
germline_vcf_tbi | The germline VCF index file |
contamination_vcf | The VCF file of germline sites for contamination detection |
contamination_vcf_tbi | The VCF index file of germline sites for contamination detection |
canonical_user_id | Your account’s AWS canonical user ID. Used to acquire a Sentieon license |
sentieon_docker | Sentieon docker image |
n_threads | Number of vCPUs to allocate for the task [32] |
memory | Memory to allocate for the task [64 GiB] |
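The inputs in the table above can be assembled and sanity-checked before a run is submitted. The following is a minimal sketch, not part of the workflow itself: `build_inputs` is a hypothetical helper, the file paths and read group strings are placeholders, and it assumes that the fastq and read group inputs are lists with one read group per R1/R2 pair. The parameter names and the list of reference builds are taken from the table.

```python
import json

# Reference builds listed in the inputs table.
REFERENCE_NAMES = {
    "hg38_alt", "hg38_gatk", "hg38", "hg38_noalt", "hs38",
    "b37_gatk", "b37", "hs37d5", "hg19", "ucsc_hg19",
}


def _check_lengths(label, r1, r2, read_groups):
    # Assumption: one read group per R1/R2 fastq pair.
    if not (len(r1) == len(r2) == len(read_groups)):
        raise ValueError(f"{label}: r1, r2 and read_groups must have equal length")


def build_inputs(r1_fastq, r2_fastq, read_groups, reference_name,
                 normal_r1_fastq=None, normal_r2_fastq=None,
                 normal_read_groups=None, **extra):
    """Hypothetical helper: assemble a dict of workflow inputs."""
    if reference_name not in REFERENCE_NAMES:
        raise ValueError(f"unknown reference_name: {reference_name!r}")
    _check_lengths("tumor", r1_fastq, r2_fastq, read_groups)

    inputs = {
        "r1_fastq": list(r1_fastq),
        "r2_fastq": list(r2_fastq),
        "read_groups": list(read_groups),
        "reference_name": reference_name,
    }

    # The normal sample is optional, but if any part is given, all parts are needed.
    normal = (normal_r1_fastq, normal_r2_fastq, normal_read_groups)
    if any(normal):
        if not all(normal):
            raise ValueError("normal sample needs r1, r2 and read groups together")
        _check_lengths("normal", *normal)
        inputs["normal_r1_fastq"] = list(normal_r1_fastq)
        inputs["normal_r2_fastq"] = list(normal_r2_fastq)
        inputs["normal_read_groups"] = list(normal_read_groups)

    # Remaining inputs (pon_vcf, germline_vcf, canonical_user_id, ...) pass through.
    inputs.update(extra)
    return inputs


# Placeholder paths and read group strings, for illustration only.
example = build_inputs(
    r1_fastq=["tumor_R1.fastq.gz"],
    r2_fastq=["tumor_R2.fastq.gz"],
    read_groups=["tumor-read-group"],
    reference_name="hg38",
    normal_r1_fastq=["normal_R1.fastq.gz"],
    normal_r2_fastq=["normal_R2.fastq.gz"],
    normal_read_groups=["normal-read-group"],
)
print(json.dumps(example, indent=2))
```

Checking the pairing and the reference name up front is a design choice for this sketch: it catches a mismatched fastq list or a misspelled build name before any compute is allocated.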
Workflow Outputs
The main output of the pipeline is a VCF containing somatic variant calls. Various metrics and plots are also produced.
Output | Description |
---|---|
calls_vcf | Somatic variant calls output by TNfilter |
calls_vcf_tbi | Index for the calls_vcf file |
Metrics and reads files | Each output listed here is produced for the tumor sample and, if any normal sample fastqs are provided, for the normal sample as well |
Containers
The latest version of the Sentieon Docker image can be run by following the instructions listed here.