Trio Calling for de novo Mutations
Next-generation sequencing of family pedigrees offers a powerful approach for the identification of transmitted alleles and/or de novo mutations that may confer susceptibility to disease. The "trio" subcommand of VarScan leverages the family relationship to improve variant calling accuracy, identify apparent Mendelian Inheritance Errors (MIEs), and detect high-confidence de novo mutations.
Recent whole-genome sequencing (WGS) studies in families estimate the de novo mutation rate in humans at approximately 1.1 × 10⁻⁸ per base per haploid genome per generation (1000 Genomes Project Consortium, 2010; Roach et al., 2010). By this estimate, an individual's diploid genome harbors, on average, around 64 de novo mutations among 3.2 billion base pairs. In the consensus coding sequence (CCDS) (~34 Mbp), we expect fewer than one de novo coding mutation per diploid individual.
Because de novo mutations are so rare, they should be called conservatively. To that end, VarScan re-evaluates each apparent de novo mutation in both parents using relaxed parameters and re-classifies those with some evidence in one or both parents as germline variants. In a similar manner, VarScan attempts to reconcile apparent Mendelian Inheritance Errors. The output of the trio subcommand is a single VCF in which all variants are classified as germline (transmitted or untransmitted), de novo, or MIE.
Input
This command requires a three-sample "mpileup" file covering the father, mother, and child (in that order). Generating it will require:
- The SAMtools package
- The reference sequence in FASTA format
- BAM files for the father, mother, and child
Trio Calling Syntax
Trio calling with VarScan 2 is a two-step process.
1. Generate a three-sample mpileup
Here's an example command:
samtools mpileup -B -q 1 -f ref.fasta dad.bam mom.bam child.bam >trio.mpileup
2. Run VarScan trio
Here's the syntax for the VarScan subcommand:
java -jar VarScan.jar trio trio.mpileup trio.mpileup.output \
    --min-coverage 10 --min-var-freq 0.20 --p-value 0.05 \
    --adj-var-freq 0.05 --adj-p-value 0.15
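When the two steps are scripted, the easiest mistake to make is passing the BAM files in the wrong order. The following is a minimal Python sketch that only assembles the two argument lists and prints them. The function name `build_trio_commands` and its parameters are hypothetical, the threshold values are copied from the example above, and the adjusted-threshold flags use the double-dash spelling (`--adj-var-freq`, `--adj-p-value`) that the algorithm description uses.

```python
import shlex


def build_trio_commands(ref, father_bam, mother_bam, child_bam, prefix,
                        varscan_jar="VarScan.jar"):
    """Return (mpileup_path, samtools_argv, varscan_argv) for the two steps.

    Hypothetical helper: it builds the commands but does not run them.
    """
    mpileup_path = prefix + ".mpileup"

    # Step 1: three-sample mpileup. Sample order matters:
    # father first, then mother, then child.
    samtools_argv = [
        "samtools", "mpileup", "-B", "-q", "1", "-f", ref,
        father_bam, mother_bam, child_bam,
    ]

    # Step 2: VarScan trio on the mpileup written by step 1.
    varscan_argv = [
        "java", "-jar", varscan_jar, "trio",
        mpileup_path, mpileup_path + ".output",
        "--min-coverage", "10",
        "--min-var-freq", "0.20",
        "--p-value", "0.05",
        "--adj-var-freq", "0.05",
        "--adj-p-value", "0.15",
    ]
    return mpileup_path, samtools_argv, varscan_argv


if __name__ == "__main__":
    path, step1, step2 = build_trio_commands(
        "ref.fasta", "dad.bam", "mom.bam", "child.bam", "trio")
    # Step 1 writes to stdout, so its output is redirected into the mpileup file.
    print(shlex.join(step1), ">", path)
    print(shlex.join(step2))
```

To actually execute the steps, the lists could be handed to `subprocess.run`, with the standard output of the first command redirected to the mpileup path.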
Trio Calling Algorithm
VarScan first calls variants across all three samples in the same fashion as it does for mpileup2snp, using the default (or user-provided) --min-var-freq and --p-value settings. Next, it identifies any variants with apparent Mendelian Inheritance Errors (i.e., present in the child but absent from either parent). In these instances, it re-calls the samples that should carry the variant but were called wild type, using the adjusted settings (--adj-var-freq and --adj-p-value), in an attempt to correct the call. This often resolves the MIE, in which case the corrected genotypes are reported. If not, the site is flagged as mendelError (in the FILTER field) and/or DENOVO (in the INFO field).
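The two-pass idea can be illustrated with a toy Python sketch. This is not VarScan's implementation. It assumes a deliberately simplified caller that looks only at coverage and variant allele frequency, skips the p-value test entirely, and ignores genotype zygosity, so it covers only the "variant in child, not in parents" case. The names `Counts`, `has_variant`, and `classify_site` are hypothetical; the default thresholds mirror the example values for --min-coverage, --min-var-freq, and --adj-var-freq.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Reference- and variant-supporting read counts for one sample at one site."""
    ref_reads: int
    var_reads: int

    @property
    def depth(self):
        return self.ref_reads + self.var_reads

    @property
    def var_freq(self):
        return self.var_reads / self.depth if self.depth else 0.0


def has_variant(counts, min_var_freq):
    """Frequency-only stand-in for a variant call (no p-value test)."""
    return counts.var_freq >= min_var_freq


def classify_site(father, mother, child,
                  min_coverage=10, min_var_freq=0.20, adj_var_freq=0.05):
    """Classify one site with a strict first pass and a relaxed parental re-call."""
    # Without adequate depth in all three samples, no call is attempted.
    if any(s.depth < min_coverage for s in (father, mother, child)):
        return "insufficient_coverage"

    # Pass 1: strict threshold for every sample.
    in_father = has_variant(father, min_var_freq)
    in_mother = has_variant(mother, min_var_freq)
    in_child = has_variant(child, min_var_freq)

    if not in_child:
        return "untransmitted" if (in_father or in_mother) else "no_variant"
    if in_father or in_mother:
        return "transmitted"

    # Pass 2: the child carries a variant that neither parent was called for.
    # Re-call the parents with the relaxed threshold before accepting it.
    if has_variant(father, adj_var_freq) or has_variant(mother, adj_var_freq):
        return "transmitted"  # weak parental evidence: treat as germline
    return "denovo"
```

For example, a child with 15 of 30 variant reads and two parents with 0 of 30 is classified as `denovo`, whereas the same child with a father showing 2 of 30 variant reads (about 6.7%, below 0.20 but above 0.05) is rescued as `transmitted`.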
Output
The above command will produce two VCF output files: one for SNPs (trio.mpileup.output.snp.vcf) and one for indels (trio.mpileup.output.indel.vcf). Relevant fields include:
- FILTER - mendelError if MIE, otherwise PASS
- STATUS - 1=untransmitted, 2=transmitted, 3=denovo, 4=MIE
- DENOVO - if present, indicates a high-confidence de novo mutation call
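The fields listed above can be tallied with a short script. The following Python sketch assumes that STATUS appears in the INFO column as a `STATUS=<n>` key-value pair and that DENOVO appears there as a bare flag; the function names `parse_info` and `summarize_trio_vcf` are hypothetical. Because a site may carry mendelError and DENOVO together, the FILTER value is reported alongside each de novo site rather than used to exclude it.

```python
from collections import Counter

# STATUS codes as documented for the trio output.
STATUS_LABELS = {"1": "untransmitted", "2": "transmitted", "3": "denovo", "4": "MIE"}


def parse_info(info):
    """Split a VCF INFO string into a dict; bare flags map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def summarize_trio_vcf(lines):
    """Count records per STATUS label and collect sites flagged DENOVO.

    `lines` is any iterable of VCF text lines, such as an open file handle.
    Returns (Counter of labels, list of (chrom, pos, filter) tuples).
    """
    status_counts = Counter()
    denovo_sites = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, filt = cols[0], int(cols[1]), cols[6]
        info = parse_info(cols[7])
        label = STATUS_LABELS.get(str(info.get("STATUS")), "unknown")
        status_counts[label] += 1
        if "DENOVO" in info:
            denovo_sites.append((chrom, pos, filt))
    return status_counts, denovo_sites
```

A typical call would open trio.mpileup.output.snp.vcf and pass the file handle to `summarize_trio_vcf`; the indel file can be processed the same way.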