DNAseq is the general process of sequencing DNA and can be used for a variety of purposes. One of the most common is the detection of genome variants. By using enrichment methods (exome, amplicon), an investigator can target regions of interest in the genome and save considerable sequencing cost. However, some questions dictate the use of whole genome sequencing (WGS), or whole genome variant detection, which entails sampling genomic DNA fragments from across the entire genome to get a reasonable representation of its complete contents. Sequencing depth varies drastically depending on the project goals, organism and policy. For more details on sequencing depth, see the guidelines tab.
The workflows of most DNA sequencing analysis methods start similarly and diverge at the variant calling step.
- Quality Control:
- We view the GC content and insert size of each library to determine if any are highly variable or unexpectedly different.
- Read Trimming:
- Removing sequencing adapters, which are unintentionally sequenced when the insert is shorter than the read length, is an essential step in accurate variant calling. We also trim the 3′ end of each read until the average Phred quality score reaches 20.
- Alignment:
- We align all trimmed reads with BWA-MEM, which seeds alignments with maximal exact matches and extends them using Smith-Waterman local alignment. This helps ensure that only high-quality local alignments are used for variant calling.
- Variant Calling:
- GATK HaplotypeCaller uses the locally aligned reads to make variant calls against the genome of interest. The algorithm performs a local de novo assembly, building a De Bruijn-like graph to reassemble each active region and identify possible haplotypes. The program then realigns each haplotype against the reference using the Smith-Waterman algorithm. HaplotypeCaller then uses the read data to build a likelihood matrix for each potentially variant site using the PairHMM algorithm. For each potentially variant site, the program applies Bayes’ rule, using the likelihoods of alleles to calculate the likelihood of each genotype per sample given the read data observed for that sample. The most likely genotype is then assigned to the sample.
- Variant Processing:
- Small Variant Quality Statistics
- Determination of transition (Ti) to transversion (Tv) ratio for all SNPs (TiTv).
- Variant proportions are determined by type (Indel, Synonymous SNP, Nonsynonymous SNP).
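To make the quality-trimming step in Read Trimming more concrete, here is a minimal Python sketch. It is not the trimming tool we run in production. It assumes Phred+33 encoded quality strings and interprets "average of 20" as the mean score over a small window at the 3′ end; the function names and the window size of 4 are hypothetical choices for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(char) - offset for char in quality_string]


def trim_3prime(sequence, quality_string, threshold=20, window=4):
    """Remove bases from the 3' end until the trailing window averages >= threshold.

    The window size is an illustrative assumption, not a documented setting.
    """
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0:
        start = max(0, end - window)
        tail = scores[start:end]
        if sum(tail) / len(tail) >= threshold:
            break  # the 3' end is now of acceptable average quality
        end -= 1  # drop one more base from the 3' end
    return sequence[:end], quality_string[:end]


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    print(trim_3prime("ACGTACGTAC", "IIIIII####"))
```

Because the stopping rule is based on a window average, a few low-quality bases can remain at the end of the read once the window mean recovers; stricter per-base rules are also common.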
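The genotype assignment described under Variant Calling can be sketched for a single biallelic site as follows. This is a simplified illustration of applying Bayes' rule to per-read allele likelihoods, not GATK's implementation: the function names are hypothetical, the sample is assumed to be diploid, the prior is assumed to be flat unless one is supplied, and the per-read likelihoods (which in HaplotypeCaller are derived from PairHMM) are simply given as input.

```python
from itertools import combinations_with_replacement
from math import prod


def genotype_posteriors(read_allele_likelihoods, alleles=("ref", "alt"), priors=None):
    """Posterior probability of each diploid genotype at one site.

    read_allele_likelihoods: one dict per read mapping allele -> P(read | allele).
    priors: optional dict mapping genotype tuple -> prior (flat prior assumed if None).
    """
    genotypes = list(combinations_with_replacement(alleles, 2))
    if priors is None:
        priors = {genotype: 1.0 / len(genotypes) for genotype in genotypes}

    unnormalized = {}
    for first, second in genotypes:
        # Each read is assumed to come from either chromosome copy with probability 1/2.
        likelihood = prod(
            0.5 * read[first] + 0.5 * read[second]
            for read in read_allele_likelihoods
        )
        unnormalized[(first, second)] = priors[(first, second)] * likelihood

    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all genotype likelihoods are zero")
    return {genotype: value / total for genotype, value in unnormalized.items()}


def call_genotype(read_allele_likelihoods, **kwargs):
    """Return the genotype with the highest posterior probability."""
    posteriors = genotype_posteriors(read_allele_likelihoods, **kwargs)
    return max(posteriors, key=posteriors.get)


if __name__ == "__main__":
    ref_read = {"ref": 0.99, "alt": 0.01}
    alt_read = {"ref": 0.01, "alt": 0.99}
    print(call_genotype([ref_read] * 5 + [alt_read] * 5))
```

At realistic depths the product of many small probabilities can underflow, so practical implementations generally work with log-likelihoods instead of raw probabilities.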
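The small variant quality statistics above can be illustrated with a short Python sketch. It assumes SNPs are already available as (reference base, alternate base) pairs and that each variant has already been labelled with a type; the function names are hypothetical and this is not the reporting code behind our example report. A transition is a purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) change; every other SNP is a transversion.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T changes, False for transversions."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a SNP: {ref}>{alt}")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def titv_ratio(snps):
    """Return (transitions, transversions, Ti/Tv ratio) for (ref, alt) pairs.

    The ratio is None when there are no transversions.
    """
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    ratio = transitions / transversions if transversions else None
    return transitions, transversions, ratio


def variant_proportions(variant_types):
    """Fraction of variants in each category (e.g. Indel, Synonymous SNP)."""
    counts = Counter(variant_types)
    total = sum(counts.values())
    return {variant_type: count / total for variant_type, count in counts.items()}


if __name__ == "__main__":
    snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
    print(titv_ratio(snps))  # (3, 2, 1.5)
    print(variant_proportions(["Indel", "Synonymous SNP", "Nonsynonymous SNP", "Indel"]))
```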
EXAMPLE OUTPUT REPORT:
Click the link to be redirected to our: Example DNAseq report