DNAseq Analysis

INTRODUCTION:

DNAseq is the general process of sequencing DNA and can be used for a variety of purposes. One of the most common is the detection of genome variants. By using enrichment methods (exome, amplicon), an investigator can target regions of interest in the genome and save considerable sequencing cost. However, some questions dictate the use of whole genome sequencing (WGS), or whole genome variant detection, which entails sampling genomic DNA fragments from the entire genome to get a reasonable representation of its complete contents. Sequencing depth varies drastically depending on the project goals, organism and policy. For more details on sequencing depth, see the guidelines tab.

ANALYSIS METHODS:

The workflow of most DNA sequencing analysis methods starts similarly and diverges at the variant calling step.

  1. Quality Control:
    • We view the GC content and insert size of each library to determine if any are highly variable or unexpectedly different.
  2. Read Trimming:
    • Removing sequencing adapters, which are sequenced unintentionally when the insert is shorter than the read, is an essential step in accurate variant calling. We also trim the 3′ end of each read until the average Phred quality score reaches 20.
  3. Alignment:
    • We align all trimmed reads with the highly accurate BWA-MEM algorithm, which seeds alignments with maximal exact matches and extends them using Smith-Waterman local alignment. This ensures that only high-quality local alignments are used for variant calling.
  4. Variant Calling:
    • GATK HaplotypeCaller uses the locally aligned reads to call variants against the genome of interest. Within each active region, the algorithm performs a local de novo assembly using a De Bruijn-like graph to identify possible haplotypes. The program then realigns each haplotype against the reference using the Smith-Waterman algorithm to identify potentially variant sites. HaplotypeCaller then uses the PairHMM algorithm to build a likelihood matrix from the read data for each potentially variant site. For each of these sites, the program applies Bayes’ rule, using the allele likelihoods to calculate the likelihood of each genotype per sample given the read data observed for that sample. The most likely genotype is then assigned to the sample.
  5. Variant Processing:
    • Small Variant Quality Statistics
    • Determination of transition (Ti) to transversion (Tv) ratio for all SNPs (TiTv).
    • Variant proportions are determined by type (Indel, Synonymous SNP, Nonsynonymous SNP).
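
The Bayesian genotyping described in step 4 can be illustrated with a heavily simplified diploid model. This is a minimal sketch and not GATK's implementation: it assumes a single biallelic site, independent reads, a flat genotype prior, and per-read error probabilities taken directly from Phred base qualities, whereas HaplotypeCaller derives its likelihoods from PairHMM alignments of reads to haplotypes. All function and variable names are hypothetical.

```python
import math


def phred_to_error(qual):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-qual / 10)


def genotype_posteriors(reads, priors=None):
    """Posterior probability of each diploid genotype at one biallelic site.

    reads  : list of (allele, phred_quality), allele is "ref" or "alt"
    priors : optional dict genotype -> prior; flat if omitted (assumption)
    """
    # Fraction of chromosomes carrying the alt allele under each genotype.
    alt_fraction = {"ref/ref": 0.0, "ref/alt": 0.5, "alt/alt": 1.0}
    if priors is None:
        priors = {g: 1.0 / len(alt_fraction) for g in alt_fraction}

    log_post = {}
    for genotype, f_alt in alt_fraction.items():
        log_lik = 0.0
        for allele, qual in reads:
            # Cap the error at 0.5 so a Q0 base is uninformative
            # instead of producing log(0).
            err = min(phred_to_error(qual), 0.5)
            # A read shows alt if it comes from an alt chromosome and is
            # read correctly, or from a ref chromosome and is miscalled.
            p_alt = f_alt * (1 - err) + (1 - f_alt) * err
            log_lik += math.log(p_alt if allele == "alt" else 1 - p_alt)
        # Bayes' rule in log space: prior times likelihood.
        log_post[genotype] = math.log(priors[genotype]) + log_lik

    # Normalize so the posteriors sum to 1.
    max_lp = max(log_post.values())
    unnorm = {g: math.exp(lp - max_lp) for g, lp in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


def call_genotype(reads):
    """Assign the genotype with the highest posterior probability."""
    post = genotype_posteriors(reads)
    return max(post, key=post.get)


# Five reference and five alternate reads at Q30 support a heterozygote.
example_reads = [("ref", 30)] * 5 + [("alt", 30)] * 5
print(call_genotype(example_reads))
```

The flat prior keeps the example short; a real caller would weight genotypes by an expected heterozygosity instead.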
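
The Ti/Tv statistic from step 5 is simple enough to sketch directly. The example below assumes SNPs are supplied as (reference, alternate) base pairs, a hypothetical input format chosen for illustration; in practice these would be parsed from a VCF file. A transition swaps a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T); every other single-base substitution is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """True if ref -> alt stays within purines or within pyrimidines."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return False
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def titv_ratio(snps):
    """Transition/transversion ratio for an iterable of (ref, alt) pairs.

    Records that are not single-base substitutions (for example indels)
    are skipped. Returns NaN if no transversions are present.
    """
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # not a SNP
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("nan")


# Two transitions, one transversion, and one indel that is ignored.
print(titv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("AT", "A")]))  # 2.0
```

For human data, commonly cited values are roughly 2.0 to 2.1 for whole genomes and around 3.0 for exomes, so a ratio far from the expected range can point to a high false-positive rate.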

EXAMPLE OUTPUT REPORT:

Click the link to be redirected to our example DNAseq report.

The depth of sequencing for DNAseq will depend on the desired sensitivity of detection:

A typical exome is sequenced to 100x coverage of the targeted regions

  • If you are calling CNVs from an enrichment method, you will need to include control samples: at least 10 gender-matched controls.

Successful WGS usually requires 20-30x genome coverage

  • Low pass WGS at 4-5x can be a good way of finding CNVs if you don’t have control samples
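
As a rough planning aid, the coverage figures quoted in this section can be converted to a read count using the relationship mean coverage ≈ (number of reads × read length) / target size. This is a minimal sketch: the genome size and read length in the example are illustrative assumptions, the function name is hypothetical, and the estimate ignores duplicates, off-target reads and uneven coverage, so real projects will need more reads than this lower bound.

```python
def reads_for_coverage(target_bp, coverage, read_length, paired=True):
    """Lower-bound estimate of reads (or read pairs) for a mean coverage.

    target_bp   : size of the genome or enrichment target in base pairs
    coverage    : desired mean depth, e.g. 30 for 30x
    read_length : length of a single read in bases
    paired      : if True, return the number of read pairs
    """
    bases_needed = target_bp * coverage
    # Integer ceiling division avoids floating-point rounding.
    reads = -(-bases_needed // read_length)
    return -(-reads // 2) if paired else reads


# Assumed example: a 3.1 Gb genome at 30x with 2 x 150 bp reads.
print(reads_for_coverage(3_100_000_000, 30, 150))  # 310000000 read pairs
```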

For detection of a very low allele fraction in somatic tissues, you will need to sequence to 500-1000x coverage
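
To see why low allele fractions call for such deep sequencing, the chance of observing a variant can be approximated with a binomial model. This is a minimal sketch, assuming each read samples the variant allele independently with probability equal to the allele fraction, and that a hypothetical minimum of `min_alt_reads` supporting reads is required to make a call. Sequencing error and caller-specific filters are ignored, so the result is optimistic.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_alt_reads=5):
    """P(at least min_alt_reads variant reads) at a given depth.

    Uses the binomial tail: 1 - sum over i < k of
    C(n, i) * f**i * (1 - f)**(n - i).
    """
    p_below = sum(
        comb(depth, i)
        * allele_fraction ** i
        * (1 - allele_fraction) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_below


# Compare a 1% allele fraction at typical WGS depth and at deep coverage.
for depth in (30, 500, 1000):
    print(depth, round(detection_probability(depth, 0.01), 3))
```

Under these assumptions, a variant present in 1% of reads is almost never seen five times at 30x, while at 1000x it is detected in the large majority of cases.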

Free Design Consultation

We offer free consultations as part of the initial experimental design. We want to ensure that you have thought about all the necessary design components before you conduct your experiment. This way, BRC has high-quality data to work with when it comes time for analysis. We offer this service at no charge because it is more cost-effective to catch design errors before we start the analysis.

Data Analysis and General Consultation ($/hr)

We offer custom analysis or training at our hourly rate.

Grant Support (% effort)

We can provide project-specific analysis beyond our standard pipeline services when we are written into grants. This may be a cheaper option for labs requiring a lot of analysis time as we dedicate a percent of our effort to the project.