Genome Assembly

INTRODUCTION:

De novo genome assembly is likely one of the most complex bioinformatic tasks we offer as a service. While we promote it as a pipeline, assembly projects more often take the form of a dialog. We typically start with a first-pass attempt, which we can bill as a pipeline service at a known fixed cost. From there we discuss where to take the project next.

There is no such thing as a simple assembly run. It is therefore important to realize that, while we can often achieve very good results from a first attempt, genomes are often a never-ending work in progress. A few of the issues that plague de novo assembly are: large and nested repeats, high heterozygosity leading to bubbles and misalignments, and structural variations (large and small) between representative samples.

Our group uses the latest generation of long-read sequencing technology and de novo assembly algorithms to provide you with the most contiguous, highest-quality first-pass assembly possible.

SAMPLE PROCESSING:

Potentially the most important step in producing quality long-read sequencing data is the initial sample processing. The DNA sequencing center will attempt to isolate high-quality, high molecular weight (HMW) DNA from your organism and tissue of choice. The process is very different from traditional phenol-chloroform extraction, which yields quality DNA but shears it into tiny fragments. Our HMW DNA extraction methods take steps to reduce DNA shearing while still freeing large pieces of DNA to be incorporated into libraries.

After isolating quality HMW DNA (the most difficult step), our DNA sequencing center prepares libraries for PacBio and Oxford Nanopore Technologies (ONT) sequencing and initiates the sequencing runs.

ORGANISMS WE HAVE ASSEMBLED GENOMES FOR:

The BRC has considerable experience assembling high quality genomes for both model and non-model organisms including:

  • Dog (Canis familiaris)
  • Cattle (Bos taurus)
  • Hundreds of prokaryotic genomes

EXAMPLE OUTPUT REPORT:

Click the link to be redirected to our: Assembly report for sample

SEQUENCING AND ANALYSIS GUIDELINES:

Extracting high molecular weight (HMW) DNA can be difficult and a unique challenge for each organism, especially plants and some prokaryotes. You can expect assembly results to be highly correlated with the ability to extract good HMW DNA.

Genome assembly and error correction methodologies change rapidly. Please contact us for additional information.

Types of Reads:

We recommend long-read technologies (ONT and PacBio) for genome assembly because they allow quality assembly of the highly repetitive regions that are found in organisms all across the tree of life. We aim for 60x coverage from ONT data after filtering for reads >10kb. We also encourage an additional 30x coverage from PacBio reads processed with GCpp.
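As a rough illustration of how the coverage targets above relate to sequencing yield, the sketch below estimates fold coverage as the total number of bases in reads that pass the length filter, divided by the genome size. The function names, the example read lengths, and the 5 Mb genome size are hypothetical and are not part of our pipeline; the true genome size of a new organism is usually only an estimate.

```python
def coverage(read_lengths, genome_size, min_length=0):
    """Estimate fold coverage from reads strictly longer than min_length.

    read_lengths: iterable of read lengths in bases.
    genome_size:  estimated genome size in bases (an assumption for a new organism).
    min_length:   length filter in bases; 10_000 mirrors the >10kb filter above.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    kept_bases = sum(length for length in read_lengths if length > min_length)
    return kept_bases / genome_size


def bases_needed(target_coverage, genome_size):
    """Total bases of filtered reads required to reach target_coverage."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return target_coverage * genome_size


if __name__ == "__main__":
    # Hypothetical 5 Mb prokaryotic genome and a handful of made-up read lengths.
    genome_size = 5_000_000
    reads = [12_000, 8_000, 25_000, 9_500, 40_000]

    print("Coverage from reads >10kb:", coverage(reads, genome_size, min_length=10_000))
    print("Bases needed for 60x ONT:", bases_needed(60, genome_size))
    print("Bases needed for 30x PacBio:", bases_needed(30, genome_size))
```

Because the filter discards shorter reads, the raw yield of a run generally needs to exceed the filtered target; how much depends on the read length distribution of the library.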

The long read technologies are extremely sensitive to not only the molecular weight of the DNA but also any impurities in the sample. Contaminants in the sample can severely diminish the sequencing capacity of the instruments. Please consult with the DNA Sequencing facility to work out the details of generating data for your assembly project.

Eukaryotic Genome Assembly Considerations:

For a sizable eukaryotic genome you can expect to need >150 million reads of Oxford Nanopore Technologies data. Error correction using PacBio reads may require an additional >150 million reads. We cannot stress enough that the requirements for your unique organism will vary significantly depending on the quality of the data provided.

Prokaryotic Genome Assembly Considerations:

We can often group prokaryotic assemblies into batches of up to 16 samples at a time. Batching samples saves on staff time. In our experience, certain prokaryotes seem to produce contaminants that impede the HMW extractions and/or the sequencing instruments. While we have been able to fully assemble several class II/III genomes, you should expect some failures when dealing with these challenging species.
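For planning purposes, the batching described above can be sketched as splitting a sample list into groups of at most 16. This is only an illustration of the arithmetic; the function name and sample IDs are hypothetical, and actual batch composition is worked out with the sequencing facility.

```python
def make_batches(sample_ids, batch_size=16):
    """Split sample_ids into consecutive batches of at most batch_size samples."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    samples = list(sample_ids)
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


if __name__ == "__main__":
    # Hypothetical sample names for 40 prokaryotic isolates.
    samples = [f"isolate_{n:02d}" for n in range(1, 41)]
    for number, batch in enumerate(make_batches(samples), start=1):
        print(f"Batch {number}: {len(batch)} samples")
```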

Free Design Consultation

We offer free consultations as part of the initial experimental design. We want to ensure that you have thought through all the necessary design components before you conduct your experiment, so that the BRC has high-quality data when it comes time for us to analyze it. We offer this service at no charge because it is more cost-effective to catch design errors before we start the analysis.

Data Analysis and General Consultation ($/hr)

We offer custom analysis or training at our hourly rate.

Grant Support (% effort)

We can provide project-specific analysis beyond our standard pipeline services when we are written into grants. This may be a cheaper option for labs requiring a lot of analysis time, as we dedicate a percentage of our effort to the project.