Common Workflow Language (CWL)-based software pipeline for de novo genome assembly from long- and short-read data

Pasi Korhonen

Reference-quality genome assemblies are essential for reliable gene prediction and post-genomic analyses. Here, we created an automated pipeline for the de novo assembly of genomes from PacBio long-read and Illumina short-read data using the Common Workflow Language (CWL). This pipeline integrates a host of software tools and automates their installation and execution, addresses the challenges of achieving repeatable and reproducible assembly results, and offers a platform for the re-use of the workflow and the integration of diverse data sets. The resulting assemblies meet the high standards set by the National Human Genome Research Institute (NHGRI) and underpin accurate gene predictions and expanded genomic analyses.

🖥 There is a PDF file with the slides for this talk available. Unless otherwise noted in the slides themselves, they are published under a CC BY 3.0 license.