Next-Generation Sequencing (NGS)
Definition
Next-generation sequencing (NGS) is a high-throughput DNA sequencing technology that analyzes millions to billions of DNA fragments in parallel within a single experimental run. Unlike Sanger sequencing, which analyzes a single DNA fragment per reaction, NGS produces large numbers of short sequencing reads, typically 50 to 500 base pairs (bp) long. These reads are then assembled computationally, or aligned to a reference genome, to reconstruct larger genomic sequences.
Principle and Workflow of Next-Generation Sequencing
The NGS workflow generally consists of four major stages: nucleic acid extraction, library preparation, sequencing, and bioinformatic analysis. Although specific procedures vary between sequencing platforms, the fundamental principles remain similar.
1. Nucleic Acid Extraction
The sequencing process begins with the isolation of nucleic acids (DNA or RNA) from the biological sample of interest. Extraction is typically performed using specialized commercial kits optimized for different biological materials such as bacterial cultures, tissue samples, or blood.
If the initial nucleic acid is RNA, it must first be converted into complementary DNA (cDNA) through reverse transcription using the enzyme reverse transcriptase before proceeding to library preparation.
2. Library Preparation
Library preparation involves generating a collection of DNA fragments with specific adapter sequences attached to both ends. These adapters are essential for fragment immobilization, amplification, and sequencing.
The process generally includes the following steps:
- DNA fragmentation: Genomic DNA is fragmented into smaller pieces, typically less than 1 kilobase (kb) in length. Alternatively, when the target sequence is known, specific genomic regions may be amplified by polymerase chain reaction (PCR).
- Adapter ligation: Synthetic adapter sequences are ligated to both ends of each DNA fragment. These adapters allow fragments to bind to the sequencing surface and provide binding sites for sequencing primers.
- Sample barcoding: When multiple samples are sequenced simultaneously, unique DNA barcode sequences are incorporated into each sample library, enabling identification of reads derived from individual samples.
- Library amplification: In some cases, the prepared library is amplified to generate sufficient DNA quantities for sequencing.
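The barcoding step can be illustrated with a minimal demultiplexing sketch. It assumes an inline barcode at the start of each read; many protocols instead read the barcode in a separate index read. The barcode sequences, sample names, and the `demultiplex` helper are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample barcodes (illustrative sequences, not from a real kit).
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes=BARCODES, max_mismatches=0):
    """Assign each read to a sample using the barcode at the start of the read.

    Returns a dict mapping sample name -> list of reads with the barcode removed.
    Reads whose prefix matches no barcode, or more than one, are kept intact
    under the key "undetermined".
    """
    length = len(next(iter(barcodes)))  # assumes all barcodes share one length
    bins = defaultdict(list)
    for read in reads:
        prefix, insert = read[:length], read[length:]
        hits = [
            sample
            for barcode, sample in barcodes.items()
            if len(prefix) == length and hamming(prefix, barcode) <= max_mismatches
        ]
        if len(hits) == 1:
            bins[hits[0]].append(insert)
        else:
            bins["undetermined"].append(read)
    return dict(bins)


reads = ["ACGTGGATTC", "TGCATTAGGC", "ACGTCCCAAA", "GGGGTTTTAA"]
result = demultiplex(reads)
```

Here `result` groups the first and third reads under `sample_A`, the second under `sample_B`, and leaves the last read undetermined. Raising `max_mismatches` tolerates sequencing errors in the barcode, at the cost of more ambiguous assignments when barcodes are similar.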
3. Sequencing
Following library preparation, the DNA fragments are denatured into single strands and immobilized on a sequencing surface known as a flow cell. The flow cell contains billions of binding sites that enable simultaneous sequencing of numerous DNA fragments.
After attachment, fragments undergo cluster amplification directly on the flow cell surface (bridge amplification on many Illumina instruments), generating thousands of identical copies of each original DNA fragment. Each resulting cluster produces a fluorescent signal strong enough to be detected during sequencing.
Many platforms, including those developed by Illumina, employ sequencing-by-synthesis technology. This approach involves:
- A sequencing primer complementary to the adapter sequence
- DNA polymerase for nucleotide incorporation
- Fluorescently labeled, reversible chain-terminating nucleotides
Each nucleotide type carries a distinct fluorescent label. Because the nucleotides temporarily block extension, DNA polymerase incorporates one nucleotide at a time. After each incorporation, fluorescence is detected to identify the added base. The blocking group and dye are subsequently removed, allowing the next nucleotide to be incorporated.
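The cycle-by-cycle readout described above can be sketched as a toy base caller. This is a simplification under stated assumptions: each cycle is represented by four already-normalized channel intensities for one cluster, the channel order `ACGT` is assumed, and the numbers are invented. Real instruments also correct for effects such as channel cross-talk and phasing, which this sketch ignores.

```python
CHANNELS = "ACGT"  # assumed order of the four dye channels


def call_bases(intensities):
    """Call one base per cycle by picking the brightest of four channels.

    `intensities` is a sequence of 4-tuples, one tuple per sequencing cycle.
    """
    calls = []
    for cycle in intensities:
        brightest = max(range(4), key=lambda i: cycle[i])
        calls.append(CHANNELS[brightest])
    return "".join(calls)


# Invented intensities for a single cluster over four cycles.
cluster_signal = [
    (0.91, 0.05, 0.02, 0.02),
    (0.03, 0.04, 0.88, 0.05),
    (0.10, 0.07, 0.08, 0.75),
    (0.02, 0.95, 0.01, 0.02),
]
read = call_bases(cluster_signal)  # "AGTC"
```

Because the reversible terminator allows only one incorporation per cycle, each tuple corresponds to exactly one base, which is why the read length equals the number of cycles.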
Sequencing can be performed in two modes:
- Single-end sequencing, where DNA is sequenced from one end of the fragment
- Paired-end sequencing, where both ends of the fragment are sequenced, improving genome assembly accuracy and alignment reliability
4. Sequence Analysis
Following sequencing, the generated raw data undergo computational processing and bioinformatic analysis. Initial preprocessing steps include quality filtering, read trimming, and pairing of reads.
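As an illustration of quality filtering and read trimming, the sketch below decodes Phred quality scores and trims low-quality bases from the 3' end of each read. A Phred score Q relates to the estimated error probability P by Q = -10 log10(P), so Q20 corresponds to a 1 in 100 chance of a wrong base call. The ASCII offset of 33, the thresholds, and the function names are assumptions made for this example rather than settings from any particular tool.

```python
def phred_scores(quality: str, offset: int = 33):
    """Convert an ASCII quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20):
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def preprocess(records, min_q: int = 20, min_len: int = 5):
    """Trim every (sequence, quality) pair, then drop reads shorter than min_len."""
    kept = []
    for seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            kept.append((seq, qual))
    return kept


# 'I' encodes Q40 and '#' encodes Q2 with an offset of 33.
records = [
    ("ACGTACGTAC", "IIIIIIII##"),
    ("ACGTAC", "II####"),
]
clean = preprocess(records)
```

The first read loses its two low-quality trailing bases and is kept; the second is trimmed to two bases and discarded as too short. Production trimmers usually use sliding windows and also remove adapter sequence, which this sketch does not attempt.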
The processed reads can then be analyzed through:
- De novo sequence assembly, where overlapping reads are merged to reconstruct genomic sequences
- Reference-based alignment, where reads are mapped to a known reference genome
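The idea of merging overlapping reads can be shown with a small greedy overlap sketch. It assumes error-free reads from a single strand and exact suffix-prefix matches, and the function names are hypothetical. Production assemblers rely on overlap graphs or de Bruijn graphs and tolerate sequencing errors, so this is a teaching example rather than a usable assembler.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [
            c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)
        ] + [merged]


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(reads)  # ["ATGGCGTACCTGA"]
```

The `min_len` threshold guards against merging reads on short, chance overlaps. Repetitive regions longer than the reads still mislead a greedy approach like this one.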
Several bioinformatics platforms, including Geneious Prime, facilitate the visualization, processing, and interpretation of sequencing data, enabling accurate identification of genetic variants, mutations, and genomic structural features.
Long-Read Sequencing
Definition
Long-read sequencing refers to advanced DNA sequencing technologies capable of generating extended sequence reads, typically ranging from 10 kilobases (kb) to more than 50 kb per read. Unlike short-read sequencing approaches, these methods directly analyze individual DNA molecules without prior amplification.
Because of the extended read length, long-read sequencing significantly simplifies genome assembly, as fewer sequence fragments must be computationally reconstructed. However, these technologies generally exhibit higher base-calling error rates compared with short-read next-generation sequencing platforms.
Principle and Major Long-Read Sequencing Technologies
Long-read sequencing technologies were first introduced in the late 2000s, with several biotechnology companies developing different methodological approaches. Two of the most widely used platforms are single-molecule real-time sequencing and nanopore sequencing.
Single-Molecule Real-Time (SMRT) Sequencing
SMRT sequencing, developed by Pacific Biosciences, involves the preparation of circular DNA templates by ligating hairpin adapter sequences to both ends of a double-stranded DNA fragment. This closed configuration allows the polymerase to pass around the same molecule repeatedly, so the insert can be sequenced multiple times.
The sequencing reaction occurs within microscopic observation chambers called zero-mode waveguides (ZMWs). Each ZMW contains a single DNA polymerase molecule associated with a single DNA template.
During sequencing, DNA polymerase incorporates fluorescently labeled nucleotides into the growing DNA strand. Each nucleotide carries a distinct fluorescent label. As nucleotide incorporation occurs, fluorescence signals are detected in real time, enabling the identification of individual bases as the polymerization process proceeds.
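Because the circular template lets the same molecule be read more than once, the repeated passes can be combined into a consensus that is more accurate than any single pass. The sketch below shows the idea with a per-position majority vote. It assumes the passes are already aligned to the same length with substitution errors only, which is a strong simplification since real consensus callers must also handle insertions and deletions. The example sequences are invented.

```python
from collections import Counter


def consensus(passes):
    """Majority vote per position across equal-length, pre-aligned passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be non-empty and aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Five invented passes over the same insert, three of them with one error each.
passes = ["ACGTTGCA", "ACGTAGCA", "ACCTTGCA", "ACGTTGCA", "ACGTTGAA"]
ccs_read = consensus(passes)  # "ACGTTGCA"
```

Random errors rarely hit the same position in most passes, so the vote recovers the underlying sequence. Systematic errors that recur at the same site would not be corrected this way.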
Nanopore Sequencing
Nanopore sequencing, developed by Oxford Nanopore Technologies, is based on the measurement of changes in ionic current as nucleic acid molecules pass through a nanopore embedded in a membrane.
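During sequencing, a motor protein unwinds the double-stranded DNA molecule and guides a single DNA strand through the nanopore. The nucleotides occupying the pore at any moment produce a characteristic disruption in the ionic current, owing to differences in molecular size and electrical properties. Because several adjacent nucleotides influence the signal at once, the current reflects short stretches of sequence rather than isolated bases.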
These electrical signal variations are recorded and computationally translated into the corresponding nucleotide sequence.
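The translation from current to sequence can be pictured with a deliberately simplified toy: each measured event is assigned to the base whose reference current level is nearest. The levels in `LEVELS` are invented, and a single fixed level per base is an assumption made purely for illustration. Real basecallers model the signal contributed by several nucleotides at once, typically with trained neural networks.

```python
# Hypothetical mean current levels in picoamperes. Real pores do not show
# one fixed level per base; these values exist only for this toy example.
LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def decode(events):
    """Map each event's mean current to the base with the nearest level."""
    return "".join(
        min(LEVELS, key=lambda base: abs(LEVELS[base] - current))
        for current in events
    )


# Invented, already-segmented event means for one strand.
events = [81.2, 109.0, 126.3, 94.1, 96.5, 78.9]
sequence = decode(events)  # "AGTCCA"
```

The sketch also assumes the raw trace has already been segmented into one event per base, which is itself a hard problem because the strand does not move through the pore at a constant speed.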
Applications of Long-Read Sequencing
Long-read sequencing technologies are particularly useful in situations where high-quality reference genomes are unavailable. The extended read lengths enable improved analysis of complex genomic features, including:
- Large structural variations
- Insertions and deletions (indels)
- Highly repetitive genomic regions
Additionally, because these methods often analyze native DNA molecules without amplification, they facilitate the detection of epigenetic modifications, such as DNA methylation.
Advantages and Limitations of Long-Read Sequencing
Advantages
- Simplified genome assembly, due to longer sequence reads
- Reduced ambiguity when reconstructing repetitive genomic regions
- Simpler library preparation procedures compared with some short-read platforms
- Portable sequencing devices, with some systems comparable in size to a USB device
- Rapid sequencing runs, enabling faster data acquisition
Limitations
- Higher base-calling error rates compared with short-read next-generation sequencing platforms
Bioinformatic Analysis of Long-Read Sequencing Data
Bioinformatics software platforms such as Geneious Prime support the assembly and analysis of long-read sequencing data. These tools allow researchers to process sequence data generated by platforms such as those developed by Pacific Biosciences and Oxford Nanopore Technologies.
Furthermore, hybrid assembly strategies can integrate short-read sequencing data (for example, from Illumina platforms) with long-read datasets to improve genome assembly accuracy and structural variant detection.
