April 12, 2026

DNA Sequencing

History of DNA Sequencing

  • The foundation of DNA research was established in 1953 when James Watson and Francis Crick elucidated the double-helix structure of DNA, providing the structural basis for subsequent molecular genetic studies.
  • The first experimental attempt to determine the sequence of a nucleic acid was reported in 1964 by Robert W. Holley, who successfully determined the nucleotide sequence of transfer RNA (tRNA). This work represented the initial methodological framework for nucleic acid sequencing.
  • Building upon this approach, researchers later applied similar strategies to sequence the genome of the bacteriophage MS2. These early studies focused on RNA molecules; therefore, direct DNA sequencing techniques had not yet been developed at that stage.
  • A major breakthrough occurred in 1977 when Frederick Sanger introduced the chain-termination method for DNA sequencing, commonly known as the Sanger sequencing technique. In the same year, Allan Maxam and Walter Gilbert independently developed the chemical cleavage approach, referred to as the Maxam–Gilbert sequencing method. Using the chain-termination strategy, Sanger's group determined the complete genome of bacteriophage φX174, the first DNA genome to be sequenced in its entirety.
  • Despite their significance, both methods were labor-intensive and time-consuming due to the absence of automated instrumentation. Progress toward automation began in 1986 when Lloyd M. Smith and collaborators developed a semi-automated DNA sequencing system based on fluorescent labeling. Subsequently, in 1987, Applied Biosystems introduced the first fully automated DNA sequencing instruments, enabling high-throughput and standardized sequencing workflows.
  • Further technological advancement occurred in 1996 with the development of capillary electrophoresis–based DNA sequencing platforms by Applied Biosystems. These systems significantly improved sequencing speed, resolution, and data accuracy. The integration of automated Sanger sequencing technologies played a crucial role in the completion of the Human Genome Project in 2003.
  • The introduction of next-generation sequencing (NGS) technologies in the mid-2000s, beginning with 454 pyrosequencing in 2005 and followed by the Solexa platform (subsequently acquired by Illumina), marked a major transition toward massively parallel sequencing. These platforms enabled rapid, high-throughput, and cost-effective genomic analysis, initiating the modern era of large-scale genomic sequencing.
  • Numerous technological milestones have since contributed to the evolution of DNA sequencing methodologies, leading to increasingly efficient and scalable sequencing platforms.

Definition of DNA Sequencing

DNA sequence analysis is essential for the identification of genetic variations that cannot be detected solely through allelic variation studies or amplification-based techniques such as the Polymerase Chain Reaction (PCR). Although PCR enables the amplification of specific DNA regions, it does not allow the discovery of unknown nucleotide polymorphisms within a sequence. To overcome this limitation, DNA sequencing methodologies were developed to determine the exact nucleotide composition of DNA molecules.

DNA sequencing is defined as a laboratory technique used to determine the precise order of nucleotides within a DNA molecule through a series of controlled biochemical reactions.

The sequencing workflow involves two major components: experimental laboratory procedures and computational data analysis. Initially, DNA fragments undergo chemical or enzymatic reactions that generate sequence-dependent signals. During this process, the incorporation or termination of individual nucleotides produces detectable signals corresponding to each base position.

These signals are captured by automated sequencing instruments and subsequently transferred to computational systems for analysis. The sequencing platform records nucleotide incorporation events as digital signals, which are interpreted by bioinformatics software to reconstruct the nucleotide sequence.

In most sequencing platforms, nucleotides are labeled with either radioactive or fluorescent markers, enabling the detection and identification of each incorporated base during the sequencing reaction. This signal-based detection forms the fundamental principle underlying modern DNA sequencing technologies.

DNA Sequencing Methods

DNA sequencing technologies have undergone substantial technological evolution over the past several decades, enabling major advances in molecular biology, genomics, and clinical diagnostics. Based on sequencing strategy, throughput capacity, and technological principles, DNA sequencing platforms are generally categorized into three technological generations.

  • First-Generation Sequencing

The first generation of DNA sequencing technologies emerged in the late 1970s. The two principal methods developed during this period were the chemical cleavage method introduced by Allan Maxam and Walter Gilbert, and the chain-termination method developed by Frederick Sanger.

The Maxam–Gilbert method enabled the sequencing of DNA fragments of approximately 400 nucleotides, whereas the Sanger sequencing technique could determine sequences of up to 1,000 nucleotides per reaction. In 1987, Applied Biosystems commercialized automated Sanger sequencing platforms, significantly improving sequencing efficiency and reproducibility.

Sanger sequencing remained the dominant technology for large-scale genomic projects and was instrumental in the completion of the Human Genome Project, which produced the first complete sequence of the human genome in 2003.

  • Second-Generation Sequencing (Next-Generation Sequencing)

While first-generation sequencing technologies analyze a single DNA fragment per reaction, second-generation sequencing technologies, commercialized in the mid-2000s, introduced massively parallel sequencing capabilities. These technologies are commonly referred to as next-generation sequencing (NGS) or massively parallel sequencing, as they allow the simultaneous sequencing of millions to billions of DNA fragments.

NGS platforms typically generate short sequencing reads, generally ranging from 50 to 500 nucleotides in length. The sequences obtained from individual fragments, known as reads, are computationally processed to reconstruct larger genomic regions. Bioinformatic algorithms either assemble overlapping reads de novo or align them to a reference genome for comparative genomic analysis.

  • Third-Generation Sequencing and Sequence Assembly

Both first-generation and second-generation sequencing technologies are commonly classified as short-read sequencing approaches, since the generated read lengths are relatively limited: typically less than 1,000 bases for Sanger sequencing and less than 500 bases for most NGS platforms.

In contrast, third-generation sequencing technologies, developed in the early 2010s, introduced long-read sequencing capabilities. These platforms generate reads that frequently exceed 10,000 nucleotides (10 kilobases) in length, enabling improved characterization of complex genomic regions, including repetitive sequences and structural variations.

To optimize sequencing accuracy and genome assembly, researchers often employ hybrid sequencing strategies that combine multiple technologies. For example, the high accuracy of Sanger sequencing can be used to validate specific sequence variants identified through NGS. Similarly, integrating NGS with long-read sequencing technologies improves genome assembly quality, particularly in genomic regions containing extensive repetitive elements or large structural rearrangements.

Sanger Sequencing

Definition

Frederick Sanger developed the Sanger sequencing method, also referred to as chain-termination sequencing, in 1977. This technique determines the nucleotide sequence of DNA through in vitro DNA synthesis in the presence of chain-terminating nucleotides, followed by fragment separation using electrophoresis. The incorporation of dideoxynucleotides interrupts DNA strand elongation, generating fragments of varying lengths that enable sequence determination.

Principle and Mechanism of Sanger Sequencing

Sanger sequencing requires several essential components: a single-stranded DNA template, a specific DNA primer, DNA polymerase, standard deoxynucleotide triphosphates (dNTPs), and chain-terminating dideoxynucleotide triphosphates (ddNTPs). The reaction produces multiple DNA fragments derived from the same template sequence.

1. Denaturation and Primer Annealing

The sequencing process begins with denaturation of double-stranded DNA, producing single-stranded templates. Subsequently, a short oligonucleotide primer hybridizes to a complementary region of the template DNA. This primer provides the 3′ hydroxyl group required for DNA polymerase-mediated strand extension.
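The annealing step can be sketched computationally: a primer hybridizes wherever the template carries the primer's reverse complement. A minimal illustration in Python, using made-up sequences (the `find_annealing_site` helper and both sequences are hypothetical, purely for demonstration):

```python
# Toy model of primer annealing: the primer binds where the template
# contains the primer's reverse complement. Sequences are invented.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def find_annealing_site(template: str, primer: str) -> int:
    """Return the 0-based position on the template where the primer
    hybridizes (where the template matches the primer's reverse
    complement), or -1 if no such site exists."""
    return template.find(reverse_complement(primer))

template = "GGCATTACGGATCCTTAA"
primer = "TTAAGG"
site = find_annealing_site(template, primer)
print(site)  # prints 12
```

Real primer design must additionally consider melting temperature, specificity, and partial mismatches, none of which this exact-match sketch models.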

2. DNA Strand Extension and Chain Termination

DNA polymerase catalyzes strand elongation by incorporating nucleotides from a mixture of dNTPs and ddNTPs. When a dNTP is incorporated, DNA synthesis proceeds normally. In contrast, incorporation of a ddNTP results in termination of chain elongation because ddNTPs lack the 3′-OH group necessary for phosphodiester bond formation.

Since ddNTP incorporation occurs randomly during the reaction, the process generates a population of DNA fragments of different lengths, each terminating at a specific nucleotide position. Each ddNTP is labeled with a distinct fluorescent dye, allowing nucleotide identification during detection.

3. Fragment Separation

The resulting DNA fragments are separated according to size using capillary electrophoresis. During electrophoretic migration, shorter DNA fragments move faster than longer fragments, enabling their sequential separation. This process organizes fragments in ascending order of length as they migrate toward the detector.

4. Detection and Sequence Determination

As the DNA fragments pass through the detection system, a laser excites the fluorescent dyes attached to the terminating ddNTPs. The emitted fluorescence is captured by an optical detector, and the signal is recorded electronically.

Each fluorescent dye corresponds to a specific nucleotide base. The recorded signals are displayed as a chromatogram, in which peaks represent individual nucleotides. Because fragments are detected sequentially based on their length, the chromatogram directly reflects the nucleotide order along the DNA template, enabling accurate reconstruction of the DNA sequence.
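The four steps above can be condensed into a toy Python model: chain termination yields one dye-labeled fragment per position, electrophoresis orders fragments by length, and reading each terminal dye reconstructs the sequence. This is a deliberate simplification that ignores reaction stochastics and signal noise:

```python
import random

# Toy Sanger workflow: every prefix of the synthesized strand ends in a
# dye-labeled ddNTP; sorting by fragment length and reading the terminal
# base of each fragment recovers the sequence.

def terminated_fragments(synthesized: str):
    """One (length, terminal_base) pair per termination position."""
    return [(i + 1, synthesized[i]) for i in range(len(synthesized))]

def read_chromatogram(fragments):
    """Capillary electrophoresis: shorter fragments reach the detector
    first, so sorting by length gives the base order directly."""
    return "".join(base for _, base in sorted(fragments))

seq = "ATGCGTA"
frags = terminated_fragments(seq)
random.shuffle(frags)            # the reaction mixture is unordered
print(read_chromatogram(frags))  # prints ATGCGTA
```

The sort step corresponds to the physical size separation; in a real instrument the "sort key" is migration time rather than an explicit length value.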

Applications of Sanger Sequencing

Sanger sequencing is most suitable for the analysis of short DNA fragments, typically less than 1 kilobase (kb) in length. It is primarily applied when a limited number of genomic targets or samples must be sequenced. Due to the relatively low throughput and higher per-sample cost compared with high-throughput technologies, this method is not optimal for large-scale genomic studies.

Despite these limitations, Sanger sequencing remains one of the most reliable and accurate DNA sequencing techniques. For this reason, it is widely regarded as the reference method (gold standard) in many molecular biology and clinical genetics applications. It is frequently used to validate sequence variants identified by next-generation sequencing (NGS) or long-read sequencing platforms, ensuring the accuracy of detected mutations or polymorphisms.

Advantages and Limitations of Sanger Sequencing

Advantages

  • High sequencing accuracy, making it suitable for confirmatory analyses.

  • Simpler data processing and interpretation compared with high-throughput sequencing technologies.

  • Cost-effective for small sample sets or targeted sequencing projects.

  • Relatively long read length (up to approximately 1 kb), which can facilitate sequence assembly and improve the resolution of repetitive genomic regions.

Limitations

  • Requires a relatively large quantity of input DNA compared with some modern sequencing methods.

  • Low throughput, as each reaction typically analyzes a single DNA fragment.

  • Not cost-effective for large-scale sequencing projects involving numerous samples or extensive genomic regions.

Analysis of Sanger Sequencing Data

Several bioinformatics tools are available for the interpretation of Sanger sequencing results. For example, Geneious Prime provides integrated functionalities for the visualization and analysis of sequencing chromatograms. These tools support:

  • Automated assembly of forward and reverse sequencing reads

  • Identification of single nucleotide polymorphisms (SNPs)

  • Detection of sequence heterogeneity or mixed templates within samples

  • Alignment of chromatogram-derived sequences with reference genomes

Such computational platforms facilitate accurate interpretation of fluorescence chromatograms and improve the reliability of sequence variant identification.

Next-Generation Sequencing (NGS)

Definition

Next-generation sequencing (NGS) is a high-throughput DNA sequencing technology that enables the simultaneous analysis of millions to billions of DNA fragments within a single experimental run. Unlike Sanger sequencing, which analyzes a single DNA fragment per reaction, NGS produces large numbers of short sequencing reads, typically ranging from 50 to 500 base pairs (bp). These reads are computationally reconstructed through DNA sequence assembly to generate larger genomic sequences.

Principle and Workflow of Next-Generation Sequencing

The NGS workflow generally consists of four major stages: nucleic acid extraction, library preparation, sequencing, and bioinformatic analysis. Although specific procedures vary between sequencing platforms, the fundamental principles remain similar.

1. Nucleic Acid Extraction

The sequencing process begins with the isolation of nucleic acids (DNA or RNA) from the biological sample of interest. Extraction is typically performed using specialized commercial kits optimized for different biological materials such as bacterial cultures, tissue samples, or blood.

If the initial nucleic acid is RNA, it must first be converted into complementary DNA (cDNA) through reverse transcription using the enzyme reverse transcriptase before proceeding to library preparation.
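The reverse-transcription step can be sketched as a simple base-pairing transformation: the first-strand cDNA is the reverse complement of the RNA, with uracil pairing to adenine. The sequence below is invented for illustration:

```python
# Sketch of first-strand cDNA synthesis: the cDNA is complementary and
# antiparallel to the RNA template (U pairs with A). Illustrative only;
# real reverse transcription requires a primer and reverse transcriptase.

RNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(rna: str) -> str:
    """Return the first-strand cDNA (5'->3') for an RNA template."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))

print(reverse_transcribe("AUGGCU"))  # prints AGCCAT
```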

2. Library Preparation

Library preparation involves generating a collection of DNA fragments with specific adapter sequences attached to both ends. These adapters are essential for fragment immobilization, amplification, and sequencing.

The process generally includes the following steps:

  • DNA fragmentation: Genomic DNA is fragmented into smaller pieces, typically less than 1 kilobase (kb) in length. Alternatively, specific genomic regions may be amplified using Polymerase Chain Reaction when the target sequence is known.

  • Adapter ligation: Synthetic adapter sequences are ligated to both ends of each DNA fragment. These adapters allow fragments to bind to the sequencing surface and provide binding sites for sequencing primers.

  • Sample barcoding: When multiple samples are sequenced simultaneously, unique DNA barcode sequences are incorporated into each sample library, enabling identification of reads derived from individual samples.

  • Library amplification: In some cases, the prepared library is amplified to generate sufficient DNA quantities for sequencing.
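The library-preparation steps above can be sketched as string operations. The adapter and barcode sequences below are invented, and fragmentation is modeled as fixed windows rather than random shearing:

```python
# Hedged sketch of library preparation: fragmentation, barcoding, and
# adapter ligation. All sequences here are made up for illustration.

ADAPTER_5 = "AATGATACGG"   # hypothetical 5' adapter
ADAPTER_3 = "CAAGCAGAAG"   # hypothetical 3' adapter

def fragment(dna: str, size: int):
    """Cut the input into non-overlapping pieces of at most `size` bases
    (real fragmentation is random; fixed windows keep the sketch simple)."""
    return [dna[i:i + size] for i in range(0, len(dna), size)]

def build_library(dna: str, barcode: str, size: int = 6):
    """Attach the sample barcode plus both adapters to every fragment."""
    return [ADAPTER_5 + barcode + frag + ADAPTER_3
            for frag in fragment(dna, size)]

library = build_library("ATGCGTACCGGATTAC", barcode="ACGT")
print(len(library))  # prints 3
```

Demultiplexing after sequencing is the inverse operation: reads are grouped by the barcode found immediately after the adapter sequence.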

3. Sequencing

Following library preparation, the DNA fragments are denatured into single strands and immobilized on a sequencing surface known as a flow cell. The flow cell contains billions of binding sites that enable simultaneous sequencing of numerous DNA fragments.

After attachment, fragments undergo cluster amplification, typically through repeated PCR cycles, generating thousands of identical copies of each original DNA fragment. These clusters produce detectable fluorescent signals during sequencing.

Many platforms, including those developed by Illumina, employ sequencing-by-synthesis technology. This approach involves:

  • A sequencing primer complementary to the adapter sequence

  • DNA polymerase for nucleotide incorporation

  • Fluorescently labeled, reversible chain-terminating nucleotides

Each nucleotide type carries a distinct fluorescent label. Because the nucleotides temporarily block extension, DNA polymerase incorporates one nucleotide at a time. After each incorporation, fluorescence is detected to identify the added base. The blocking group and dye are subsequently removed, allowing the next nucleotide to be incorporated.
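One sequencing-by-synthesis run can be sketched as a cycle loop: incorporate a single reversibly terminated nucleotide, image its dye, record the call, then unblock. The dye-to-base mapping below is invented, and real platforms image millions of clusters in parallel rather than one:

```python
# Illustrative sequencing-by-synthesis cycle for a single cluster.
# Dye colors are made up; base calls come from the imaged dye.

DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sequence_by_synthesis(cluster_template: str, cycles: int) -> str:
    """Each cycle incorporates exactly one base (complementary to the
    next template position), images its dye, then removes the block."""
    calls = []
    for i in range(min(cycles, len(cluster_template))):
        incorporated = COMPLEMENT[cluster_template[i]]  # one base per cycle
        _color = DYE[incorporated]                      # fluorescence imaging
        calls.append(incorporated)                      # base call from dye
    return "".join(calls)

print(sequence_by_synthesis("TACG", cycles=4))  # prints ATGC
```

The `cycles` parameter corresponds to the configured read length: a 150-cycle run yields reads of up to 150 bases.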

Sequencing can be performed in two modes:

  • Single-end sequencing, where DNA is sequenced from one end of the fragment

  • Paired-end sequencing, where both ends of the fragment are sequenced, improving genome assembly accuracy and alignment reliability

4. Sequence Analysis

Following sequencing, the generated raw data undergo computational processing and bioinformatic analysis. Initial preprocessing steps include quality filtering, read trimming, and pairing of reads.

The processed reads can then be analyzed through:

  • De novo sequence assembly, where overlapping reads are merged to reconstruct genomic sequences

  • Reference-based alignment, where reads are mapped to a known reference genome
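Both analysis routes can be sketched with exact string matching, keeping in mind that production tools use indexed, mismatch-tolerant algorithms and quality scores. The greedy overlap merge and the toy reads below are illustrative assumptions, not how any specific assembler works:

```python
# Minimal sketches of reference-based alignment and de novo assembly.

def align_to_reference(read: str, reference: str) -> int:
    """Reference-based alignment: 0-based mapping position of an exact
    occurrence of the read, or -1 if it does not map."""
    return reference.find(read)

def greedy_assemble(reads, min_overlap: int = 3) -> str:
    """De novo assembly: repeatedly merge the pair of reads with the
    longest suffix/prefix overlap of at least `min_overlap` bases."""
    reads = list(reads)

    def overlap(a, b):
        for k in range(min(len(a), len(b)), min_overlap - 1, -1):
            if a.endswith(b[:k]):
                return k
        return 0

    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j and overlap(a, b) > best_k:
                    best_k, best_i, best_j = overlap(a, b), i, j
        if best_k == 0:
            break  # no further merges possible
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads[0]

print(greedy_assemble(["ATGCGT", "CGTACC", "ACCGGA"]))  # prints ATGCGTACCGGA
print(align_to_reference("CGTA", "ATGCGTACC"))          # prints 3
```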

Several bioinformatics platforms, including Geneious Prime, facilitate the visualization, processing, and interpretation of sequencing data, enabling accurate identification of genetic variants, mutations, and genomic structural features.

Long-Read Sequencing

Definition

Long-read sequencing refers to advanced DNA sequencing technologies capable of generating extended sequence reads, typically ranging from 10 kilobases (kb) to more than 50 kb per read. Unlike short-read sequencing approaches, these methods directly analyze individual DNA molecules without prior amplification.

Because of the extended read length, long-read sequencing significantly simplifies genome assembly, as fewer sequence fragments must be computationally reconstructed. However, these technologies generally exhibit higher base-calling error rates compared with short-read next-generation sequencing platforms.

Principle and Major Long-Read Sequencing Technologies

Long-read sequencing technologies were first introduced in the early 2010s, with several biotechnology companies developing different methodological approaches. Two of the most widely used platforms are single-molecule real-time (SMRT) sequencing and nanopore sequencing.

Single-Molecule Real-Time (SMRT) Sequencing

SMRT sequencing, developed by Pacific Biosciences, involves the preparation of circular DNA templates by ligating adapter sequences to both ends of the DNA fragment. This configuration allows continuous sequencing of the same molecule.

The sequencing reaction occurs within microscopic observation chambers called zero-mode waveguides (ZMWs). Each ZMW contains a single DNA polymerase molecule associated with a single DNA template.

During sequencing, DNA polymerase incorporates fluorescently labeled nucleotides into the growing DNA strand. Each nucleotide carries a distinct fluorescent label. As nucleotide incorporation occurs, fluorescence signals are detected in real time, enabling the identification of individual bases as the polymerization process proceeds.

Nanopore Sequencing

Nanopore sequencing, developed by Oxford Nanopore Technologies, is based on the measurement of changes in ionic current as nucleic acid molecules pass through a nanopore embedded in a membrane.

During sequencing, a motor protein unwinds the double-stranded DNA molecule and guides a single DNA strand through the nanopore. Each nucleotide passing through the pore produces a distinct disruption in the ionic current, due to differences in molecular size and electrical properties.

These electrical signal variations are recorded and computationally translated into the corresponding nucleotide sequence.
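The signal-to-sequence translation can be sketched as level matching: each base perturbs the ionic current to a characteristic level, and recorded levels are mapped back to the nearest base. The current values below are invented; real nanopore signals depend on several bases occupying the pore at once and require statistical basecalling models:

```python
# Hedged toy model of nanopore basecalling: one current level per base.
# The picoampere values are made up for illustration.

CURRENT_LEVEL = {"A": 80.0, "C": 65.0, "G": 50.0, "T": 95.0}  # pA, invented

def simulate_trace(strand: str):
    """Current levels observed as the motor protein ratchets the strand
    through the pore, one base at a time in this simplification."""
    return [CURRENT_LEVEL[base] for base in strand]

def basecall(trace):
    """Translate each recorded current level to the nearest base."""
    def nearest(level):
        return min(CURRENT_LEVEL, key=lambda b: abs(CURRENT_LEVEL[b] - level))
    return "".join(nearest(level) for level in trace)

print(basecall(simulate_trace("GATTACA")))  # prints GATTACA
```

The `nearest` lookup stands in for the neural-network basecallers used in practice, which decode overlapping multi-base signals rather than clean per-base levels.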

Applications of Long-Read Sequencing

Long-read sequencing technologies are particularly useful in situations where high-quality reference genomes are unavailable. The extended read lengths enable improved analysis of complex genomic features, including:

  • Large structural variations

  • Insertions and deletions (indels)

  • Highly repetitive genomic regions

Additionally, because these methods often analyze native DNA molecules without amplification, they facilitate the detection of epigenetic modifications, such as DNA methylation.

Advantages and Limitations of Long-Read Sequencing

Advantages

  • Simplified genome assembly, due to longer sequence reads

  • Reduced ambiguity when reconstructing repetitive genomic regions

  • Simpler library preparation procedures compared with some short-read platforms

  • Portable sequencing devices, with some systems comparable in size to a USB flash drive

  • Rapid sequencing runs, enabling faster data acquisition

Limitations

  • Higher base-calling error rates compared with next-generation sequencing platforms

Bioinformatic Analysis of Long-Read Sequencing Data

Bioinformatics software platforms such as Geneious Prime support the assembly and analysis of long-read sequencing data. These tools allow researchers to process sequence data generated by platforms such as those developed by Pacific Biosciences and Oxford Nanopore Technologies.

Furthermore, hybrid assembly strategies can be implemented by integrating short-read sequencing data (e.g., from Illumina platforms) with long-read datasets to enhance genome assembly accuracy and structural variant detection.