Why indexing matters
Indexing (barcoding) allows many libraries to be pooled, sequenced together, and then accurately separated (demultiplexed). Correct assignment is essential to avoid cross-talk (reads leaking into the wrong sample), which can distort allele fractions, produce spurious low-frequency variant calls, or skew microbial abundance estimates. Good primers/adapters, thoughtful multiplex design, and rigorous cleanup collectively keep misassignment low. See the concise primers on indexing and demultiplexing from core facilities at ECU and UC Davis.
Single vs dual indexing (combinatorial vs unique dual)
Single indexing encodes one index (usually i7). It offers simplicity but provides no way to detect/flag swapped indexes.
Dual indexing (DI) encodes both i7 and i5. Two major flavors are:
- Combinatorial dual indexes (CDI): reuse a small set of i5 and i7 sequences to create many pairs (e.g., 8×12 = 96). Efficient, but if one index hops, the resulting pair can still match another valid combination in the pool, allowing silent misassignment. An improvement over single indexing, but not bulletproof, especially on patterned flow cells. See practical notes in the UMass Chan Deep Sequencing guides and adapter overviews from U. Florida ICBR.
- Unique dual indexes (UDI/UD): each sample has a unique i7+i5 pair in which neither index is reused in that pool. If an index hops, the pair becomes “illegal,” so informatic filtering can discard it. UDIs are widely recommended to suppress the effects of index hopping on ExAmp/patterned flow-cell instruments (e.g., NextSeq 2000, NovaSeq 6000/X). See policies and guidance from UC Davis DNA Tech Core, Cornell BRC Genomics, KUMC Genome Sequencing Facility, URMC Rochester Genomics Center, and MIT BioMicro Center.
Take-home: For sensitive applications or patterned flow cells, prefer UDI. CDI can be acceptable for less sensitive uses or on legacy non-patterned systems, but evaluate risk carefully (UMass guide).
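The difference between CDI and UDI can be made concrete with a small enumeration. The sketch below is illustrative only: the index names are hypothetical placeholders, and a "hop" is modeled simply as one index of a pair being swapped for any other index of the same type present in the pool. It counts how many such single hops land on another valid pair (silent misassignment) versus an illegal pair (detectable).

```python
from itertools import product


def single_hop_outcomes(valid_pairs):
    """Count single-index hops that land on a valid pair vs an illegal pair.

    A hop is modeled (as an assumption) as replacing either the i7 or the i5
    of a valid pair with a different index of the same type from the pool.
    Returns (silent, illegal).
    """
    valid = set(valid_pairs)
    i7_pool = {i7 for i7, _ in valid}
    i5_pool = {i5 for _, i5 in valid}
    silent = 0
    illegal = 0
    for i7, i5 in valid:
        # All pairs reachable by swapping exactly one index.
        hopped = [(other, i5) for other in i7_pool if other != i7]
        hopped += [(i7, other) for other in i5_pool if other != i5]
        for pair in hopped:
            if pair in valid:
                silent += 1   # looks like a real sample: cannot be filtered
            else:
                illegal += 1  # not in the sample sheet: can be discarded
    return silent, illegal


# Hypothetical index names, not real adapter sequences.
# CDI: 12 i7 x 8 i5 indexes, every combination used (96 samples).
cdi_pairs = list(product([f"i7_{n}" for n in range(12)],
                         [f"i5_{n}" for n in range(8)]))
# UDI: 96 samples, each with its own i7 and its own i5.
udi_pairs = [(f"i7_{n}", f"i5_{n}") for n in range(96)]

cdi_silent, cdi_illegal = single_hop_outcomes(cdi_pairs)
udi_silent, udi_illegal = single_hop_outcomes(udi_pairs)
```

In a fully used 12×8 CDI plate every single hop lands on another valid pair, whereas in the UDI layout every single hop produces an illegal pair. Partially filled CDI plates fall somewhere in between, and a simultaneous hop of both indexes is not covered by this model.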
Mechanisms of index hopping and cross-talk
Misassignment arises from multiple mechanisms; the dominant one on modern Illumina platforms is index hopping (index switching) during Exclusion Amplification (ExAmp) cluster generation on patterned flow cells when free indexed oligos/adapters remain in the pool. Clear overviews with mitigation tips are available from MSU RTSF and URMC instrumentation pages.
Empirical studies document platform- and library-dependent hopping rates and consequences:
- Ancient DNA and low-input contexts are especially vulnerable (van der Valk 2020).
- Non-redundant UDIs enable complete filtering of swapped pairs (Costello 2018).
- UDI+UMI designs detect and quantify hopping/contamination (MacConaill 2018).
- Newer work shows simple ways to estimate and correct hopping in scRNA-seq (Miao 2024).
- FDA evaluations also note hopping with single-index schemes, mitigated by dual indexes (FDA technical assessment).
Adapter and primer design principles that reduce misassignment
- Use UDIs with sufficient edit distance. A large Hamming distance between indexes (and from read sequences) reduces bleed-through caused by substitution errors and phasing. See UDI rationale and pooling guidance in the UMass adapter pooling PDF.
- Avoid low-complexity indexes and sequence contexts. Balanced base composition in index reads stabilizes color balance and reduces bleeding/cross-talk in optical base-calling; most academic cores curate balanced index plates (e.g., KUMC “Best Practices”).
- Incorporate UMIs where appropriate. UMIs do not directly stop hopping, but they help distinguish true duplicates and verify molecule-level assignment. See UDI-UMI adapter validation (MacConaill 2018).
- Remove free primers/adapters rigorously. Add extra bead or column cleanups, and avoid over-cycling. Core guidance stresses an extra post-PCR cleanup to strip free oligos (KUMC; UC Davis amplicon note).
- Validate custom index sets computationally. Tools like GIL generate index sets with constraints on edit distance and diversity (GIL software).
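As a rough illustration of the distance check described above, the following sketch flags index pairs that sit too close together. It is not how GIL works internally; the sequences are made-up examples, and the `min_distance=3` threshold is an assumption rather than a recommendation from any of the cited guides. Hamming distance only counts substitutions, so insertions and deletions would need a true edit-distance check.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length indexes differ."""
    if len(a) != len(b):
        raise ValueError("indexes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def close_pairs(indexes, min_distance=3):
    """Return (index_a, index_b, distance) for every pair below min_distance."""
    flagged = []
    for a, b in combinations(indexes, 2):
        d = hamming(a, b)
        if d < min_distance:
            flagged.append((a, b, d))
    return flagged


# Made-up 8-nt sequences for illustration only (not a vendor index set).
example_i7 = ["ATCACGTT", "CGATGTTT", "TTAGGCAT", "ATCACGTA"]
too_close = close_pairs(example_i7)
# The first and last sequences differ at a single position, so one
# sequencing error could convert one into the other.
```

Running the same check separately on the i7 set and the i5 set keeps the two index reads independent of each other.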
Practical multiplexing on high-throughput runs
- Prefer UDI on patterned flow cells. Required or strongly recommended by many cores for shared NovaSeq lanes (URMC policy; Cornell BRC shared lane policy).
- Balance library contributions. Extreme imbalances increase relative contamination of low-depth samples (van der Valk 2020).
- Diversify base composition per cycle. Plan pools to avoid low-diversity early cycles and index reads (many core SOPs include mixing guidance; see UMass guide).
- QC before pooling. Size profiles and molarity checks reduce adapter-dimer carryover; multiple cleanups are often specified (e.g., KUMC “Guidelines”).
- Demultiplexing rules: Configure software to invalidate any i5/i7 pair not present in your UDI table; this automatically discards hopped reads (see worked examples in UMass guide and Southard-Smith 2020).
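The demultiplexing rule above amounts to a lookup against the sample sheet. The sketch below shows the logic on a table of already-counted index pairs; it is a generic illustration, not the configuration of any particular demultiplexer, and the function name, input layout, and index labels are all hypothetical.

```python
from collections import Counter


def summarize_pairs(observed_counts, sample_sheet):
    """Split observed (i7, i5) read counts into assigned, illegal, and unknown.

    observed_counts: dict mapping (i7, i5) -> read count
    sample_sheet:    dict mapping (i7, i5) -> sample name (the UDI table)

    "Illegal" means both indexes are in the sample sheet but not as a pair,
    which is the signature expected from a single index hop. "Unknown" means
    at least one index is not in the sample sheet at all.
    """
    known_i7 = {i7 for i7, _ in sample_sheet}
    known_i5 = {i5 for _, i5 in sample_sheet}
    per_sample = Counter()
    illegal = Counter()
    unknown = 0
    for (i7, i5), reads in observed_counts.items():
        if (i7, i5) in sample_sheet:
            per_sample[sample_sheet[(i7, i5)]] += reads
        elif i7 in known_i7 and i5 in known_i5:
            illegal[(i7, i5)] += reads
        else:
            unknown += reads
    assigned_total = sum(per_sample.values())
    illegal_total = sum(illegal.values())
    denominator = assigned_total + illegal_total
    illegal_fraction = illegal_total / denominator if denominator else 0.0
    return per_sample, illegal, unknown, illegal_fraction


# Hypothetical two-sample UDI pool with made-up read counts.
sheet = {("A7", "A5"): "S1", ("B7", "B5"): "S2"}
observed = {
    ("A7", "A5"): 9000,
    ("B7", "B5"): 900,
    ("A7", "B5"): 60,   # illegal: both indexes known, pairing is not
    ("B7", "A5"): 40,   # illegal
    ("A7", "ZZ"): 5,    # unknown i5 (e.g., sequencing error or contaminant)
}
per_sample, illegal, unknown, illegal_fraction = summarize_pairs(observed, sheet)
```

The illegal fraction is only a rough proxy for the hopping rate: it ignores reads where both indexes changed and does not distinguish hopping from other sources of chimeric pairs.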
PCR-based vs PCR-free library preparation: indexing consequences
| Aspect | PCR-based library prep | PCR-free library prep |
|---|---|---|
| Indexing method | Indexes often added by PCR with barcoded primers (2nd PCR) | Indexes added by ligation (no amplification) |
| Risk factors | Residual indexed primers increase hopping during ExAmp; over-cycling increases adapter dimers | Fewer free primers; still must remove adapter dimers |
| Pros | Requires less input; supports targeted/amplicon workflows | Best for minimizing amplification bias and certain cross-talk modes |
| Cons | Higher chance of free indexed primers without rigorous cleanup | Requires higher input and careful size selection |
| Guidance | Add extra cleanup(s), minimize cycles, favor UDIs, balance pools | Still favor UDIs; maintain stringent cleanup |
Mechanistic notes and empirical comparisons are summarized by multiple academic cores (e.g., MSU RTSF note) and independent evaluations (e.g., FDA methods comparison).
Demultiplexing and QC: what to monitor
- Illegal UDI pairs and hopped-pair counts: With UDI, any i7/i5 combination not in your sample sheet flags misassignment and is discarded (UMass).
- Per-sample index purity and cross-talk matrices: UDI+UMI designs allow quantifying leakage (MacConaill 2018).
- Sample balance and depth: Under-represented samples receive a higher proportion of misassigned reads in mixed pools (van der Valk 2020).
- Platform-specific diagnostics: Some SOPs provide hopping-rate estimation or correction (e.g., scRNA-seq correction in Miao 2024).
- Reference materials: Use NIST GIAB samples for process benchmarking (GIAB overview).
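The point about sample balance can be illustrated with simple arithmetic. The sketch below uses a deliberately simplified model that is an assumption, not taken from the cited studies: every read is misassigned with the same probability `hop_rate`, misassigned reads are spread evenly over the other samples, and nothing is filtered (as would be the case without UDIs). The 1% rate and the pool shares are hypothetical.

```python
def contamination_fraction(shares, hop_rate):
    """Fraction of each sample's final reads that originated in other samples.

    shares:   relative read contribution of each sample to the pool
    hop_rate: assumed probability that a read is misassigned (uniformly to
              one of the other samples); a toy model, not a measured value
    """
    n = len(shares)
    if n < 2:
        raise ValueError("need at least two samples in the pool")
    total = sum(shares)
    fractions = []
    for share in shares:
        retained = share * (1 - hop_rate)                 # own reads kept
        incoming = (total - share) * hop_rate / (n - 1)   # reads from others
        fractions.append(incoming / (retained + incoming))
    return fractions


# Hypothetical pool: one dominant sample, one medium, one under-represented.
pool_shares = [0.90, 0.09, 0.01]
fractions = contamination_fraction(pool_shares, hop_rate=0.01)
```

Under these assumptions the dominant sample carries well under 0.1% foreign reads, while roughly one third of the reads assigned to the 1% sample come from elsewhere, even though the per-read misassignment rate is identical for all three.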
Selecting the right kit configuration for large studies
When to choose single indexing
- Legacy instruments or small pilot runs with limited multiplexing and tolerance for a small amount of cross-talk.
- Amplicon projects where cost and simplicity outweigh the small but non-zero misassignment risk (still consider i7-only per UC Davis amplicon FAQ).
When to choose combinatorial dual indexing (CDI)
- Moderate multiplexing with cost constraints and lower sensitivity requirements.
- Ensure tight cleanup and adequate edit distance; monitor cross-talk and consider excluding borderline pairs (UMass).
When to choose unique dual indexing (UDI) — the default for big projects
- High-throughput patterned flow cells (NextSeq 2000, NovaSeq 6000/X), shared lanes, or any study sensitive to low-frequency signals (ctDNA, rare variant calling, low-biomass metagenomics, single-cell). This is standard policy across many cores (URMC, KUMC, Cornell BRC, MIT BMC, MGH CCIB).
Implementation checklist (lab + informatics)
- Kit & plate selection
  - Choose UDI plates with validated edit distance and balanced base composition (UMass adapter pooling).
  - If designing custom sets, validate with tools like GIL for distance/diversity constraints (GIL).
- Library construction
- Pooling & run design
  - Balance molarities; avoid extreme under-representation (van der Valk 2020).
  - Use UDI on shared patterned runs (Cornell policy; URMC NovaSeq guidance).
- Demultiplexing
  - Enforce strict UDI pair lists; drop reads with illegal i5/i7 pairings (UMass guide; Southard-Smith 2020).
- QC & reporting
  - Report per-sample illegal/hopped-pair counts, index purity, and contamination matrices (UDI+UMI strategies: MacConaill 2018).
  - For sensitive modalities (e.g., single-cell), consider computational hopping-rate estimation/correction (Miao 2024) and document acceptance thresholds alongside GIAB-based controls (GIAB reference use).
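The "balanced base composition" item in the checklist can be checked with a few lines of code before a pool is finalized. The sketch below tabulates the base fractions at each index cycle and flags cycles dominated by a single base. It assumes all libraries contribute equally, uses made-up 4-nt sequences, and the `max_fraction=0.75` cutoff is an arbitrary illustration; it does not model instrument-specific color channels.

```python
from collections import Counter


def per_cycle_composition(indexes):
    """For each index cycle, return the fraction of A, C, G and T in the pool."""
    length = len(indexes[0])
    if any(len(ix) != length for ix in indexes):
        raise ValueError("indexes must have equal length")
    table = []
    for cycle in range(length):
        counts = Counter(ix[cycle] for ix in indexes)
        table.append({base: counts.get(base, 0) / len(indexes) for base in "ACGT"})
    return table


def low_diversity_cycles(indexes, max_fraction=0.75):
    """Return 1-based cycle numbers where one base exceeds max_fraction."""
    flagged = []
    for cycle, composition in enumerate(per_cycle_composition(indexes), start=1):
        if max(composition.values()) > max_fraction:
            flagged.append(cycle)
    return flagged


# Made-up 4-nt indexes: every sequence starts with A, so cycle 1 has no diversity.
example_pool = ["ACGT", "ACTG", "AGCA", "ATAC"]
flagged_cycles = low_diversity_cycles(example_pool)
```

Weighting each index by its planned molar contribution would be a natural refinement when libraries are not pooled equally.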
Limitations and edge cases
- Low-diversity libraries (amplicons, small RNA) stress optics/registration and can inflate apparent cross-talk; add spike-ins or pool with diverse libraries (example dual-indexed small-RNA strategies at Vanderbilt).
- Security/contamination risks external to indexing (e.g., residual DNA on used flow cells) require procedural controls beyond UDI (UW/UC-Washington study).
- Claims of “no hopping” platforms/kits should still be validated for your assay type and input range; some studies report low hopping under specific chemistries/lab conditions (Li 2019), but generalize with caution.
Quick decision matrix
| Study type | Platform | Recommended indexing | Notes |
|---|---|---|---|
| Clinical-grade/large cohort WGS/WES | Patterned flow cell (NextSeq 2000, NovaSeq 6000/X) | UDI | Required/standard in many cores; strict demux; GIAB controls (URMC, KUMC, UMass, FDA eval). |
| Low-biomass metagenomics / aDNA | Patterned | UDI (+UMI) | Balance pools; quantify hopping; discard illegals; see van der Valk 2020 and MacConaill 2018. |
| Single-cell (scRNA-seq) | Patterned | UDI | Consider computational correction (Miao 2024); vendor SOPs emphasize adapter cleanup (e.g., CCR/NIH protocol extract). |
| Amplicon panels (high plex) | Any | UDI preferred; CDI possible | Strict cleanup; balanced indexes; see MSU RTSF guide and UC Davis amplicon FAQ. |
| Small pilot / teaching runs | Non-patterned (e.g., MiSeq) | CDI or single (i7) | Accept higher cross-talk risk; follow best practices; see UC Davis. |
Key references and further reading (edu/gov)
- UC Davis DNA Tech Core on preventing hopping
- UC Davis amplicon sequencing & indexing
- MSU RTSF technical note on barcode misassignment
- UMass Chan sequencing guide and adapter pooling PDF
- KUMC best practices and guidelines
- URMC Genomics Center policy pages (FAQ, instrumentation)
- Cornell BRC on UDIs and indexing reactions
- MIT BioMicro Center indexing guidance
- FDA evaluation of library methods (notes on hopping)
- NIST/NIH Genome in a Bottle reference for benchmarking
- Studies on hopping, UDIs, and remediation: Costello 2018, MacConaill 2018, van der Valk 2020, Southard-Smith 2020, Miao 2024.
Bottom line
- Use UDI by default for modern high-throughput sequencing, especially on patterned flow cells.
- Design and clean: high-edit-distance index sets, balanced base composition, and aggressive removal of free primers/adapters are non-negotiable.
- Demultiplex strictly: reject illegal pairs; quantify cross-talk; use reference materials for routine benchmarking.

