Reading

Sequencing
Decoding Identifiers
Error Handling
Targeted Reading
Sequencing Cost & Throughput Trends

Reading DNA data involves sequencing stored molecules and reconstructing the digital information they represent. Unlike electronic media where data is accessed by reading magnetic or optical states, DNA is decoded through established sequencing technologies.

Sequencing

Modern sequencing platforms process millions of DNA molecules in parallel. Each read reveals the sequence of components that make up an identifier. Because identifiers are assembled from prefabricated building blocks, decoding reduces to detecting which components are present rather than resolving arbitrary long sequences. This improves accuracy and reduces cost.

Decoding Identifiers

Sequence the library

The sequencer reads component combinations across the identifier pool.

Map to rank

Every identifier corresponds to a rank within the combinatorial address space. Components present → identifier rank determined.

Reconstruct codewords

Each rank maps back to its assigned bit position in the original codeword. Reconstructing all codewords recovers the full digital dataset.

Error Handling

No physical process is perfect. Sequencing may introduce errors xDNA addresses this at multiple levels:

Redundant copies

Multiple copies of each identifier are written. Statistical thresholds distinguish true signals from noise.

Error correction codes

Reed-Solomon and similar schemes correct residual errors after decoding. Demonstrated raw error rates are well within correction limits.

Targeted enrichment

Identifiers are chemically isolated by component before sequencing only relevant subsets need to be read, reducing noise at the source.

Targeted Reading

Unlike tape or disk where full scans are often required, DNA allows selective enrichment before sequencing. Identifiers can be isolated chemically according to their components, so only relevant subsets of the archive need to be read. This dramatically reduces cost and time for data retrieval.

Sequencing Cost & Throughput Trends

DNA sequencing costs have followed an exponential decline for two decades.

The cost to read one million bases has dropped more than 100,000-fold since 2001 consistently outpacing Moore’s Law in computing. A single modern nanopore sequencer generates hundreds of millions of reads in under two days. New high-throughput platforms are expected to reach terabase-per-day output.

These improvements mean archives written today will become increasingly fast and economical to access in the future the inverse of every conventional storage medium, which becomes harder to read over time.

Writing Storing

⌘I

Overview

Why DNA

Industry Impact

Token

Sequencing

Decoding Identifiers

Error Handling

Redundant copies

Error correction codes

Targeted enrichment

Targeted Reading

Sequencing Cost & Throughput Trends

Overview

Why DNA

Industry Impact

Token

Documentation Index

​Sequencing

​Decoding Identifiers

​Error Handling

Redundant copies

Error correction codes

Targeted enrichment

​Targeted Reading

​Sequencing Cost & Throughput Trends

Sequencing

Decoding Identifiers

Error Handling

Targeted Reading

Sequencing Cost & Throughput Trends