Decentralized science requires more than funding mechanisms. It requires verifiable, permanent, and accessible data infrastructure. Today, most scientific knowledge is fragmented across journals, repositories, and institutional servers with limited guarantees of permanence or integrity. DNA provides a molecular substrate for archives that meet the core principles of DeSci: permanence, openness, reproducibility, and verifiability.
Archiving Primary Research Data
Most experiments produce far more data than is published. Genomic reads, proteomic profiles, cryo-EM images, high-throughput screens, and clinical trial datasets reach terabyte and petabyte scales. Conventional repositories struggle to host this data at sustainable cost. Encoding these datasets in DNA provides:
Century-scale retention
No migration between media formats. Data remains readable across decades
without active maintenance.
Distributed custody
Compact archives replicated and distributed to multiple custodians for
redundancy and independent access.
On-chain anchoring
Data identifiers anchored on-chain for cryptographic verifiability across
institutions and jurisdictions.
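One way to picture on-chain anchoring is as a content-derived identifier: a digest of the archived bytes is published, and any custodian can later recompute it from their copy. The following is a minimal sketch under stated assumptions, not the platform's actual scheme. The function names, the 1024-byte chunk size, and the Merkle construction (SHA-256, odd nodes paired with themselves) are all illustrative choices that do not come from this documentation.

```python
import hashlib


def chunk_digests(data: bytes, chunk_size: int = 1024) -> list[bytes]:
    """SHA-256 digest of each fixed-size chunk (chunk size is an assumption)."""
    digests = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    # An empty dataset still gets one leaf so the root is always defined.
    return digests or [hashlib.sha256(b"").digest()]


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf digests pairwise into one root; an odd node is paired with itself."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def archive_identifier(data: bytes, chunk_size: int = 1024) -> str:
    """Hex identifier that could be anchored on-chain (hypothetical helper)."""
    return merkle_root(chunk_digests(data, chunk_size)).hex()


def verify(data: bytes, anchored_id: str, chunk_size: int = 1024) -> bool:
    """Recompute the identifier from retrieved bytes and compare with the anchor."""
    return archive_identifier(data, chunk_size) == anchored_id
```

A custodian holding a decoded copy would call `verify(decoded_bytes, anchored_id)`. Changing any byte changes the root, so a mismatch signals corruption or substitution without requiring trust in the other custodians.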
Permanent Literature & Protocols
Scientific publications and laboratory protocols can be written into DNA alongside primary datasets. With only a few grams of DNA, the entire corpus of open-access research can be stored permanently. Unlike proprietary formats, DNA sequences remain universally interpretable, so today’s discoveries stay readable by future generations.
In-DNA Search for Reproducibility
Reproducibility depends on finding relevant prior data. Traditional archives require bulk downloads and brute-force search. In-DNA computing enables associative queries:
- A lab can search for experiments involving a given protein sequence or gene without sequencing the entire archive
- Approximate search surfaces related results by embedding similarity, supporting cross-study integration
- Queries can be verified on-chain, proving results correspond to specific molecular archives
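To make "approximate search by embedding similarity" concrete, here is a digital analogue of the query: rank stored records by cosine similarity to a query embedding and return the closest few. This is only a sketch of the ranking logic, assuming small in-memory vectors. The record IDs, the three-dimensional embeddings, and the `top_k` helper are hypothetical and say nothing about how the molecular search is implemented.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], archive: dict[str, list[float]], k: int = 2):
    """Return the k (record_id, score) pairs most similar to the query."""
    scored = [(rid, cosine(query, vec)) for rid, vec in archive.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical archive: record IDs mapped to toy embeddings.
archive = {
    "exp-001": [1.0, 0.0, 0.0],
    "exp-002": [0.9, 0.1, 0.0],
    "exp-003": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], archive, k=2)
print([rid for rid, _ in results])  # ['exp-001', 'exp-002']
```

The near-duplicate `exp-002` surfaces alongside the exact match, which is the behavior that supports cross-study integration: related experiments are returned even when they are not identical to the query.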
AI for Biotech Datasets
DNA archives are not passive. Embeddings from protein language models, molecular dynamics simulations, or clinical feature sets can be stored in DNA. Similarity search using select and quotient retrieves nearest neighbors at the molecular level, reducing compute cost for large screening campaigns.
This creates a hybrid workflow: DNA performs first-stage recall, and digital AI models handle final ranking and prediction.
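The hybrid workflow follows the familiar two-stage retrieval pattern: a cheap, coarse filter narrows the archive to a candidate set, and a more expensive scorer ranks only those candidates. The sketch below simulates that pattern entirely in software. The coarse stage uses sign-bit signatures and Hamming distance purely as a stand-in for molecular recall. It is not a model of the select and quotient operations, and every name, threshold, and data value here is an assumption made for illustration.

```python
import math


def sign_bits(vec: list[float]) -> tuple[int, ...]:
    """Coarse binary signature: one bit per dimension (1 if the value is positive)."""
    return tuple(1 if x > 0 else 0 for x in vec)


def hamming(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def first_stage_recall(query, archive, max_mismatches: int = 1) -> list[str]:
    """Stand-in for molecular recall: keep records whose signature nearly matches."""
    q = sign_bits(query)
    return [
        rid for rid, vec in archive.items()
        if hamming(q, sign_bits(vec)) <= max_mismatches
    ]


def cosine(a: list[float], b: list[float]) -> float:
    """Exact similarity used by the digital reranking stage."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query, archive, k: int = 2, max_mismatches: int = 1) -> list[str]:
    """Coarse recall first, then exact ranking over the surviving candidates only."""
    candidates = first_stage_recall(query, archive, max_mismatches)
    candidates.sort(key=lambda rid: cosine(query, archive[rid]), reverse=True)
    return candidates[:k]


# Hypothetical screening library: compound IDs mapped to toy embeddings.
library = {
    "cmpd-A": [0.9, 0.8, -0.1],
    "cmpd-B": [0.2, 0.9, -0.7],
    "cmpd-C": [-0.8, -0.5, 0.9],
    "cmpd-D": [0.7, -0.2, -0.3],
}

print(hybrid_search([1.0, 0.5, -0.2], library, k=2))  # ['cmpd-A', 'cmpd-D']
```

In this toy run the coarse stage discards `cmpd-C` before any exact scoring happens. The compute saving comes from that step: the expensive model only sees the candidates that survive recall, so its cost scales with the candidate set rather than with the full archive.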
