Artificial intelligence depends on access to massive datasets and the ability to search them efficiently. Current approaches rely on large compute clusters and expensive memory systems. DNA provides an alternative storage and compute layer where embeddings and datasets can be preserved permanently and searched at molecular scale.

Embedding Archives

Modern AI workflows transform data into embeddings: high-dimensional vectors representing semantic meaning. Millions or billions of embeddings are generated for language, images, proteins, and blockchain transactions. Storing these in DNA creates a permanent semantic index.
  1. Map to sparse codeword: Each embedding is mapped to a sparse codeword with uniform weight.
  2. Encode into identifiers: Codewords are encoded into DNA identifiers through combinatorial assembly.
  3. Molecular vector database: The archive becomes a molecular vector database capable of similarity search through biochemical operations alone.
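Step 1 can be sketched in software. The random projection, codeword length `n`, and weight `k` below are illustrative assumptions, not the actual encoding scheme; the only property the sketch preserves is that every codeword has the same number of one bits:

```python
import numpy as np

def to_sparse_codeword(embedding: np.ndarray, n: int = 256, k: int = 8) -> np.ndarray:
    """Map a dense embedding to a constant-weight binary codeword.

    Illustrative scheme: project the embedding into n dimensions with a
    fixed random matrix, then keep the k largest coordinates as the
    one-bit positions, so every codeword has exactly k ones (uniform weight).
    """
    rng = np.random.default_rng(0)            # fixed seed: same projection for every embedding
    projection = rng.standard_normal((n, embedding.shape[0]))
    scores = projection @ embedding
    codeword = np.zeros(n, dtype=np.uint8)
    codeword[np.argsort(scores)[-k:]] = 1     # top-k positions become the set bits
    return codeword

embedding = np.random.default_rng(1).standard_normal(128)
codeword = to_sparse_codeword(embedding)
```

Because the projection is shared, nearby embeddings tend to share one-bit positions, which is what makes overlap-based enrichment meaningful later.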
Select and quotient provide the primitives for approximate nearest-neighbor search:
  1. A query embedding is converted into a codeword
  2. Select operations enrich identifiers that overlap with its one-bit positions
  3. Quotient aggregates signals from related items
  4. Molecular signal strength correlates with similarity to the query
  5. Only a fraction of molecules need to be sequenced, reducing digital compute by orders of magnitude
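A software stand-in for steps 2–4 (the biochemistry itself is not modeled here; the archive size and codeword parameters are toy assumptions):

```python
import numpy as np

def molecular_recall(codewords: np.ndarray, query: np.ndarray, top: int = 5) -> np.ndarray:
    """Simulate select/quotient enrichment on an archive of codewords.

    Select enriches identifiers carrying a one bit at one of the query's
    set positions; quotient aggregates those signals, so an item's total
    signal is proportional to its bit overlap with the query codeword.
    """
    signal = codewords[:, query.astype(bool)].sum(axis=1)  # overlap with query's one bits
    return np.argsort(signal)[::-1][:top]                  # strongest signal first

# Toy archive: 100 constant-weight codewords of length 64, weight 6.
rng = np.random.default_rng(0)
codewords = np.zeros((100, 64), dtype=np.uint8)
for row in codewords:
    row[rng.choice(64, size=6, replace=False)] = 1

candidates = molecular_recall(codewords, codewords[42])
```

Querying with item 42's own codeword returns it with maximal signal, mirroring how signal strength tracks similarity.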

Hybrid Workflows

DNA search is not a replacement for GPU training or fine-grained ranking. It acts as a first-stage filter:

Stage 1: Molecular recall

Biochemical select and quotient narrow the candidate pool from billions to thousands at near-zero energy and compute cost.

Stage 2: Digital ranking

Sequence only the enriched subset. Run final ranking and inference digitally on a compact candidate set.
This hybrid architecture reduces compute cost and energy use while scaling to datasets that would otherwise be impractical to keep fully online.
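The two stages can be sketched end to end. The recall pool size, the final cutoff, and the in-memory array standing in for "sequencing the enriched subset" are all assumptions for illustration:

```python
import numpy as np

def hybrid_search(embeddings, codewords, q_emb, q_code, recall=100, final=10):
    """Stage 1 filters by codeword overlap; stage 2 ranks the survivors exactly."""
    overlap = codewords[:, q_code.astype(bool)].sum(axis=1)
    candidates = np.argsort(overlap)[::-1][:recall]    # stage 1: molecular recall
    subset = embeddings[candidates]                    # "sequence" only the enriched pool
    sims = subset @ q_emb / (np.linalg.norm(subset, axis=1) * np.linalg.norm(q_emb))
    return candidates[np.argsort(sims)[::-1][:final]]  # stage 2: digital ranking

# Toy data: each codeword marks the positions of its embedding's 6 largest coordinates.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 64))
codewords = np.zeros((1000, 64), dtype=np.uint8)
for i, e in enumerate(embeddings):
    codewords[i, np.argsort(e)[-6:]] = 1

ranked = hybrid_search(embeddings, codewords, embeddings[7], codewords[7])
```

Exact cosine ranking runs over 100 candidates instead of 1,000 items; at archive scale the same split is billions down to thousands.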

Model Permanence

AI models are increasingly valuable intellectual property, yet their weights, often hundreds of gigabytes, are stored on fragile media. Encoding weights into DNA provides century-scale preservation. On-chain anchoring guarantees model versions remain auditable and verifiable across time.
A model trained today can be reproduced or audited decades into the future with no dependence on any specific hardware, format, or cloud provider.
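A minimal sketch of the verification side, assuming the on-chain anchor records a SHA-256 digest of the serialized weights (the actual anchoring format is not specified here):

```python
import hashlib

def weights_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 digest of a weights file, streamed in chunks.

    The anchor stores only this digest; anyone who later recovers the
    weights, from DNA or elsewhere, recomputes it to verify the model
    version bit-for-bit, with no dependence on hardware, format, or provider.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            h.update(block)
    return h.hexdigest()
```

Streaming in fixed-size chunks keeps memory use constant even for weight files of hundreds of gigabytes.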

Applications

| Domain | DNA enables |
| --- | --- |
| Language models | Store embeddings of corpora; retrieve documents by semantic similarity |
| Drug discovery | Store embeddings of chemical libraries and proteins; retrieve candidates by molecular similarity |
| Blockchain analytics | Store embeddings of transactions or contracts; run similarity queries across historical ledgers |
| Multi-modal AI | Preserve image, video, and genomic embeddings together in a unified molecular archive |