Axiom Codex
The canonical intelligence dataset.
Normalized. Labeled. AI-ready.
Dataset catalog
Eight dataset families.
One normalization standard.
Every record ships with a schema spec, provenance record, and version history. Research tier is free. Commercial licensing starts at $299/dataset/month.
Civic Intelligence
517K records
Council votes, permits, and zoning decisions structured with entity extraction, sentiment scores, upzoning probability, and DAG-mapped approval sequences.
AIS Maritime Positions
1.4M positions
Decoded vessel tracks, port calls, and anchor events enriched with Equasis vessel metadata, flag state, DWT, and kinematic fingerprints.
Urban Signal Grid
454K H3 cells
Cell-level ESGI composite scores and 8 signal-group subscores across 22 US metros at H3 resolution 8. The most valuable dataset for CRE AI applications.
Axiom Events Timeline
1.7M+ events
Unified temporal intelligence: permits, council decisions, AIS anomalies, OSHA violations, and business openings, all normalized to a single event schema with source attribution.
LEHD Commuter Flows
454K OD pairs
Census LEHD worker origin-destination pairs normalized to H3 cells. Income bands, job sector, and Huff gravity index are pre-computed.
POI Intelligence
89K locations
Points of interest enriched with category taxonomy, NAICS codes, pioneer business flags, walk/transit scores, and a sample of reviews.
Permit Signals
2.1M permits
Building permit activity across 22 metros, with scope type, building type, unit count, and estimated cost tier extracted by LLM from free-text descriptions.
OSHA Safety Index
500K+ inspections
OSHA inspection records with NLP-classified hazard categories, violation severity tiers, and inflation-adjusted penalty normalization by H3 cell.
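For the LEHD Commuter Flows family, the page does not say how the pre-computed Huff gravity index is derived. As a rough illustration only, the textbook Huff model assigns each destination a probability proportional to its attractiveness divided by a power of distance. The function name, the `alpha` and `beta` defaults, and the sample values below are all assumptions, not Codex parameters.

```python
def huff_probabilities(attractiveness, distances, alpha=1.0, beta=2.0):
    """Textbook Huff model: P_j = (S_j^alpha / D_j^beta) / sum_k (S_k^alpha / D_k^beta).

    attractiveness: one value per destination (e.g. job count), hypothetical here.
    distances: distance from a single origin to each destination, all > 0.
    """
    if len(attractiveness) != len(distances):
        raise ValueError("attractiveness and distances must have the same length")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")

    # Utility of each destination as seen from the origin.
    utilities = [(s ** alpha) / (d ** beta) for s, d in zip(attractiveness, distances)]
    total = sum(utilities)
    # Normalize so the probabilities over all destinations sum to 1.
    return [u / total for u in utilities]


# Two destinations: the second is four times as attractive but twice as far away.
probs = huff_probabilities([100.0, 400.0], [1.0, 2.0])
```

With `beta=2.0`, doubling the distance cancels out a fourfold advantage in attractiveness, so the two destinations in this example come out equally likely.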
Schema-locked
Schemas are versioned with semver, and breaking changes bump the major version. Consumers pin to a major version and can rely on later minor and patch releases staying backward compatible.
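A consumer-side check for a pinned schema version might look like the sketch below. It assumes plain `MAJOR.MINOR.PATCH` version strings and standard semver semantics. The function names are hypothetical and are not part of any Codex client.

```python
def parse_version(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three ints."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return tuple(int(p) for p in parts)


def is_compatible(pinned, released):
    """True if `released` is safe for a consumer built against `pinned`.

    Under semver, a release is compatible when it shares the pinned major
    version and is not older than the pinned version.
    """
    pinned_v = parse_version(pinned)
    released_v = parse_version(released)
    return released_v[0] == pinned_v[0] and released_v >= pinned_v


same_major_newer = is_compatible("2.1.0", "2.3.1")   # minor bump, same major
major_bump = is_compatible("2.1.0", "3.0.0")         # breaking change
older_release = is_compatible("2.1.0", "2.0.9")      # predates the pin
```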
Source-attributed
Every record carries source_feed, ingested_at, normalization_version, and confidence_score. No black-box data.
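The four provenance fields could be modeled and filtered as in the sketch below. The field names come from the description above. The field types, the 0 to 1 confidence range, the ISO 8601 timestamp format, and the sample records are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Provenance:
    source_feed: str
    ingested_at: datetime          # assumed to arrive as an ISO 8601 string
    normalization_version: str
    confidence_score: float        # assumed to lie in [0, 1]


def parse_provenance(record):
    """Build a Provenance object from a raw record dict."""
    return Provenance(
        source_feed=record["source_feed"],
        ingested_at=datetime.fromisoformat(record["ingested_at"]),
        normalization_version=record["normalization_version"],
        confidence_score=float(record["confidence_score"]),
    )


def filter_by_confidence(records, minimum):
    """Keep only records whose confidence_score is at least `minimum`."""
    return [r for r in records if parse_provenance(r).confidence_score >= minimum]


# Hypothetical sample records, not real Codex data.
sample = [
    {"source_feed": "city-council", "ingested_at": "2024-05-01T12:00:00+00:00",
     "normalization_version": "1.2.0", "confidence_score": 0.93},
    {"source_feed": "permit-portal", "ingested_at": "2024-05-02T08:30:00+00:00",
     "normalization_version": "1.2.0", "confidence_score": 0.41},
]
trusted = filter_by_confidence(sample, minimum=0.8)
```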
AI-optimized labels
Categories, taxonomies, and derived fields are pre-computed and consistent. Ready as training labels without preprocessing.
Spatial consistency
All geospatial data normalizes to H3 resolution 8. Everything speaks the same spatial language.
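One way to verify that an incoming index really is at resolution 8 is to read the resolution straight from the H3 index bits, with no extra dependency. In the 64-bit H3 cell index layout, the resolution occupies the four bits at positions 52 to 55. The helper names are hypothetical, and the sample indexes are used purely as bit patterns.

```python
def h3_resolution(index_hex):
    """Extract the resolution (0-15) from an H3 index given as a hex string."""
    value = int(index_hex, 16)
    # Bits 52-55 of the 64-bit index hold the resolution.
    return (value >> 52) & 0xF


def is_expected_resolution(index_hex, expected=8):
    """True if the index is at the expected resolution (8 by default)."""
    return h3_resolution(index_hex) == expected


res_8 = h3_resolution("8828308281fffff")
res_9 = h3_resolution("8928308280fffff")
```

Records whose index fails `is_expected_resolution` can be rejected or re-indexed before joining.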
Versioned snapshots
Monthly snapshot releases with full changelogs. Point-in-time access for reproducible research and model training.
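Point-in-time access amounts to picking the latest snapshot released on or before a given date. The sketch below assumes snapshots are identified by release date. The release dates shown are made up, and the real snapshot naming scheme is not described here.

```python
from bisect import bisect_right
from datetime import date


def snapshot_as_of(snapshot_dates, as_of):
    """Return the most recent snapshot date on or before `as_of`, or None."""
    ordered = sorted(snapshot_dates)
    # bisect_right gives the count of snapshots released on or before as_of.
    position = bisect_right(ordered, as_of)
    return ordered[position - 1] if position else None


# Hypothetical monthly release dates.
releases = [date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)]

training_snapshot = snapshot_as_of(releases, date(2024, 2, 15))
before_first = snapshot_as_of(releases, date(2023, 12, 31))
```

Recording the chosen snapshot date alongside a trained model lets the same training set be fetched again later.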
Cross-dataset joins
Shared keys across all datasets — h3_index, event_id, jurisdiction_slug — so any dataset joins cleanly to any other.
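A join on one of the shared keys can be sketched in plain Python as below. The key name `h3_index` comes from the description above. The other column names (`permit_count`, `esgi_score`) and the sample rows are hypothetical.

```python
from collections import defaultdict


def join_on_key(left, right, key="h3_index"):
    """Inner-join two lists of dict records on a shared key.

    When both sides define the same non-key field, the left value wins.
    """
    # Index the right-hand rows by key so each lookup is O(1).
    right_by_key = defaultdict(list)
    for row in right:
        right_by_key[row[key]].append(row)

    joined = []
    for left_row in left:
        for right_row in right_by_key.get(left_row[key], []):
            joined.append({**right_row, **left_row})
    return joined


# Hypothetical rows from two dataset families.
permits = [
    {"h3_index": "8828308281fffff", "permit_count": 12},
    {"h3_index": "8828308283fffff", "permit_count": 3},
]
scores = [
    {"h3_index": "8828308281fffff", "esgi_score": 71.5},
]
merged = join_on_key(permits, scores)
```

Only cells present in both inputs survive, so `merged` here holds a single row carrying both `permit_count` and `esgi_score`.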
How it fits
One data layer. Three products.
Locus and Overwatch are the application layer — intelligence tuned for specific markets. Codex is the data layer underneath, sold directly to builders who want the raw feed without the lens.
Axiom Locus
CRE location intelligence. Scores, signals, and civic risk — interpreted for real estate decisions.
Axiom Overwatch
Maritime intelligence. Vessel tracks, port congestion, AIS anomalies — for shipping and trade.
Axiom Codex (you are here)
The same data — no lens. Canonical, schema-locked, AI-optimized. For builders who want the raw intelligence.
Licensing
Three tiers.
- ✓ 100K record subset per dataset
- ✓ HuggingFace download
- ✓ Community support
- ✓ Schema spec access

- ✓ Full dataset, all records
- ✓ Monthly snapshot updates
- ✓ R2 signed download URL
- ✓ Schema changelog access
- ✓ Email support

- ✓ All 8 dataset families
- ✓ Weekly update cadence
- ✓ Custom schema extensions
- ✓ Dedicated support + SLA
- ✓ Volume discounts