You don’t need to understand any of the following material to get massive value from Axiom. As a fully managed data platform, Axiom just works. This technical deep-dive is intended for curious minds wondering: Why is Axiom different?
Ingestion architecture
Data flows through a multi-layered ingestion system designed for high throughput and reliability:
Regional edge layer: HTTPS ingestion requests are received by regional edge proxies positioned to meet data jurisdiction requirements. These proxies handle protocol translation, authentication, and initial data validation. The edge layer supports multiple input formats (JSON, CSV, compressed streams) and can buffer data during downstream issues.
High-availability routing: The system routes requests to healthy database nodes based on real-time health monitoring. When primary ingestion paths fail, requests are automatically routed to available nodes or queued in a backlog system that processes the data once systems recover (a sketch of this fallback appears at the end of this section).
Streaming pipeline: Raw events are parsed, validated, and transformed in streaming fashion. Field limits and schema validation are enforced during this phase.
Write-ahead logging: All ingested data is durably written to a distributed write-ahead log before being processed. This ensures zero data loss even during system failures and supports concurrent writes across multiple ingestion nodes.
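To make the routing and backlog behavior concrete, here is a minimal sketch in Go. It is illustrative only, not Axiom's actual implementation; the `Router`, `Node`, and `backlog` names, the channel-based queue, and the send stub are all assumptions made for the example.

```go
// Illustrative only: a minimal health-checked ingest router with a backlog
// fallback. All names and data structures are hypothetical.
package main

import (
	"errors"
	"fmt"
	"sync"
)

type Node struct {
	Addr    string
	Healthy bool // in practice, updated by a real-time health monitor
}

type Router struct {
	mu      sync.RWMutex
	nodes   []*Node
	backlog chan []byte // queued writes, drained once nodes recover
}

// Route sends a batch to the first healthy node, or queues it for later.
func (r *Router) Route(batch []byte) error {
	r.mu.RLock()
	defer r.mu.RUnlock()
	for _, n := range r.nodes {
		if n.Healthy {
			return send(n, batch) // the real system also appends to the write-ahead log
		}
	}
	select {
	case r.backlog <- batch:
		return nil // replayed by a background worker when a node recovers
	default:
		return errors.New("backlog full: apply backpressure to the client")
	}
}

func send(n *Node, batch []byte) error {
	fmt.Printf("writing %d bytes to %s\n", len(batch), n.Addr)
	return nil
}

func main() {
	r := &Router{
		nodes:   []*Node{{Addr: "node-a", Healthy: false}, {Addr: "node-b", Healthy: true}},
		backlog: make(chan []byte, 1024),
	}
	_ = r.Route([]byte(`{"level":"info","msg":"hello"}`))
}
```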
Storage architecture
Axiom’s storage layer is built around a custom columnar format that achieves extreme compression ratios:
Columnar organization: Events are decomposed into columns and stored using specialized encodings optimized for each data type. String columns use dictionary encoding, numeric columns use various compression schemes, and boolean columns use bitmap compression (a sketch of dictionary encoding follows the list below).
Block-based storage: Data is organized into immutable blocks that are written once and read many times. Each block contains:
- Compressed column data in a proprietary format
- Separate time indexes for temporal queries
- Field schemas and type information
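To illustrate the dictionary encoding mentioned above, here is a small Go sketch. It shows the general technique only, not Axiom's on-disk format; the function name and integer ID width are assumptions.

```go
// Illustrative only: dictionary-encode a string column so repeated values are
// stored once in a dictionary and the column itself becomes small integer IDs.
package main

import "fmt"

func dictionaryEncode(column []string) (dict []string, ids []uint32) {
	index := make(map[string]uint32)
	for _, v := range column {
		id, ok := index[v]
		if !ok {
			id = uint32(len(dict))
			index[v] = id
			dict = append(dict, v)
		}
		ids = append(ids, id)
	}
	return dict, ids
}

func main() {
	column := []string{"GET", "POST", "GET", "GET", "PUT", "POST"}
	dict, ids := dictionaryEncode(column)
	fmt.Println(dict) // [GET POST PUT]
	fmt.Println(ids)  // [0 1 0 0 2 1]
}
```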
Compression is applied at several stages:
- Ingestion compression: Real-time compression during ingestion (25-50% reduction)
- Block compression: Columnar compression within storage blocks (10-20x additional compression)
- Compaction compression: Background compaction further optimizes storage (additional 2-5x compression)
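As a rough worked example (the exact figures vary by dataset), these stages compound: a 1 GiB batch reduced by 40% at ingestion is about 600 MiB; a further 15x reduction inside storage blocks brings it to roughly 40 MiB; another 3x from compaction lands near 13 MiB, an end-to-end ratio on the order of 75x.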
Query architecture
Axiom executes queries using a serverless architecture that spins up compute resources on demand:
Query compilation: The APL (Axiom Processing Language) query is parsed, optimized, and compiled into an execution plan. The compiler performs predicate pushdown and projection optimization, and identifies which blocks need to be read.
Serverless workers: Query execution occurs in ephemeral workers optimized through “Fusion queries”, a system that runs parallel queries inside a single worker to reduce costs and leave more resources for large queries. Workers download only the necessary column data from object storage, enabling efficient resource utilization.
Block-level parallelism: Each query spawns multiple workers that process different blocks concurrently. Workers read compressed column data directly from object storage, decompress it in memory, and execute the query (a sketch of this flow appears at the end of this section).
Result aggregation: Worker results are streamed back and aggregated by a coordinator process. Large result sets are automatically spilled to object storage and streamed to clients via signed URLs.
Intelligent caching: Query results are cached in object storage under cache keys that account for time ranges and query patterns. Cache hits dramatically reduce query latency for repeated queries.
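The following Go sketch shows the general shape of block-level parallelism with a coordinator merging partial results. It is a minimal illustration under assumed names (`Block`, `worker`, the row count aggregation), not Axiom's internal API.

```go
// Illustrative only: one worker per block applies the pushed-down predicate to
// its block and returns a partial aggregate; a coordinator merges the partials.
package main

import (
	"fmt"
	"sync"
)

// Block identifies a unit of columnar data in object storage (hypothetical).
type Block struct{ ID string }

// worker stands in for downloading a block's columns, decompressing them,
// and counting rows that match the predicate.
func worker(b Block, predicate func(row int) bool) int {
	count := 0
	for row := 0; row < 1000; row++ { // stand-in for decompressed column data
		if predicate(row) {
			count++
		}
	}
	return count
}

func main() {
	blocks := []Block{{"blk-1"}, {"blk-2"}, {"blk-3"}}
	partials := make(chan int, len(blocks))

	var wg sync.WaitGroup
	for _, b := range blocks {
		wg.Add(1)
		go func(b Block) { // one ephemeral worker per block
			defer wg.Done()
			partials <- worker(b, func(row int) bool { return row%7 == 0 })
		}(b)
	}
	wg.Wait()
	close(partials)

	// Coordinator: merge partial results into the final answer.
	total := 0
	for c := range partials {
		total += c
	}
	fmt.Println("total matching rows:", total)
}
```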
Compaction system
A background compaction system continuously optimizes storage efficiency:
Automatic compaction: The compaction scheduler identifies blocks that can be merged based on size, age, and access patterns. Small blocks are combined into larger “superblocks” that provide better compression ratios and query performance (a sketch of this selection logic follows the list below).
Multiple strategies: The system supports several compaction algorithms:
- Clustered: Groups data by common field values for better locality
- Fieldspace: Optimizes for specific field access patterns
- Concat: Simple concatenation for append-heavy workloads
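To make the candidate-selection idea concrete, here is a toy Go sketch of a scheduler that picks small, sufficiently old blocks to merge into a superblock. The thresholds, field names, and selection rule are assumptions for illustration; the real scheduler also weighs access patterns.

```go
// Illustrative only: select small, cold blocks as compaction candidates.
package main

import (
	"fmt"
	"time"
)

type Block struct {
	ID        string
	SizeBytes int64
	CreatedAt time.Time
}

const (
	smallBlock = 64 << 20 // hypothetical: blocks under 64 MiB are merge candidates
	minAge     = time.Hour
)

// selectCandidates returns blocks worth merging into a larger superblock.
func selectCandidates(blocks []Block, now time.Time) []Block {
	var out []Block
	for _, b := range blocks {
		if b.SizeBytes < smallBlock && now.Sub(b.CreatedAt) > minAge {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	now := time.Now()
	blocks := []Block{
		{"blk-1", 8 << 20, now.Add(-2 * time.Hour)},
		{"blk-2", 256 << 20, now.Add(-3 * time.Hour)},   // already large: skip
		{"blk-3", 16 << 20, now.Add(-90 * time.Minute)},
		{"blk-4", 4 << 20, now.Add(-10 * time.Minute)},  // too recent: skip
	}
	for _, b := range selectCandidates(blocks, now) {
		fmt.Println("compact:", b.ID)
	}
}
```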