The only open-source serverless Lucene implementation.
Self-healing, serverless architecture built for petabyte scale. No migrations, no rewrites. OpenTelemetry compatible.
Log search without the legacy tradeoffs
Run in your own AWS, GCP, or Azure account. Logs never leave your VPC.
Pay for S3 storage and active compute. No per-GB ingestion fees, no over-provisioned clusters.
Scale indexing and query compute independently. Sub-second search across years of retention.
Runs at petabyte scale at Slack and Airbnb. OpenSearch API compatible — drop-in for existing tooling.
| | KalDB | Elasticsearch | Splunk / Datadog |
|---|---|---|---|
| Compute / storage | Decoupled, S3-backed | Coupled | Opaque SaaS |
| Pricing model | Your cloud bill | Per-node licensing | Per-GB ingestion |
| Data sovereignty | In your VPC | Self-host or managed | Vendor cloud |
| License | Apache 2.0 | SSPL / Elastic License | Proprietary |
| API | OpenSearch compatible | Native | Proprietary |
KalDB pulls every traditional log-search role onto its own tier — preprocessing, indexing, serving, and storage — so each one scales to its own bottleneck and a spike in one can never stall the others.
Preprocessors validate records, enforce per-service rate limits, and partition data into Kafka. Indexers consume prepared partitions — nothing else.
Stateless indexers own the write path. Dedicated query and cache nodes own the read path. An ingest spike cannot back up dashboards, and a heavy query cannot stall Kafka.
Closed Lucene chunks are uploaded to S3; in-flight logs sit durably in Kafka. Every compute tier is stateless — scale up for peak, scale to zero off-peak.
Recent data lives on indexers. Cache nodes hydrate a configurable window from S3 to local disk. Everything older stays in low-cost S3 storage for long retention rather than on hot nodes.
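The tiering described above can be pictured as a simple age-based lookup. The following is a minimal sketch, not KalDB code: the `TierWindows` class, its two window settings, and the default values are hypothetical names chosen for illustration, and the `"s3"` label only means the data sits in object storage outside the cache window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierWindows:
    # Hypothetical knobs, in hours. These are NOT KalDB configuration names.
    indexer_hours: float = 2.0   # how long fresh data stays on indexers
    cache_hours: float = 72.0    # window that cache nodes hydrate from S3


def tier_for_age(age_hours: float, windows: TierWindows) -> str:
    """Return the tier that would hold data of the given age."""
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")
    if age_hours <= windows.indexer_hours:
        return "indexer"
    if age_hours <= windows.cache_hours:
        return "cache"
    return "s3"


def tiers_for_range(newest_age: float, oldest_age: float,
                    windows: TierWindows) -> list[str]:
    """Tiers touched by a time range spanning newest_age..oldest_age hours ago."""
    if newest_age > oldest_age:
        raise ValueError("newest_age must not exceed oldest_age")
    order = ["indexer", "cache", "s3"]
    first = order.index(tier_for_age(newest_age, windows))
    last = order.index(tier_for_age(oldest_age, windows))
    return order[first:last + 1]
```

With the assumed defaults, a dashboard covering the last 24 hours would touch both the indexer and cache tiers, while a range entirely older than 72 hours would map to data held only in S3.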
HTTP ingest lands on preprocessor nodes that validate records, enforce per-service rate limits, and partition messages into Kafka.
Kafka absorbs ingest spikes so producers and indexers scale independently and no fresh data is lost while chunks are still being built.
Indexers consume assigned Kafka partitions and build a Lucene index on local disk. The active chunk is open for both reads and writes while fresh logs stream in.
Once a chunk hits its size or offset limit it closes, is uploaded to S3, and its metadata is published so it can be assigned to the cache tier.
If an indexer falls behind, recovery nodes pick up skipped offsets so fresh ingest stays prioritized and historical gaps are filled separately.
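The chunk rollover and catch-up steps above can be sketched in a few lines. This is an illustrative model under stated assumptions, not KalDB's implementation: `Chunk`, `build_chunks`, and `plan_catch_up` are hypothetical names, the "offset limit" is modelled as a maximum message count per chunk, and the lag threshold is an invented parameter.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    start_offset: int
    max_bytes: int
    max_messages: int          # assumed stand-in for the "offset limit"
    messages: list = field(default_factory=list)
    size_bytes: int = 0

    def add(self, payload: bytes) -> None:
        self.messages.append(payload)
        self.size_bytes += len(payload)

    @property
    def end_offset(self) -> int:
        # Offset of the last message in the chunk (start_offset - 1 if empty).
        return self.start_offset + len(self.messages) - 1

    def is_full(self) -> bool:
        return (self.size_bytes >= self.max_bytes
                or len(self.messages) >= self.max_messages)


def build_chunks(payloads, first_offset=0, max_bytes=1024, max_messages=3):
    """Group consecutive messages into closed chunks plus one active chunk."""
    closed = []
    active = Chunk(first_offset, max_bytes, max_messages)
    for payload in payloads:
        active.add(payload)
        if active.is_full():
            # Stand-in for "close, upload to S3, publish metadata".
            closed.append(active)
            active = Chunk(active.end_offset + 1, max_bytes, max_messages)
    return closed, active


def plan_catch_up(committed_offset, head_offset, max_lag):
    """Decide where a lagging indexer resumes and what is left to recovery.

    Returns (resume_offset, recovery_range). recovery_range is an inclusive
    (start, end) offset pair, or None when the indexer can catch up itself.
    """
    lag = head_offset - committed_offset
    if lag <= max_lag:
        return committed_offset, None
    # Jump to the head so fresh data is indexed first; hand off the gap.
    return head_offset, (committed_offset, head_offset - 1)
```

For example, seven small messages with `max_messages=3` yield two closed chunks covering offsets 0 to 5 and an active chunk starting at offset 6. An indexer whose next offset is 100 while the partition head is at 5000, with an assumed `max_lag` of 1000, would resume at 5000 and leave offsets 100 to 4999 to a recovery task.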
Grafana and other clients hit a stateless query service that speaks the OpenSearch API over HTTP — no client rewrites.
Query nodes fan out to live indexers for recent data and to cache nodes for read-only snapshots already in object storage.
Cache nodes download Lucene chunks from S3 to local disk and serve a configurable retention window — not every byte you ever stored.
The manager tracks snapshots, replicas, cache slot assignments, and retention so each query quickly finds the right chunks to search.
Query nodes merge responses across indexer and cache tiers so users get a single result set even though storage and compute are fully disaggregated.
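The fan-out and merge on the read path can be illustrated with a small scatter-gather sketch. It makes several assumptions for illustration only: each node is modelled as an in-memory list of hits with `ts` and `message` fields, matching is a plain substring test, and `search_node` and `fan_out_and_merge` are hypothetical names rather than KalDB functions.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def search_node(node_hits, query):
    """Stand-in for one indexer or cache node: filter, then sort newest first."""
    hits = [h for h in node_hits if query in h["message"]]
    return sorted(hits, key=lambda h: h["ts"], reverse=True)


def fan_out_and_merge(nodes, query, limit):
    """Query every node in parallel and merge into one newest-first result set."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = list(pool.map(lambda hits: search_node(hits, query), nodes))
    # Each partial result is already sorted newest first, so a k-way merge
    # yields a globally ordered stream without re-sorting everything.
    merged = heapq.merge(*partials, key=lambda h: h["ts"], reverse=True)
    return list(islice(merged, limit))


if __name__ == "__main__":
    indexer = [{"ts": 105, "message": "error: timeout"},
               {"ts": 103, "message": "ok"}]
    cache = [{"ts": 90, "message": "error: disk full"},
             {"ts": 104, "message": "error: retry"}]
    for hit in fan_out_and_merge([indexer, cache], "error", limit=2):
        print(hit["ts"], hit["message"])
```

Because each node returns its hits already ordered, the merging side only has to interleave the partial lists and stop once `limit` results are collected, which keeps the merge cost independent of how many hits each node holds in total.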
Conference presentations on KalDB's architecture and operations
Monitorama 2022 — Suman Karumuri
Strange Loop 2022 — Suman Karumuri
Berlin Buzzwords 2023 — Suman Karumuri
SREcon23 Asia/Pacific — Suman Karumuri
Proven at scale across industries
Deploy in your own cloud account with full control. Meet compliance requirements without compromising on features.
Handle millions of events per second. Correlate logs, metrics, and traces across your entire infrastructure.
Detect threats and investigate incidents in real time. Retain security logs for years at minimal cost.
Self-host KalDB with Docker or run in your cloud