ClickHouse
Unkey generates a lot of usage data, such as API requests, system metrics, verification/ratelimit outcomes and more. Most of it we serve back to the user through our dashboard or API, but we also use this data internally to drive our billing and alerts.
ClickHouse is a perfect fit for us thanks to its very high ingest throughput and ease of querying through SQL.
We're using the ClickHouse Cloud offering and run our staging and production clusters in AWS us-east-1.
Working with ClickHouse
For development, we run ClickHouse inside our docker-compose setup, and it is started and migrated automatically for you.
The `/internal/clickhouse` package contains both the schema and a TypeScript client for inserting and querying.
All queries/inserts must be defined in this package.
For migrations, we're using goose. This is only really relevant when migrating the staging and production clusters; locally, migrations run automatically during the docker-compose setup.
While ClickHouse can ingest millions of rows per second, it can only achieve this with relatively large insert batches of tens of thousands of rows per request. This presents a challenge in our serverless setup, as most of our API invocations would only insert a single row per table. That's why we're using a ClickHouse Proxy for our inserts.
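The batching the proxy performs can be sketched roughly like this (a simplified illustration, not the actual proxy code; the `BatchBuffer` name and the flush threshold are invented for this example — the real proxy also flushes on a timer and on shutdown):

```typescript
// Sketch: single-row inserts from many serverless invocations are buffered
// and sent to ClickHouse as one large batch instead of one INSERT per row.
type Row = Record<string, unknown>;

class BatchBuffer {
  private rows: Row[] = [];
  private flushedBatches: Row[][] = [];

  constructor(private readonly maxBatchSize: number) {}

  // Called once per API invocation with a single row.
  add(row: Row): void {
    this.rows.push(row);
    if (this.rows.length >= this.maxBatchSize) {
      this.flush();
    }
  }

  // In the real proxy this would issue one INSERT with all buffered rows;
  // here we just record the batch for illustration.
  flush(): void {
    if (this.rows.length === 0) return;
    this.flushedBatches.push(this.rows);
    this.rows = [];
  }

  get batches(): Row[][] {
    return this.flushedBatches;
  }
}

const buffer = new BatchBuffer(3);
for (let i = 0; i < 7; i++) {
  buffer.add({ request_id: i });
}
buffer.flush(); // drain the remainder before shutdown
```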
Core Concepts
1. Database Organization
We organize data into logical databases:
- verifications: All data related to key verifications
- ratelimits: Everything about rate limit events
- metrics: API request logs and performance metrics
- billing: Aggregated data used for customer billing
- business: Analytics for internal business metrics
- telemetry: SDK and client usage information
2. Table Types
There are three main types of tables you'll encounter:
- Raw tables: Capture raw events as they happen (e.g., `raw_key_verifications_v1`)
- Aggregated tables: Store pre-computed summaries (e.g., `key_verifications_per_hour_v3`)
- Materialized views: Connect raw and aggregated tables, processing data automatically (e.g., `key_verifications_per_hour_mv_v3`)
3. Naming Convention
We follow a consistent naming pattern:
- `raw_<domain>_<description>_v<version>` for raw tables
- `<domain>_<description>_per_<time_unit>_v<version>` for aggregated tables
- `<description>_<aggregation>_mv_v<version>` for materialized views
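As an illustrative check, the convention can be expressed as patterns (these regexes are hypothetical helpers written for this doc, not part of the codebase):

```typescript
// Hypothetical patterns mirroring the naming convention above.
const rawTable = /^raw_[a-z]+(_[a-z]+)+_v\d+$/;
const aggregatedTable = /^[a-z]+(_[a-z]+)*_per_(hour|day|month)_v\d+$/;
const materializedView = /^[a-z]+(_[a-z]+)*_mv_v\d+$/;

const isRaw = rawTable.test("raw_key_verifications_v1"); // true
const isAggregated = aggregatedTable.test("key_verifications_per_hour_v3"); // true
const isMv = materializedView.test("key_verifications_per_hour_mv_v3"); // true
```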
Key Tables You Should Know About
For Verification Analytics
- `verifications.raw_key_verifications_v1`: Every individual key verification event
- `verifications.key_verifications_per_hour_v3`: Hourly summary of verifications
- `verifications.key_verifications_per_day_v3`: Daily summary of verifications
- `verifications.key_verifications_per_month_v3`: Monthly summary of verifications
For Rate Limit Analytics
- `ratelimits.raw_ratelimits_v1`: Individual rate limit checks
- `ratelimits.ratelimits_per_hour_v1`: Hourly summary of rate limits
- `ratelimits.ratelimits_last_used_v1`: Tracks when identifiers were last rate limited
For Billing
- `billing.billable_verifications_per_month_v2`: Monthly count of billable verifications
- `billing.billable_ratelimits_per_month_v1`: Monthly count of billable rate limits
How Data Flows
- Collection: When a key is verified or a rate limit is checked, we log an event
- Processing: Materialized views automatically process these events into summaries
- Aggregation: Data is aggregated at different time intervals (hour/day/month)
- Querying: Our API and dashboard query these aggregated tables
Working with the ClickHouse Client
Our TypeScript client in `@unkey/clickhouse` makes it easy to interact with ClickHouse.
Common Tasks
How to track key usage over time
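A query against the hourly rollup might look roughly like this (a sketch only: the column names `workspace_id`, `key_space_id`, `time`, and `count` are assumptions about the schema, and the real, schema-validated query lives in `/internal/clickhouse`):

```typescript
// Sketch: hourly key usage for one key space over the last 24 hours.
// Column names are assumptions; parameters use ClickHouse's {name: Type} syntax.
function keyUsagePerHour(workspaceId: string, keySpaceId: string) {
  return {
    query: `
      SELECT time, sum(count) AS verifications
      FROM verifications.key_verifications_per_hour_v3
      WHERE workspace_id = {workspaceId: String}
        AND key_space_id = {keySpaceId: String}
        AND time >= now() - INTERVAL 24 HOUR
      GROUP BY time
      ORDER BY time ASC;`,
    params: { workspaceId, keySpaceId },
  };
}

const usage = keyUsagePerHour("ws_123", "ks_456");
```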
How to see recent key verifications
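Recent events come from the raw table rather than a rollup. A sketch (again, column names such as `key_id`, `time`, `outcome`, and `region` are assumptions):

```typescript
// Sketch: the most recent raw verification events for a single key.
function recentVerifications(workspaceId: string, keyId: string, limit = 50) {
  return {
    query: `
      SELECT time, outcome, region
      FROM verifications.raw_key_verifications_v1
      WHERE workspace_id = {workspaceId: String}
        AND key_id = {keyId: String}
      ORDER BY time DESC
      LIMIT {limit: UInt64};`,
    params: { workspaceId, keyId, limit },
  };
}

const recent = recentVerifications("ws_123", "key_abc");
```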
How to query rate limit data
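Rate limit analytics follow the same shape against the `ratelimits` database (a sketch; the `identifier`, `passed`, and `total` columns are assumptions about the hourly table):

```typescript
// Sketch: passed vs. total rate limit checks per hour for one identifier.
function ratelimitsPerHour(workspaceId: string, identifier: string) {
  return {
    query: `
      SELECT time, sum(passed) AS passed, sum(total) AS total
      FROM ratelimits.ratelimits_per_hour_v1
      WHERE workspace_id = {workspaceId: String}
        AND identifier = {identifier: String}
      GROUP BY time
      ORDER BY time ASC;`,
    params: { workspaceId, identifier },
  };
}

const ratelimits = ratelimitsPerHour("ws_123", "user_1");
```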
Development Tips
- Local Setup: Docker Compose includes a ClickHouse instance that automatically runs migrations
- Migrations: We use `goose` for schema migrations in the `/schema` directory
- Testing: Use `vitest` to test your ClickHouse queries with `ClickHouseContainer`
- Schema Changes: When changing schema, create a new file in `/schema` with proper versioning
Debugging Queries
If you need to understand or optimize a query:
- Use the client's query method directly for custom queries
- Check performance with `EXPLAIN`
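For example, any SELECT can be prefixed with `EXPLAIN` to inspect its plan (`EXPLAIN indexes = 1` is standard ClickHouse syntax; the helper and the workspace id below are invented for illustration):

```typescript
// Sketch: wrap a query in EXPLAIN before sending it through the client's
// raw query method, so ClickHouse returns the plan instead of rows.
function explainQuery(sql: string): string {
  // indexes = 1 also shows which parts/granules the primary key skips.
  return `EXPLAIN indexes = 1 ${sql.trim()}`;
}

const plan = explainQuery(`
  SELECT count(*)
  FROM verifications.key_verifications_per_day_v3
  WHERE workspace_id = 'ws_123'
`);
```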
IaC
Our ClickHouse clusters are fully managed in unkeyed/infra.