A practical map for faster system-design decisions
Intro
The database landscape has exploded, in a good way. Instead of hunting for a single “best” engine, modern teams compose the right categories for the job: transactional integrity, low-latency reads, high-throughput writes, full-text search, batch analytics, and durable object storage. This post summarizes the trade-space I use in design reviews and includes a one-page visual you can share with your team.
The Category Map (quick reference)
- Relational (RDBMS): Strong consistency, complex queries, auditability.
  Examples: PostgreSQL, MySQL, Aurora (AWS), SQL Server, Oracle.
- NoSQL (Key-Value & Wide Column): Massive scale, predictable keys, high write throughput.
  Examples: DynamoDB, Cosmos DB, FoundationDB; Cassandra/ScyllaDB/HBase; Bigtable/InfluxDB/VictoriaMetrics.
- Document Stores: Flexible schema, nested JSON, fast iteration.
  Examples: MongoDB, Couchbase, Amazon DocumentDB; Firebase Firestore, Realm; CouchDB/PouchDB; RavenDB/ArangoDB.
- Caches & In-Memory: Ultra-low latency, pub/sub, rate-limiting.
  Examples: Redis, Memcached, Azure Cache for Redis, Hazelcast, Apache Ignite, NCache.
- Search & Real-Time Analytics: Full-text search, fast aggregations, time-series dashboards.
  Examples: Elasticsearch/OpenSearch, ClickHouse, Apache Druid, Typesense, Meilisearch.
- Data Warehouses & Lakes: Petabyte-scale analytics, BI, ELT/ETL, lakehouse.
  Examples: Redshift/BigQuery/Snowflake; Athena/Trino(Presto)/Starburst; Databricks, Delta Lake, Apache Iceberg.
- Object Storage (Blob): Cheap, durable storage for media, logs, ML artifacts, data lakes.
  Examples: Amazon S3, Google Cloud Storage, Azure Blob Storage; MinIO/Ceph/SeaweedFS; Wasabi/Backblaze B2/Cloudflare R2.
How to Choose Quickly (three prompts)
- Access pattern first: keys, reads vs writes, latency, joins/aggregations, query flexibility.
- Failure and scale model: single-region vs multi-region, consistency needs, hot keys, TTLs.
- Ops reality: managed vs self-hosted, cost model (throughput vs storage vs egress), team skills.
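One way to make the three prompts concrete is to write the answers down and map them to candidate categories. The sketch below is illustrative only: the `Workload` fields, the rules, and the `suggest_categories` helper are assumptions invented for this example, not a definitive decision procedure.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical answers to the prompts (field names are illustrative)."""
    needs_joins: bool = False            # joins/aggregations across entities
    key_lookups_only: bool = False       # reads and writes by a known key
    flexible_schema: bool = False        # nested, evolving documents
    read_latency_critical: bool = False  # hot read paths
    full_text_search: bool = False       # free-text queries
    batch_analytics: bool = False        # large scans for BI/reporting
    large_blobs: bool = False            # media, logs, ML artifacts


def suggest_categories(w: Workload) -> list[str]:
    """Return candidate categories from the map, in a stable order.

    The rules are a simplified assumption; a real review would also weigh
    the ops prompt (managed vs self-hosted, cost model, team skills).
    """
    rules = [
        (w.needs_joins, "Relational (RDBMS)"),
        (w.key_lookups_only, "Key-Value / Wide Column"),
        (w.flexible_schema, "Document Store"),
        (w.read_latency_critical, "Cache / In-Memory"),
        (w.full_text_search, "Search & Real-Time Analytics"),
        (w.batch_analytics, "Data Warehouse / Lake"),
        (w.large_blobs, "Object Storage"),
    ]
    return [name for wanted, name in rules if wanted]


# Example: a product catalog with joins, text search, and a hot read path.
catalog = Workload(needs_joins=True, full_text_search=True, read_latency_critical=True)
print(suggest_categories(catalog))
```

The output is a list of categories rather than a single engine, which matches the idea of composing several stores per workload.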
Common, Proven Combos
- Low-latency read API: KV (DynamoDB/Cassandra) + Redis cache + object storage for blobs.
- Search-heavy product: RDBMS or document for source-of-truth + Elasticsearch/OpenSearch for text + Redis for hot paths.
- Analytics stack: Object storage (S3/GCS/Blob) + warehouse/lakehouse (Redshift/BigQuery/Snowflake/Databricks) + Trino/Athena for ad-hoc.
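To show how the layers in the low-latency read combo might fit together, here is a minimal cache-aside sketch. Plain dictionaries stand in for Redis, the KV store, and object storage; the `RecordReader` class, its method names, and the `blob_key` field are hypothetical and exist only for this illustration.

```python
from typing import Optional


class RecordReader:
    """Read path: cache first, then the KV store; blobs are fetched by reference."""

    def __init__(self, cache: dict, kv: dict, blobs: dict):
        self.cache = cache  # stand-in for Redis
        self.kv = kv        # stand-in for DynamoDB/Cassandra
        self.blobs = blobs  # stand-in for S3/GCS/Blob Storage

    def get_record(self, key: str) -> Optional[dict]:
        record = self.cache.get(key)
        if record is not None:
            return record             # cache hit
        record = self.kv.get(key)     # cache miss: go to the source of truth
        if record is not None:
            self.cache[key] = record  # populate the cache for the next reader
        return record

    def get_blob(self, key: str) -> Optional[bytes]:
        record = self.get_record(key)
        if record is None:
            return None
        # The KV record holds only a pointer; the bytes live in object storage.
        return self.blobs.get(record["blob_key"])


kv = {"user:1": {"name": "Ada", "blob_key": "avatars/1.png"}}
blobs = {"avatars/1.png": b"\x89PNG"}
reader = RecordReader(cache={}, kv=kv, blobs=blobs)
reader.get_record("user:1")  # first call misses the cache and fills it
```

Keeping only a pointer in the KV record keeps items small and leaves large payloads to the store that is priced and built for them.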
A Few Pitfalls to Avoid
- One size fits all: forcing a single DB to do everything increases cost and fragility.
- Ignoring cache invalidation: a cache is fast until it serves stale data, and then it gets expensive. Design TTLs and key schemes up front.
- Unbounded growth: plan TTL/archival to object storage; watch hot partitions and fan-out.
- Ops mismatch: great tech, wrong team fit. Prefer managed services where they accelerate delivery.
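For the cache-invalidation pitfall, one common approach is to combine a TTL (a bound on staleness) with explicit invalidation on write. The following is a minimal in-process sketch under stated assumptions: `TTLCache`, `update_user`, and the `user:<id>` key scheme are hypothetical names for this example, and a production setup would more likely rely on the cache server's own expiry.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny TTL cache; the clock is injectable so expiry can be tested."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: drop it so callers reload from the source
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self.clock() + self.ttl, value)

    def invalidate(self, key: str) -> None:
        self._items.pop(key, None)


def update_user(db: dict, cache: TTLCache, user_id: str, fields: dict) -> None:
    """Write to the source of truth first, then invalidate the cached copy."""
    db[user_id] = {**db.get(user_id, {}), **fields}
    cache.invalidate(f"user:{user_id}")
```

The TTL caps how long a missed invalidation can serve stale data, while the explicit `invalidate` call keeps the common write path fresh.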
Closing
The win in 2025 isn’t a silver-bullet database; it’s the ability to compose the right categories for each workload and evolve the stack as needs change. I’ll follow up with short decision flows for specific scenarios.
Which scenario should I tackle first? Comment on LinkedIn.