Date: 2025-02-10
Accepted
Redundancy is required (a cluster with replication and automatic failover). Sharding is desirable (horizontal scaling across different servers or physical disks).
Special requirement: store and quickly find big data (orders).
Full-text search is not part of the selection criteria: ElasticSearch handles it, and the data will be mirrored from the main DB via Kafka (Outbox pattern). Business events such as 'product added' or 'order paid', needed for audit and business analytics, are also out of scope: ClickHouse handles them, and the events will likewise arrive via Kafka (Outbox pattern).
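A minimal sketch of the outbox table that could feed Kafka (table and column names are assumptions for illustration, not part of this decision; the relay to Kafka, e.g. Debezium or a polling worker, is out of scope):

```sql
-- Hypothetical outbox table: a row is written in the same transaction as the
-- business change, then drained to Kafka by a separate relay process.
CREATE TABLE outbox
(
    id           uuid        PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_id uuid        NOT NULL,           -- e.g. the order id
    event_type   text        NOT NULL,           -- 'product added', 'order paid', ...
    payload      jsonb       NOT NULL,
    created_at   timestamptz NOT NULL DEFAULT now()
);
```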
For robustness: Postgres 17 with Percona cluster.
For performance:
Acceptable (no more than 1 sec) processing time for typical queries. Redundancy: clustering with replication and automatic (transparent to applications) failover, achieved by routing SQL commands to one of the standby servers.
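On the client side, this routing can be approximated with libpq's multi-host connection strings (host names below are placeholders): the driver tries the listed nodes in order, and `target_session_attrs` selects which role to connect to. With `prefer-standby` (libpq 14+), reads go to a standby when one is reachable:

```text
# writes: connect only to the current primary
postgresql://app@node1:5432,node2:5432,node3:5432/orders?target_session_attrs=read-write
# reads: prefer a standby, fall back to the primary if none is reachable
postgresql://app@node1:5432,node2:5432,node3:5432/orders?target_session_attrs=prefer-standby
```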
See partitions. In terms of the CAP theorem, this is CA, since all partitions (effectively independent tables) are served by a single server. They can still be accelerated with striped RAID (e.g. RAID 6), which is recommended, or by attaching different partitions to different disks. Choose the number of partitions so that each holds no more than 50M records:
CREATE TABLE orders
(
    id uuid PRIMARY KEY DEFAULT gen_random_uuid()
) PARTITION BY HASH (id);
-- store partitions in a separate schema to avoid cluttering the main namespace
CREATE SCHEMA parts;
CREATE TABLESPACE disk1
    LOCATION '/mnt/disk1';
CREATE TABLE parts.orders_part1
    PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 0)
    TABLESPACE disk1;
CREATE TABLESPACE disk2
    LOCATION '/mnt/disk2';
CREATE TABLE parts.orders_part2
    PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 1)
    TABLESPACE disk2;
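To confirm that a typical lookup touches only one partition, partition pruning can be checked with EXPLAIN (the UUID literal is just an example value):

```sql
EXPLAIN SELECT * FROM orders
WHERE id = '00000000-0000-0000-0000-000000000001';
-- The plan should scan a single partition, since Postgres prunes
-- hash partitions for equality predicates on the partition key.
```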
See for example Citus.
Search is performed in ElasticSearch, and since products are almost read-only resources, ElasticSearch can be treated as their primary storage. It supports sharding, but has no transactions, and writes become visible with a delay. Only the record IDs remain in Postgres, for foreign keys and joins.
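A sketch of what remains in Postgres under this scheme (table and column names are illustrative): order lines reference products by bare id, with no local products table to constrain against, since the product documents live in ElasticSearch:

```sql
CREATE TABLE order_items
(
    order_id   uuid    NOT NULL REFERENCES orders (id),
    product_id uuid    NOT NULL,  -- resolved against ElasticSearch; no local FK target
    quantity   integer NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
```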
If the moment a record (order / product) is saved to the DB is treated as a time series, consider Postgres 17 with the TimescaleDB extension. It provides hypertables with automatic partition (chunk) creation and data compression that can reduce disk volume by up to 20x (compressed data becomes read-only, but compression can be delayed, e.g. by 90 days, thus separating hot storage from the archive).
It's OK to add tablespaces for storing partitions on different disks (although one tablespace backed by RAID striping+mirroring underneath is recommended). Multiple tablespaces can be attached to one hypertable; TimescaleDB understands this and will spread chunks across them:
CREATE TABLESPACE disk1
    LOCATION '/mnt/disk1';
CREATE TABLESPACE disk2
    LOCATION '/mnt/disk2';
-- orders must be converted to a hypertable before the compression settings
-- and tablespaces apply; created_date is assumed to be its time dimension
SELECT create_hypertable('orders', 'created_date');
ALTER TABLE orders
    SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'id',
        timescaledb.compress_orderby = 'created_date'
    );
SELECT attach_tablespace('disk1', 'orders');
SELECT attach_tablespace('disk2', 'orders');
SELECT add_compression_policy('orders', INTERVAL '90 days');
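Once the policy has run, the effect of compression can be inspected with TimescaleDB's built-in stats (available in TimescaleDB 2.x):

```sql
-- before/after sizes for the whole hypertable
SELECT * FROM hypertable_compression_stats('orders');
-- per-chunk detail, including compression status
SELECT chunk_name, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'orders';
```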
See vendor website.
In terms of the CAP theorem, this is CP because during leader node election the cluster is unavailable.
Mongo 8's license (SSPL; see section 13) prohibits providing the product as a service (apparently to prevent cloud providers from reselling the free product), but allows internal use in projects.
For a robust cluster without sharding, 7 nodes are needed: 3 data replicas, 3 cluster metadata replicas, a query router (facade).
For a robust cluster with sharding, 10 nodes are needed: 2 data shards (each shard has 3 replicas), 3 cluster metadata replicas, a query router (facade).
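The 10-node sharded layout above could be brought up roughly as follows (host names, ports, paths, and replica-set names are placeholders, and each mongod line is run on three hosts; this is a sketch, not a production recipe):

```shell
# 3 config-server replicas (cluster metadata)
mongod --configsvr --replSet cfgrs --port 27019 --dbpath /data/cfg
# 2 data shards, 3 replicas each
mongod --shardsvr --replSet shard1 --port 27018 --dbpath /data/shard1
mongod --shardsvr --replSet shard2 --port 27018 --dbpath /data/shard2
# 1 query router (facade)
mongos --configdb cfgrs/cfg1:27019,cfg2:27019,cfg3:27019 --port 27017
```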