Document store design for InfraTraq — 35 collections across 10 categories covering audit logs, API integration, activity tracking, messaging, notifications, caches, user preferences, AI/ML, system observability, and file attachments.
InfraTraq uses a dual-database architecture: PostgreSQL for transactional/relational data (398 tables) and MongoDB for event logs, caches, flexible documents, and high-write workloads (35 collections). The decision framework is simple: if the data requires foreign key constraints, ACID transactions, or complex JOINs, it goes to PostgreSQL. If the data is append-only, has variable schema, needs TTL expiry, or is write-heavy with eventual consistency tolerance, it goes to MongoDB.
| Status | Description | Allowed Actions | Next States |
|---|---|---|---|
| Active | Collection is in production use | Read/Write enabled | Archived |
| TTL Expiring | Documents approaching TTL boundary | Monitor, extend if needed | Active, Archived |
| Archived | Collection data moved to cold storage | Read-only queries | Deleted |
| Maintenance | Collection under reindex or migration | Admin operations only | Active |
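The lifecycle table above can be encoded as a small transition map. A minimal sketch, assuming the state names from the table (the helper function itself is illustrative, not part of InfraTraq):

```python
# Allowed lifecycle transitions, mirroring the state table above.
ALLOWED_TRANSITIONS = {
    "Active": {"Archived"},
    "TTL Expiring": {"Active", "Archived"},
    "Archived": {"Deleted"},
    "Maintenance": {"Active"},
}

def can_transition(current: str, target: str) -> bool:
    """Return True if a collection may move from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

Guarding admin tooling with a check like this prevents, for example, an archived collection from being silently reopened for writes.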
**Route to PostgreSQL when:**
- Foreign key constraints required
- ACID transactions needed
- Complex JOINs across entities
- Strict schema enforcement
- Financial/accounting data
- Master data with referential integrity

**Route to MongoDB when:**
- Schema varies per record
- Append-only write pattern
- TTL auto-expiry needed
- Nested/array fields per document
- High-write throughput (logs, events)
- Flexible JSON payloads

**Cross-database references:**
- Store the entity_type + entity_id reference in MongoDB
- Look up relational details in PostgreSQL
- Use CDC (Change Data Capture) for sync
- Materialized views for cross-DB reporting

**Anti-patterns:**
- Never store financial transactions in MongoDB
- Never build JOINs across MongoDB collections
- Never use MongoDB for approval chains needing FK integrity
- Never skip TTL for high-volume logs
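The decision framework above can be sketched as a routing function. This is illustrative only (the function name and boolean flags are assumptions, not InfraTraq's actual router):

```python
def route_storage(needs_fk: bool, needs_acid: bool, needs_joins: bool,
                  variable_schema: bool, append_only: bool, needs_ttl: bool) -> str:
    # Relational requirements always win: FK/ACID/JOIN data goes to PostgreSQL.
    if needs_fk or needs_acid or needs_joins:
        return "postgresql"
    # Otherwise, document-store traits route the data to MongoDB.
    if variable_schema or append_only or needs_ttl:
        return "mongodb"
    # Default to PostgreSQL when nothing argues for a document store.
    return "postgresql"
```

Note the ordering: a dataset that is both append-only and FK-constrained still lands in PostgreSQL, which matches the anti-pattern list above.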
**Audit logs**
- entity_changelog — Full before/after snapshots on every RDBMS write — TTL: 2-7 years
- field_audit_log — Field-level change tracking for sensitive fields — TTL: 2-7 years

**API integration**
- api_request_log — Inbound/outbound HTTP request + response payloads — TTL: 30-90 days
- webhook_event_log — Webhook delivery attempts and responses — TTL: 30-90 days
- integration_sync_log — Sync job execution details per external system — TTL: 90 days

**Activity tracking**
- user_activity_event — Clickstream: page views, button clicks, time-on-page — TTL: 30-365 days
- search_query_log — Search terms, filters, results count, click-through — TTL: 90 days
- feature_usage_metric — Feature adoption tracking per user/module — TTL: 365 days

**Messaging**
- chat_message — Real-time chat messages with nested attachments — Permanent
- comment_thread — Polymorphic comments on any entity (entity_type + entity_id) — Permanent
- email_communication — Email send/receive history with parsed body/attachments — Permanent

**Notifications**
- notification_delivery — Per-recipient delivery status: sent/delivered/read/failed — TTL: 90 days
- push_notification_log — Mobile push delivery with device/token details — TTL: 90 days
- sms_delivery_log — SMS gateway response tracking — TTL: 90 days

**Caches**
- report_cache — Rendered report output cached by parameter hash — TTL: 1hr-30 days
- dashboard_snapshot — Pre-aggregated KPI data for dashboard tiles — TTL: 24hr
- analytics_aggregate — Pre-computed rollups (daily/weekly/monthly) — TTL: 30 days
- search_index_cache — Full-text search index for instant search — TTL: 7 days

**User preferences**
- user_preference — Per-user settings (locale, timezone, default project) — Permanent
- dashboard_layout — Widget positions, sizes, data sources per user — Permanent
- user_favorite — Pinned reports, bookmarked entities, recent items — Permanent
- user_alert_config — Custom alert thresholds and notification preferences — Permanent

**AI/ML**
- ml_training_dataset — Feature arrays for model training — TTL: 365 days
- ml_model_registry — Model versions, hyperparameters, metrics — TTL: 365 days
- ml_prediction_log — Input features → output predictions with confidence — TTL: 90 days
- nlp_query_history — Parsed intents, extracted entities from NL queries — TTL: 90 days
- ai_conversation — Multi-turn AI assistant conversation context — TTL: 90 days

**System observability**
- performance_metric — Response times, throughput, error rates per endpoint — TTL: 7-90 days
- background_job_log — Job execution details: params, duration, output, errors — TTL: 30 days
- system_health_event — CPU/memory/disk alerts, service health checks — TTL: 30 days
- error_log — Structured error logs with stack traces and context — TTL: 90 days

**File attachments**
- attachment_metadata — Files attached to any entity_type + entity_id (polymorphic) — Permanent

Design each collection with its document schema, key fields, and expected growth pattern. Use the decision framework: FK needed → RDBMS; variable schema → MongoDB; append-only → MongoDB; TTL needed → MongoDB.
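As one example of the polymorphic entity_type + entity_id pattern, a comment_thread document might look like the sketch below. Field names beyond the polymorphic pair are assumptions for illustration, not a fixed schema:

```python
from datetime import datetime, timezone

# Hypothetical comment_thread document. entity_type/entity_id point back
# into PostgreSQL; everything else is document-local and free to vary.
comment_doc = {
    "entity_type": "work_order",   # polymorphic reference into PostgreSQL
    "entity_id": "8841",
    "author_id": "u_1092",
    "body": "Replaced the faulty sensor; closing out.",
    "mentions": ["u_2207"],        # nested array fields are natural in MongoDB
    "created_at": datetime.now(timezone.utc),
}
```

Because the relational details live in PostgreSQL, rendering a comment requires a follow-up lookup by entity_type + entity_id rather than a cross-database JOIN.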
Create compound indexes on frequently queried field combinations. Use TTL indexes for automatic document expiry. Partial indexes for sparse fields. Text indexes for search_index_cache.
```javascript
// TTL index: expire api_request_log documents 90 days after created_at
db.api_request_log.createIndex(
  { "created_at": 1 },
  { expireAfterSeconds: 7776000 }  // 90 days
)

// Compound index for entity lookup, newest changes first
db.entity_changelog.createIndex(
  { "entity_type": 1, "entity_id": 1, "changed_at": -1 }
)
```
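The partial and text indexes mentioned above could be declared as follows. This is a sketch: the `last_error_code` field and the title/body field names are assumed for illustration.

```javascript
// Partial index: only index webhook deliveries that recorded an error
// (the field is sparse, so a full index would be mostly wasted space)
db.webhook_event_log.createIndex(
  { "last_error_code": 1 },
  { partialFilterExpression: { "last_error_code": { $exists: true } } }
)

// Text index for instant search over cached search documents
db.search_index_cache.createIndex(
  { "title": "text", "body": "text" }
)
```

Note that `partialFilterExpression` supports `$exists: true` but not `$exists: false`, which fits the sparse-field use case here.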
Application events flow through the Decision Router. RDBMS writes trigger CDC events that create entity_changelog documents. API calls log to api_request_log. User actions log to user_activity_event. All writes are fire-and-forget for performance.
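The fire-and-forget contract can be sketched with asyncio. The helper below is illustrative (not InfraTraq's actual dispatcher): the point is that a failed log write is recorded but never propagated to the caller.

```python
import asyncio

def log_in_background(coro) -> asyncio.Task:
    """Schedule a log write without awaiting it; errors are recorded, not raised."""
    task = asyncio.ensure_future(coro)
    # If the write failed, surface it to ops logging instead of the request path.
    task.add_done_callback(
        lambda t: t.exception() and print("log write failed:", t.exception())
    )
    return task
```

In a real deployment the `print` would be a structured logger, and a bounded queue or MongoDB write concern `w=0` would cap how much in-flight logging a burst can accumulate.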
Dashboard queries hit dashboard_snapshot and analytics_aggregate for pre-computed data. Audit queries scan entity_changelog with compound index. Search uses search_index_cache with text indexes. Report generation checks report_cache before computing.
Configure TTL per collection based on data retention policy. Audit logs: 2-7 years (regulatory). API logs: 30-90 days (debugging). Caches: 1hr-30 days (freshness). System metrics: 7-90 days (capacity planning). Archive to S3/GCS before TTL expiry for cold storage.
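One way to schedule the archive step is to compute, per collection, the creation-date cutoff below which documents should already be in cold storage. A sketch, assuming the upper bound of each retention range above and an assumed 7-day archive lead time:

```python
from datetime import datetime, timedelta

# Retention upper bounds from the policy above; the 7-day lead time is an
# assumed safety margin so documents reach S3/GCS before TTL deletes them.
RETENTION = {
    "entity_changelog": timedelta(days=7 * 365),  # audit: up to 7 years
    "api_request_log": timedelta(days=90),
    "report_cache": timedelta(days=30),
    "performance_metric": timedelta(days=90),
}
ARCHIVE_LEAD = timedelta(days=7)

def archive_cutoff(collection: str, now: datetime) -> datetime:
    """Documents created before this instant should already be archived."""
    return now - RETENTION[collection] + ARCHIVE_LEAD
```

An archival job would query `{"created_at": {"$lt": archive_cutoff(...)}}`, export the batch to object storage, and let the TTL index handle deletion.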
Monitor collection sizes, index usage, slow queries, and replication lag. Alert on: collection size > threshold, TTL backlog, write concern failures, index miss rate > 5%, oplog window < 24hr.
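The alert conditions above can be expressed as a single threshold check. The 5% index miss rate and 24-hour oplog window come from the policy; the 50 GB size cap is an assumed example value for the unspecified threshold:

```python
def evaluate_alerts(metrics: dict) -> list[str]:
    """Return the names of any breached alert thresholds."""
    alerts = []
    if metrics.get("collection_size_gb", 0) > 50:       # assumed example cap
        alerts.append("collection_size")
    if metrics.get("ttl_backlog_docs", 0) > 0:          # TTL deletions falling behind
        alerts.append("ttl_backlog")
    if metrics.get("index_miss_rate", 0) > 0.05:        # policy: > 5%
        alerts.append("index_miss_rate")
    if metrics.get("oplog_window_hours", 48) < 24:      # policy: < 24hr
        alerts.append("oplog_window")
    return alerts
```

The inputs would come from `collStats`, `serverStatus`, and replication metrics; this sketch only shows the threshold logic.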
```python
from datetime import datetime

# After any RDBMS write, log the change to MongoDB (fire-and-forget).
# `db` is the async MongoDB handle; `request` is passed in explicitly
# rather than read from a global, so the function is testable.
async def log_entity_change(entity_type, entity_id, operation,
                            old_val, new_val, user_id, request):
    doc = {
        "entity_type": entity_type,
        "entity_id": str(entity_id),
        "operation": operation,  # INSERT, UPDATE, DELETE
        "old_value": old_val,
        "new_value": new_val,
        "changed_by": str(user_id),
        "changed_at": datetime.utcnow(),
        "ip_address": request.client.host,
        "session_id": request.state.session_id,
    }
    await db.entity_changelog.insert_one(doc)
```
```python
import hashlib
import json
from datetime import datetime, timedelta

# Check report_cache before generating an expensive report.
async def get_report(report_type, params):
    # Deterministic cache key: sorted params make equal inputs hash equally.
    cache_key = hashlib.sha256(
        f"{report_type}:{json.dumps(params, sort_keys=True)}".encode()
    ).hexdigest()
    cached = await db.report_cache.find_one({"cache_key": cache_key})
    if cached and cached["expires_at"] > datetime.utcnow():
        return cached["output_data"]
    # Cache miss or stale entry: generate and re-cache the report.
    output = await generate_report(report_type, params)
    await db.report_cache.update_one(
        {"cache_key": cache_key},
        {"$set": {
            "output_data": output,
            "expires_at": datetime.utcnow() + timedelta(hours=1),
        }},
        upsert=True,
    )
    return output
```