6 days ago
The Cost of Correctness in “Real-Time” Systems Like Kafka and Spark
"Real-time" with Kafka and Spark is controlled delay, not instantaneity. Kafka batches for durability; Spark processes streams as micro-batch jobs with watermarks and checkpoints. Exactly-once requires the sink to cooperate. The right question isn't whether a pipeline is real-time; it's which latency budget it can sustain while staying correct under failure and late data.
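
To make "the sink has to cooperate" concrete, here is a minimal sketch in plain Python, not Spark or Kafka code. It assumes the engine may redeliver a micro-batch after a failure and that each batch carries a stable id; `IdempotentSink`, `run_with_replay`, and the sample data are hypothetical names invented for illustration. The sink records which batch ids it has already committed, so a replayed batch is ignored instead of being counted twice.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """Hypothetical sink that remembers which micro-batch ids it has committed."""
    committed_batches: set = field(default_factory=set)
    totals: dict = field(default_factory=dict)

    def write(self, batch_id: int, records: list) -> bool:
        # A replayed batch (same id) is skipped, so retries do not double-count.
        if batch_id in self.committed_batches:
            return False
        for key, value in records:
            self.totals[key] = self.totals.get(key, 0) + value
        self.committed_batches.add(batch_id)
        return True


def run_with_replay(sink, batches, replay_ids):
    """Deliver every batch once, then redeliver those listed in replay_ids,
    imitating a restart from a checkpoint taken before the sink commit."""
    for batch_id, records in batches:
        sink.write(batch_id, records)
        if batch_id in replay_ids:
            sink.write(batch_id, records)  # at-least-once redelivery


# Assumed sample data: (batch_id, [(key, value), ...])
batches = [(0, [("a", 1), ("b", 2)]), (1, [("a", 3)])]
sink = IdempotentSink()
run_with_replay(sink, batches, replay_ids={1})
print(sink.totals)  # {'a': 4, 'b': 2}
```

Without the `committed_batches` check, the replay of batch 1 would push the total for `a` to 7. In a real sink the batch id and the data would need to be committed in the same transaction, which this in-memory sketch does not model.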
Source: HackerNoon →