Apr 22, 2026

Exactly-Once in Spark Structured Streaming: What That Actually Means

"Exactly-once" in Spark Structured Streaming is an end-to-end guarantee: each input record affects the output exactly once, and that holds only if the source is replayable and the sink supports idempotent writes or transactions. Spark's checkpointing records the offset range of each micro-batch, so a batch that fails is re-run over the same data with the same batch ID; it does not stop a sink (or custom write logic) from writing that data twice on a retry. Using foreachBatch with custom MERGE/upsert logic can therefore produce duplicates unless you detect and skip batch IDs that are already committed. Fix: at the start of foreachBatch, check the target (the Delta transaction log or a tracking table) for the batch_id and skip the batch if it is already committed. Checklist: replayable source, durable checkpointing, idempotent sink/writes (or Delta writeStream), and idempotent foreachBatch logic. All four must hold for true end-to-end exactly-once.
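The batch-ID check can be sketched without a cluster. The example below is a minimal illustration, not the original article's code: it uses an in-memory SQLite database as a stand-in for the target, and the names `init_target`, `write_batch`, `totals`, and `committed_batches` are hypothetical. The upsert deliberately accumulates values, so replaying a batch would corrupt the result if the check were missing.

```python
import sqlite3


def init_target(conn):
    """Create the (hypothetical) target table and the batch-tracking table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS totals (key TEXT PRIMARY KEY, amount INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS committed_batches (batch_id INTEGER PRIMARY KEY)"
    )
    conn.commit()


def write_batch(conn, rows, batch_id):
    """Shaped like a foreachBatch callback: (batch data, batch_id).

    Returns True if the batch was written, False if it was skipped
    because this batch_id had already been committed.
    """
    already_committed = conn.execute(
        "SELECT 1 FROM committed_batches WHERE batch_id = ?", (batch_id,)
    ).fetchone()
    if already_committed:
        return False  # a retry of a batch that already landed: do nothing

    # Write the data and record the batch_id in ONE transaction, so a crash
    # cannot leave the data written but the batch unrecorded (or vice versa).
    with conn:
        conn.executemany(
            "INSERT INTO totals (key, amount) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET amount = amount + excluded.amount",
            rows,
        )
        conn.execute(
            "INSERT INTO committed_batches (batch_id) VALUES (?)", (batch_id,)
        )
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_target(conn)
    batch = [("a", 1), ("b", 2)]
    print(write_batch(conn, batch, 0))  # first attempt writes the batch
    print(write_batch(conn, batch, 0))  # simulated retry is skipped
    print(conn.execute("SELECT key, amount FROM totals ORDER BY key").fetchall())
```

In a real job the same shape would be a function taking a DataFrame and a batch ID, passed to `writeStream.foreachBatch(...)`. The point that carries over is that the data write and the batch-ID record must commit atomically. If the target cannot do that in one transaction, the check alone does not close the gap. For Delta targets, the Delta Lake documentation describes writer options for idempotent `foreachBatch` writes, which are worth checking before rolling a tracking table by hand.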

Source: HackerNoon

