
Commit 1d7b027

For inplace update, send nontransactional invalidations.
The inplace update survives ROLLBACK.  The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update.  In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f.  That is a
source of index corruption.

Back-patch to v14 - v17.  This is a back-patch of commits:
- 243e9b4 (main change, on master, before v18 branched)
- 0bada39 (defect fix, on master, before v18 branched)
- bae8ca8 (cosmetics from post-commit review, on REL_18_STABLE)

It reverses commit c1099dd, my revert of the original back-patch of
243e9b4.  This back-patch omits the non-comment heap_decode() changes.
I find those changes removed harmless code that was last necessary in
v13.  See discussion thread for details.  The back branches aren't the
place to remove such code.

Like the original back-patch, this doesn't change WAL, because these
branches use end-of-recovery SIResetAll().  All branches change the ABI
of extern function PrepareToInvalidateCacheTuple().  No PGXN extension
calls that, and there's no apparent use case in extensions.  Expect
".abi-compliance-history" edits to follow.

Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Reviewed-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Nitin Motiani <nitinmotiani@google.com> (in earlier versions)
Reviewed-by: Andres Freund <andres@anarazel.de> (in earlier versions)
Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
Backpatch-through: 14-17
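The failure mode in the first paragraph can be sketched as a toy model (plain C, not PostgreSQL code; every name below is invented for illustration). An inplace write survives ROLLBACK, but an invalidation queued transactionally does not, so a second backend keeps serving its stale cached copy of relhasindex; sending the inval nontransactionally closes the window.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the defect: invented names, not PostgreSQL code. */
typedef struct ToyRow { bool relhasindex; } ToyRow;

static ToyRow shared_row;       /* the on-disk catalog row */
static ToyRow backend2_cache;   /* backend 2's cached copy of that row */
static bool backend2_cache_valid;

static void inval(void) { backend2_cache_valid = false; }

static bool backend2_read_relhasindex(void)
{
	if (!backend2_cache_valid)
	{
		backend2_cache = shared_row;	/* reload from "disk" */
		backend2_cache_valid = true;
	}
	return backend2_cache.relhasindex;
}

/* Backend 1 performs an inplace update, then its transaction rolls back. */
static void backend1_inplace_update_then_rollback(bool inval_is_transactional)
{
	shared_row.relhasindex = true;	/* inplace write: survives ROLLBACK */
	if (!inval_is_transactional)
		inval();					/* nontransactional: sent regardless */
	/* ROLLBACK: a transactional inval queued here would be discarded. */
}
```

With the old (transactional) behavior, the second backend's DDL can rewrite the row from its stale copy, losing relhasindex=t; the nontransactional inval forces a fresh read.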
1 parent ed75434 commit 1d7b027

File tree

13 files changed: +362 −126 lines changed

src/backend/access/heap/README.tuplock

Lines changed: 32 additions & 0 deletions

@@ -201,3 +201,35 @@ wider than four bytes, and current readers don't need consistency across
 fields.  Hence, they get by with just fetching each field once.  XXX such a
 caller may also read a value that has not reached WAL; see
 systable_inplace_update_finish().
+
+During logical decoding, caches reflect an inplace update no later than the
+next XLOG_XACT_INVALIDATIONS.  That record witnesses the end of a command.
+Tuples of its cmin are then visible to decoding, as are inplace updates of any
+lower LSN.  Inplace updates of a higher LSN may also be visible, even if those
+updates would have been invisible to a non-historic snapshot matching
+decoding's historic snapshot.  (In other words, decoding may see inplace
+updates that were not visible to a similar snapshot taken during original
+transaction processing.)  That's a consequence of inplace update violating
+MVCC: there are no snapshot-specific versions of inplace-updated values.  This
+all makes it hard to reason about inplace-updated column reads during logical
+decoding, but the behavior does suffice for relhasindex.  A relhasindex=t in
+CREATE INDEX becomes visible no later than the new pg_index row.  While it may
+be visible earlier, that's harmless.  Finding zero indexes despite
+relhasindex=t is normal in more cases than this, e.g. after DROP INDEX.
+Example of a case that meaningfully reacts to the inplace inval:
+
+	CREATE TABLE cat (c int) WITH (user_catalog_table = true);
+	CREATE TABLE normal (d int);
+	...
+	CREATE INDEX ON cat (c)\; INSERT INTO normal VALUES (1);
+
+If the output plugin reads "cat" during decoding of the INSERT, it's fair to
+want that read to see relhasindex=t and use the new index.
+
+An alternative would be to have decoding of XLOG_HEAP_INPLACE immediately
+execute its invals.  That would behave more like invals during original
+transaction processing.  It would remove the decoding-specific delay in e.g. a
+decoding plugin witnessing a relfrozenxid change.  However, a good use case
+for that is unlikely, since the plugin would still witness relfrozenxid
+changes prematurely.  Hence, inplace update takes the trivial approach of
+delegating to XLOG_XACT_INVALIDATIONS.
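The "no later than the next XLOG_XACT_INVALIDATIONS" rule in the README text above can be sketched as a toy decoder loop (invented names, not PostgreSQL code): an inplace record changes the authoritative value, but the decoder's cache only refreshes when the invalidations record arrives.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of delayed inval execution during decoding; invented names. */
enum ToyRecord { TOY_HEAP_INSERT, TOY_HEAP_INPLACE, TOY_XACT_INVALIDATIONS };

static bool disk_relhasindex;	/* authoritative catalog value */
static bool cached_relhasindex; /* decoder's cached copy */

static void decode_record(enum ToyRecord rec)
{
	switch (rec)
	{
		case TOY_HEAP_INPLACE:
			/* On-disk value changes; the cache is not refreshed yet. */
			disk_relhasindex = true;
			break;
		case TOY_XACT_INVALIDATIONS:
			/* End of command: execute invals, refreshing the cache. */
			cached_relhasindex = disk_relhasindex;
			break;
		default:
			break;				/* other records leave the cache alone */
	}
}
```

A plugin read between the two records may still see the old value; that is the decoding-specific delay the README's final paragraph weighs against executing invals at XLOG_HEAP_INPLACE itself.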

src/backend/access/heap/heapam.c

Lines changed: 39 additions & 7 deletions

@@ -6158,6 +6158,19 @@ heap_inplace_lock(Relation relation,
 
 	Assert(BufferIsValid(buffer));
 
+	/*
+	 * Register shared cache invals if necessary.  Other sessions may finish
+	 * inplace updates of this tuple between this step and LockTuple().  Since
+	 * inplace updates don't change cache keys, that's harmless.
+	 *
+	 * While it's tempting to register invals only after confirming we can
+	 * return true, the following obstacle precludes reordering steps that
+	 * way.  Registering invals might reach a CatalogCacheInitializeCache()
+	 * that locks "buffer".  That would hang indefinitely if running after our
+	 * own LockBuffer().  Hence, we must register invals before LockBuffer().
+	 */
+	CacheInvalidateHeapTupleInplace(relation, oldtup_ptr);
+
 	LockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 
@@ -6253,6 +6266,7 @@ heap_inplace_lock(Relation relation,
 	if (!ret)
 	{
 		UnlockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
+		ForgetInplace_Inval();
 		InvalidateCatalogSnapshot();
 	}
 	return ret;
@@ -6281,6 +6295,16 @@ heap_inplace_update_and_unlock(Relation relation,
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
 
+	/*
+	 * Unlink relcache init files as needed.  If unlinking, acquire
+	 * RelCacheInitLock until after associated invalidations.  By doing this
+	 * in advance, if we checkpoint and then crash between inplace
+	 * XLogInsert() and inval, we don't rely on StartupXLOG() ->
+	 * RelationCacheInitFileRemove().  That uses elevel==LOG, so replay would
+	 * neglect to PANIC on EIO.
+	 */
+	PreInplace_Inval();
+
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
 
@@ -6324,17 +6348,24 @@ heap_inplace_update_and_unlock(Relation relation,
 		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * Send invalidations to shared queue.  SearchSysCacheLocked1() assumes we
+	 * do this before UnlockTuple().
+	 */
+	AtInplace_Inval();
+
 	END_CRIT_SECTION();
+	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
-	heap_inplace_unlock(relation, oldtup, buffer);
+	AcceptInvalidationMessages();	/* local processing of just-sent inval */
 
 	/*
-	 * Send out shared cache inval if necessary.  Note that because we only
-	 * pass the new version of the tuple, this mustn't be used for any
-	 * operations that could change catcache lookup keys.  But we aren't
-	 * bothering with index updates either, so that's true a fortiori.
-	 *
-	 * XXX ROLLBACK discards the invalidation.  See test inplace-inval.spec.
+	 * Queue a transactional inval, for logical decoding and for third-party
+	 * code that might have been relying on it since long before inplace
+	 * update adopted immediate invalidation.  See README.tuplock section
+	 * "Reading inplace-updated columns" for logical decoding details.
 	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
@@ -6349,6 +6380,7 @@ heap_inplace_unlock(Relation relation,
 {
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
+	ForgetInplace_Inval();
 }
 
 /*
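The ordering constraint explained in the new heap_inplace_lock() comment, register invals before taking the buffer lock because registering them may itself need that non-recursive lock, can be sketched as a toy model (invented names, not PostgreSQL code; the toy lock returns false where a real one would hang):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive buffer lock; invented names, not PostgreSQL code. */
static bool buffer_locked;

/* Returns false instead of hanging, to make self-deadlock observable. */
static bool toy_lock_buffer(void)
{
	if (buffer_locked)
		return false;			/* a real non-recursive lock would hang */
	buffer_locked = true;
	return true;
}

/* Stands in for registering invals, which may need the buffer lock. */
static bool toy_register_invals(void)
{
	if (!toy_lock_buffer())
		return false;			/* would deadlock */
	buffer_locked = false;		/* done with the buffer */
	return true;
}

/* Safe ordering: register invals first, then take the buffer lock. */
static bool toy_inplace_lock_correct(void)
{
	if (!toy_register_invals())
		return false;
	return toy_lock_buffer();
}

/* Broken ordering: lock first; registering invals then self-deadlocks. */
static bool toy_inplace_lock_broken(void)
{
	if (!toy_lock_buffer())
		return false;
	return toy_register_invals();
}
```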

src/backend/access/transam/xact.c

Lines changed: 18 additions & 8 deletions

@@ -1337,14 +1337,24 @@ RecordTransactionCommit(void)
 
 		/*
 		 * Transactions without an assigned xid can contain invalidation
-		 * messages (e.g. explicit relcache invalidations or catcache
-		 * invalidations for inplace updates); standbys need to process those.
-		 * We can't emit a commit record without an xid, and we don't want to
-		 * force assigning an xid, because that'd be problematic for e.g.
-		 * vacuum.  Hence we emit a bespoke record for the invalidations.  We
-		 * don't want to use that in case a commit record is emitted, so they
-		 * happen synchronously with commits (besides not wanting to emit more
-		 * WAL records).
+		 * messages.  While inplace updates do this, this is not known to be
+		 * necessary; see comment at inplace CacheInvalidateHeapTuple().
+		 * Extensions might still rely on this capability, and standbys may
+		 * need to process those invals.  We can't emit a commit record
+		 * without an xid, and we don't want to force assigning an xid,
+		 * because that'd be problematic for e.g. vacuum.  Hence we emit a
+		 * bespoke record for the invalidations.  We don't want to use that in
+		 * case a commit record is emitted, so they happen synchronously with
+		 * commits (besides not wanting to emit more WAL records).
+		 *
+		 * XXX Every known use of this capability is a defect.  Since an XID
+		 * isn't controlling visibility of the change that prompted invals,
+		 * other sessions need the inval even if this transaction aborts.
+		 *
+		 * ON COMMIT DELETE ROWS does a nontransactional index_build(), which
+		 * queues a relcache inval, including in transactions without an xid
+		 * that had read the (empty) table.  Standbys don't need any ON COMMIT
+		 * DELETE ROWS invals, but we've not done the work to withhold them.
		 */
 		if (nmsgs != 0)
 		{
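The decision the RecordTransactionCommit() comment describes, invals ride in the commit record when one is emitted, while an xid-less transaction with pending invals gets a bespoke standalone record, can be sketched as a toy selector (invented names, not PostgreSQL code):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of record selection at commit; invented names. */
enum ToyWal { TOY_WAL_NONE, TOY_WAL_COMMIT, TOY_WAL_INVALIDATIONS };

static enum ToyWal choose_record(bool has_xid, int nmsgs)
{
	if (has_xid)
		return TOY_WAL_COMMIT;			/* invals ride in the commit record */
	if (nmsgs != 0)
		return TOY_WAL_INVALIDATIONS;	/* bespoke record: no xid available */
	return TOY_WAL_NONE;				/* nothing for standbys to process */
}
```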

src/backend/catalog/index.c

Lines changed: 9 additions & 2 deletions

@@ -2931,12 +2931,19 @@ index_update_stats(Relation rel,
 	if (dirty)
 	{
 		systable_inplace_update_finish(state, tuple);
-		/* the above sends a cache inval message */
+		/* the above sends transactional and immediate cache inval messages */
 	}
 	else
 	{
 		systable_inplace_update_cancel(state);
-		/* no need to change tuple, but force relcache inval anyway */
+
+		/*
+		 * While we didn't change relhasindex, CREATE INDEX needs a
+		 * transactional inval for when the new index's catalog rows become
+		 * visible.  Other CREATE INDEX and REINDEX code happens to also queue
+		 * this inval, but keep this in case rare callers rely on this part of
+		 * our API contract.
+		 */
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
 

src/backend/replication/logical/decode.c

Lines changed: 2 additions & 9 deletions

@@ -523,20 +523,13 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			/*
 			 * Inplace updates are only ever performed on catalog tuples and
 			 * can, per definition, not change tuple visibility.  Since we
-			 * don't decode catalog tuples, we're not interested in the
+			 * also don't decode catalog tuples, we're not interested in the
 			 * record's contents.
-			 *
-			 * In-place updates can be used either by XID-bearing transactions
-			 * (e.g. in CREATE INDEX CONCURRENTLY) or by XID-less
-			 * transactions (e.g. VACUUM).  In the former case, the commit
-			 * record will include cache invalidations, so we mark the
-			 * transaction as catalog modifying here.  Currently that's
-			 * redundant because the commit will do that as well, but once we
-			 * support decoding in-progress relations, this will be important.
 			 */
 			if (!TransactionIdIsValid(xid))
 				break;
 
+			/* PostgreSQL 13 was the last to need these actions. */
 			(void) SnapBuildProcessChange(builder, xid, buf->origptr);
 			ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
 			break;

src/backend/utils/cache/catcache.c

Lines changed: 4 additions & 3 deletions

@@ -2278,7 +2278,8 @@ void
 PrepareToInvalidateCacheTuple(Relation relation,
 							  HeapTuple tuple,
 							  HeapTuple newtuple,
-							  void (*function) (int, uint32, Oid))
+							  void (*function) (int, uint32, Oid, void *),
+							  void *context)
 {
 	slist_iter	iter;
 	Oid			reloid;
@@ -2319,7 +2320,7 @@ PrepareToInvalidateCacheTuple(Relation relation,
 		hashvalue = CatalogCacheComputeTupleHashValue(ccp, ccp->cc_nkeys, tuple);
 		dbid = ccp->cc_relisshared ? (Oid) 0 : MyDatabaseId;
 
-		(*function) (ccp->id, hashvalue, dbid);
+		(*function) (ccp->id, hashvalue, dbid, context);
 
 		if (newtuple)
 		{
@@ -2328,7 +2329,7 @@ PrepareToInvalidateCacheTuple(Relation relation,
 			newhashvalue = CatalogCacheComputeTupleHashValue(ccp, ccp->cc_nkeys, newtuple);
 
 			if (newhashvalue != hashvalue)
-				(*function) (ccp->id, newhashvalue, dbid);
+				(*function) (ccp->id, newhashvalue, dbid, context);
 		}
 	}
 }
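The catcache.c change above is the ABI break the commit message mentions: the invalidation callback gains a `void *context` argument so callers can thread state through without globals. A minimal standalone sketch of that callback pattern (stand-in types and names, not the PostgreSQL definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in types; not the PostgreSQL definitions. */
typedef uint32_t ToyOid;

typedef void (*toy_inval_fn) (int cacheid, uint32_t hashvalue, ToyOid dbid,
							  void *context);

/* Caller-supplied state, passed back verbatim via "context". */
struct ToyCounter { int ncalls; };

static void count_inval(int cacheid, uint32_t hashvalue, ToyOid dbid,
						void *context)
{
	struct ToyCounter *c = context;

	(void) cacheid; (void) hashvalue; (void) dbid;
	c->ncalls++;				/* accumulate into the caller's struct */
}

/* Calls "function" once per affected cache, forwarding "context". */
static void toy_prepare_to_invalidate(int ncaches, toy_inval_fn function,
									  void *context)
{
	for (int id = 0; id < ncaches; id++)
		(*function) (id, (uint32_t) id * 2654435761u, (ToyOid) 0, context);
}
```

Without the context argument, a callback like `count_inval` would have to record its results in file-scope state, which is exactly what the old three-argument signature forced.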
