
Commit 0f69bed

For inplace update, send nontransactional invalidations.
The inplace update survives ROLLBACK. The inval didn't, so another
backend's DDL could then update the row without incorporating the inplace
update. In the test this fixes, a mix of CREATE INDEX and ALTER TABLE
resulted in a table with an index, yet relhasindex=f. That is a source of
index corruption.

Back-patch to v14 - v17. This is a back-patch of commits:

- 243e9b4 (main change, on master, before v18 branched)
- 0bada39 (defect fix, on master, before v18 branched)
- bae8ca8 (cosmetics from post-commit review, on REL_18_STABLE)

It reverses commit c1099dd, my revert of the original back-patch of
243e9b4. This back-patch omits the non-comment heap_decode() changes. I
find those changes removed harmless code that was last necessary in v13.
See discussion thread for details. The back branches aren't the place to
remove such code.

Like the original back-patch, this doesn't change WAL, because these
branches use end-of-recovery SIResetAll(). All branches change the ABI of
extern function PrepareToInvalidateCacheTuple(). No PGXN extension calls
that, and there's no apparent use case in extensions. Expect
".abi-compliance-history" edits to follow.

Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Reviewed-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Nitin Motiani <nitinmotiani@google.com> (in earlier versions)
Reviewed-by: Andres Freund <andres@anarazel.de> (in earlier versions)
Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
Backpatch-through: 14-17
1 parent b8cfe9d commit 0f69bed

File tree

14 files changed: +362 −131 lines

src/backend/access/heap/README.tuplock

Lines changed: 32 additions & 0 deletions
@@ -201,3 +201,35 @@ wider than four bytes, and current readers don't need consistency across
 fields. Hence, they get by with just fetching each field once. XXX such a
 caller may also read a value that has not reached WAL; see
 systable_inplace_update_finish().
+
+During logical decoding, caches reflect an inplace update no later than the
+next XLOG_XACT_INVALIDATIONS. That record witnesses the end of a command.
+Tuples of its cmin are then visible to decoding, as are inplace updates of any
+lower LSN. Inplace updates of a higher LSN may also be visible, even if those
+updates would have been invisible to a non-historic snapshot matching
+decoding's historic snapshot. (In other words, decoding may see inplace
+updates that were not visible to a similar snapshot taken during original
+transaction processing.) That's a consequence of inplace update violating
+MVCC: there are no snapshot-specific versions of inplace-updated values. This
+all makes it hard to reason about inplace-updated column reads during logical
+decoding, but the behavior does suffice for relhasindex. A relhasindex=t in
+CREATE INDEX becomes visible no later than the new pg_index row. While it may
+be visible earlier, that's harmless. Finding zero indexes despite
+relhasindex=t is normal in more cases than this, e.g. after DROP INDEX.
+Example of a case that meaningfully reacts to the inplace inval:
+
+    CREATE TABLE cat (c int) WITH (user_catalog_table = true);
+    CREATE TABLE normal (d int);
+    ...
+    CREATE INDEX ON cat (c)\; INSERT INTO normal VALUES (1);
+
+If the output plugin reads "cat" during decoding of the INSERT, it's fair to
+want that read to see relhasindex=t and use the new index.
+
+An alternative would be to have decoding of XLOG_HEAP_INPLACE immediately
+execute its invals. That would behave more like invals during original
+transaction processing. It would remove the decoding-specific delay in e.g. a
+decoding plugin witnessing a relfrozenxid change. However, a good use case
+for that is unlikely, since the plugin would still witness relfrozenxid
+changes prematurely. Hence, inplace update takes the trivial approach of
+delegating to XLOG_XACT_INVALIDATIONS.

src/backend/access/heap/heapam.c

Lines changed: 39 additions & 7 deletions
@@ -6320,6 +6320,19 @@ heap_inplace_lock(Relation relation,
 
 	Assert(BufferIsValid(buffer));
 
+	/*
+	 * Register shared cache invals if necessary. Other sessions may finish
+	 * inplace updates of this tuple between this step and LockTuple(). Since
+	 * inplace updates don't change cache keys, that's harmless.
+	 *
+	 * While it's tempting to register invals only after confirming we can
+	 * return true, the following obstacle precludes reordering steps that
+	 * way. Registering invals might reach a CatalogCacheInitializeCache()
+	 * that locks "buffer". That would hang indefinitely if running after our
+	 * own LockBuffer(). Hence, we must register invals before LockBuffer().
+	 */
+	CacheInvalidateHeapTupleInplace(relation, oldtup_ptr);
+
 	LockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 
@@ -6415,6 +6428,7 @@ heap_inplace_lock(Relation relation,
 	if (!ret)
 	{
 		UnlockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
+		ForgetInplace_Inval();
 		InvalidateCatalogSnapshot();
 	}
 	return ret;
@@ -6443,6 +6457,16 @@ heap_inplace_update_and_unlock(Relation relation,
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
 
+	/*
+	 * Unlink relcache init files as needed. If unlinking, acquire
+	 * RelCacheInitLock until after associated invalidations. By doing this
+	 * in advance, if we checkpoint and then crash between inplace
+	 * XLogInsert() and inval, we don't rely on StartupXLOG() ->
+	 * RelationCacheInitFileRemove(). That uses elevel==LOG, so replay would
+	 * neglect to PANIC on EIO.
+	 */
+	PreInplace_Inval();
+
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
 
@@ -6486,17 +6510,24 @@ heap_inplace_update_and_unlock(Relation relation,
 		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * Send invalidations to shared queue. SearchSysCacheLocked1() assumes we
+	 * do this before UnlockTuple().
+	 */
+	AtInplace_Inval();
+
 	END_CRIT_SECTION();
+	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
-	heap_inplace_unlock(relation, oldtup, buffer);
+	AcceptInvalidationMessages();	/* local processing of just-sent inval */
 
 	/*
-	 * Send out shared cache inval if necessary. Note that because we only
-	 * pass the new version of the tuple, this mustn't be used for any
-	 * operations that could change catcache lookup keys. But we aren't
-	 * bothering with index updates either, so that's true a fortiori.
-	 *
-	 * XXX ROLLBACK discards the invalidation. See test inplace-inval.spec.
+	 * Queue a transactional inval, for logical decoding and for third-party
+	 * code that might have been relying on it since long before inplace
+	 * update adopted immediate invalidation. See README.tuplock section
+	 * "Reading inplace-updated columns" for logical decoding details.
 	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
@@ -6511,6 +6542,7 @@ heap_inplace_unlock(Relation relation,
 {
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
+	ForgetInplace_Inval();
 }
 
 /*

src/backend/access/transam/xact.c

Lines changed: 18 additions & 8 deletions
@@ -1358,14 +1358,24 @@ RecordTransactionCommit(void)
 
 		/*
 		 * Transactions without an assigned xid can contain invalidation
-		 * messages (e.g. explicit relcache invalidations or catcache
-		 * invalidations for inplace updates); standbys need to process those.
-		 * We can't emit a commit record without an xid, and we don't want to
-		 * force assigning an xid, because that'd be problematic for e.g.
-		 * vacuum. Hence we emit a bespoke record for the invalidations. We
-		 * don't want to use that in case a commit record is emitted, so they
-		 * happen synchronously with commits (besides not wanting to emit more
-		 * WAL records).
+		 * messages. While inplace updates do this, this is not known to be
+		 * necessary; see comment at inplace CacheInvalidateHeapTuple().
+		 * Extensions might still rely on this capability, and standbys may
+		 * need to process those invals. We can't emit a commit record
+		 * without an xid, and we don't want to force assigning an xid,
+		 * because that'd be problematic for e.g. vacuum. Hence we emit a
+		 * bespoke record for the invalidations. We don't want to use that in
+		 * case a commit record is emitted, so they happen synchronously with
+		 * commits (besides not wanting to emit more WAL records).
+		 *
+		 * XXX Every known use of this capability is a defect. Since an XID
+		 * isn't controlling visibility of the change that prompted invals,
+		 * other sessions need the inval even if this transaction aborts.
+		 *
+		 * ON COMMIT DELETE ROWS does a nontransactional index_build(), which
+		 * queues a relcache inval, including in transactions without an xid
+		 * that had read the (empty) table. Standbys don't need any ON COMMIT
+		 * DELETE ROWS invals, but we've not done the work to withhold them.
 		 */
 		if (nmsgs != 0)
 		{

src/backend/catalog/index.c

Lines changed: 9 additions & 2 deletions
@@ -2905,12 +2905,19 @@ index_update_stats(Relation rel,
 	if (dirty)
 	{
 		systable_inplace_update_finish(state, tuple);
-		/* the above sends a cache inval message */
+		/* the above sends transactional and immediate cache inval messages */
 	}
 	else
 	{
 		systable_inplace_update_cancel(state);
-		/* no need to change tuple, but force relcache inval anyway */
+
+		/*
+		 * While we didn't change relhasindex, CREATE INDEX needs a
+		 * transactional inval for when the new index's catalog rows become
+		 * visible. Other CREATE INDEX and REINDEX code happens to also queue
+		 * this inval, but keep this in case rare callers rely on this part of
+		 * our API contract.
+		 */
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
 

src/backend/commands/event_trigger.c

Lines changed: 0 additions & 5 deletions
@@ -979,11 +979,6 @@ EventTriggerOnLogin(void)
 					 * this instead of regular updates serves two purposes. First,
 					 * that avoids possible waiting on the row-level lock. Second,
 					 * that avoids dealing with TOAST.
-					 *
-					 * Changes made by inplace update may be lost due to
-					 * concurrent normal updates; see inplace-inval.spec. However,
-					 * we are OK with that. The subsequent connections will still
-					 * have a chance to set "dathasloginevt" to false.
 					 */
 					systable_inplace_update_finish(state, tuple);
 				}

src/backend/replication/logical/decode.c

Lines changed: 2 additions & 9 deletions
@@ -521,20 +521,13 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			/*
 			 * Inplace updates are only ever performed on catalog tuples and
 			 * can, per definition, not change tuple visibility. Since we
-			 * don't decode catalog tuples, we're not interested in the
+			 * also don't decode catalog tuples, we're not interested in the
 			 * record's contents.
-			 *
-			 * In-place updates can be used either by XID-bearing transactions
-			 * (e.g. in CREATE INDEX CONCURRENTLY) or by XID-less
-			 * transactions (e.g. VACUUM). In the former case, the commit
-			 * record will include cache invalidations, so we mark the
-			 * transaction as catalog modifying here. Currently that's
-			 * redundant because the commit will do that as well, but once we
-			 * support decoding in-progress relations, this will be important.
 			 */
 			if (!TransactionIdIsValid(xid))
 				break;
 
+			/* PostgreSQL 13 was the last to need these actions. */
 			(void) SnapBuildProcessChange(builder, xid, buf->origptr);
 			ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
 			break;

src/backend/utils/cache/catcache.c

Lines changed: 4 additions & 3 deletions
@@ -2356,7 +2356,8 @@ void
 PrepareToInvalidateCacheTuple(Relation relation,
 							  HeapTuple tuple,
 							  HeapTuple newtuple,
-							  void (*function) (int, uint32, Oid))
+							  void (*function) (int, uint32, Oid, void *),
+							  void *context)
 {
 	slist_iter	iter;
 	Oid			reloid;
@@ -2397,7 +2398,7 @@ PrepareToInvalidateCacheTuple(Relation relation,
 		hashvalue = CatalogCacheComputeTupleHashValue(ccp, ccp->cc_nkeys, tuple);
 		dbid = ccp->cc_relisshared ? (Oid) 0 : MyDatabaseId;
 
-		(*function) (ccp->id, hashvalue, dbid);
+		(*function) (ccp->id, hashvalue, dbid, context);
 
 		if (newtuple)
 		{
@@ -2406,7 +2407,7 @@ PrepareToInvalidateCacheTuple(Relation relation,
 			newhashvalue = CatalogCacheComputeTupleHashValue(ccp, ccp->cc_nkeys, newtuple);
 
 			if (newhashvalue != hashvalue)
-				(*function) (ccp->id, newhashvalue, dbid);
+				(*function) (ccp->id, newhashvalue, dbid, context);
 		}
 	}
 }
