-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
gh-140374: Add glossary entries related to multithreading #140375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ngoldbaum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure others will have opinions but here are my takes...
willingc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lysnikolaou @ngoldbaum I left some comments. This is challenging to write since there is a balance between brevity and the desire to cover too much. Let's focus first on the common definition and second on any free-threading specifics. If it is difficult to add free-threading specifics for any term, let's cover that in a separate doc.
@-mention me when you want another review. Thanks.
ngoldbaum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hope you don't mind a few more suggestions that occurred to me as I was giving this another once-over. This is much closer to being ready than when I last looked and adding all these entries will be a huge improvement over the status quo. Definitely worth backporting to the 3.14 docs ASAP!
colesbury
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I think adding multithreading related entries will be helfpul
| the :term:`cyclic garbage collector <garbage collection>` is to identify these groups and break the reference | ||
| cycles so that the memory can be reclaimed. | ||
|
|
||
| data race |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are only talking about native code (C or C++ or Rust or whatever), maybe we should leave this out of the glossary for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, not sure about this. The fact that a program might be crashing because of a data race in native code affects Python programmers, so maybe it warrants at least an entry in the glossary. It'll also be nice to be able to link to this glossary entry from other places where we talk about data races.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One idea we had was for ecosystem documentation to link back here. In that context it'll really help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having both data race and race condition feels a bit redundant. Wouldn't ecosystem libraries be able to only refer to race conditions?
Am I guessing correctly that here you want to distinguish between races that produce errors in application logic and races that produce a segfault or something low-level like that? If that's the case it might be useful when talking about bug reports, yes. But it might be a bit niche, hopefully?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extension modules can cause data races, so I feel it's appropriate to talk about both, especially since data races are undefined behavior.
|
If people could give this another look to move it forward, that'd be great! |
willingc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lysnikolaou @colesbury I've just done another pass through the glossary and made a few suggestions.
| the :term:`cyclic garbage collector <garbage collection>` is to identify these groups and break the reference | ||
| cycles so that the memory can be reclaimed. | ||
|
|
||
| data race |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One idea we had was for ecosystem documentation to link back here. In that context it'll really help.
Doc/glossary.rst
Outdated
| :term:`data races <data race>`. In the | ||
| :term:`free-threaded <free threading>` build, :term:`per-module state` | ||
| is often preferred over global state for C extension modules. | ||
| See also :term:`per-module state`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe worth bringing up context variables as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should add an item for thread-local state instead?
| A module, function, or class that behaves correctly when used by multiple | ||
| threads concurrently. Thread-safe code uses appropriate | ||
| :term:`synchronization primitives <synchronization primitive>` like | ||
| :term:`locks <lock>` to protect shared mutable state, or is designed | ||
| to avoid shared mutable state entirely. In the | ||
| :term:`free-threaded <free threading>` build, built-in types like | ||
| :class:`dict`, :class:`list`, and :class:`set` use internal locking | ||
| to make many operations thread-safe, although thread safety is not | ||
| necessarily guaranteed. Code that is not thread-safe may experience | ||
| :term:`race conditions <race condition>` and :term:`data races <data race>` | ||
| when used in multi-threaded programs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one's very tricky, definitely because thread-safe in itself is very vague.
I took a look at my uni textbook on concurrency and couldn't find a definition of thread-safe. (I checked, it actually never mentions thread-safe or thread safe.) And I think that's because what thread-safe means fundamentally depends on the semantics of an application.
Is it thread-safe to call my_list[0] += 1, assuming my_list is shared? It depends. And that's why there is this unsatisfactory line here:
[...] built-in types like
:class:`dict`, :class:`list`, and :class:`set` use internal locking
to make many operations thread-safe, although thread safety is not
necessarily guaranteed.Maybe we can turn the problem around and only define thread-safety as the absence of thread-unsafety?
| A module, function, or class that behaves correctly when used by multiple | |
| threads concurrently. Thread-safe code uses appropriate | |
| :term:`synchronization primitives <synchronization primitive>` like | |
| :term:`locks <lock>` to protect shared mutable state, or is designed | |
| to avoid shared mutable state entirely. In the | |
| :term:`free-threaded <free threading>` build, built-in types like | |
| :class:`dict`, :class:`list`, and :class:`set` use internal locking | |
| to make many operations thread-safe, although thread safety is not | |
| necessarily guaranteed. Code that is not thread-safe may experience | |
| :term:`race conditions <race condition>` and :term:`data races <data race>` | |
| when used in multi-threaded programs. | |
| A module, function, or class that behaves correctly when used by multiple | |
| threads concurrently. Thread-safe code uses appropriate | |
| :term:`synchronization primitives <synchronization primitive>` like | |
| :term:`locks <lock>` to protect shared mutable state, or is designed | |
| to avoid shared mutable state entirely. Code that is not thread-safe may experience | |
| :term:`race conditions <race condition>` and :term:`data races <data race>` | |
| when used in multi-threaded programs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, talking about thread-safety is always bound to be vague. Having said that, the sentence about dict, list and set feels like it's okay since it doesn't necessarily make any statement, but makes the reader understand that there's subtlety there which they can found out more about in other places in the docs.
| the :term:`cyclic garbage collector <garbage collection>` is to identify these groups and break the reference | ||
| cycles so that the memory can be reclaimed. | ||
|
|
||
| data race |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having both data race and race condition feels a bit redundant. Wouldn't ecosystem libraries be able to only refer to race conditions?
Am I guessing correctly that here you want to distinguish between races that produce errors in application logic and races that produce a segfault or something low-level like that? If that's the case it might be useful when talking about bug reports, yes. But it might be a bit niche, hopefully?
Doc/glossary.rst
Outdated
| An :term:`object` with state that is allowed to change during the course | ||
| of the program. In multi-threaded programs, mutable objects that are | ||
| shared between threads require careful synchronization to avoid | ||
| :term:`concurrent modification` issues. See also :term:`immutable`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it feels more correct to imply race conditions here.
| :term:`concurrent modification` issues. See also :term:`immutable`, | |
| :term:`race conditions`. See also :term:`immutable`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the previous definition was mentioning id(). Why not keep that? It might help newcomers make the connection and learn that they can check the id(obj) to know whether obj is shared. (Especially useful in debugging sessions.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this is something we discussed with Nathan and we both agreed that using id to define mutable will confuse more people than it'll help. The advice about using id is a good one though, but maybe it best-positioned somewhere else?
Co-authored-by: Daniele Parmeggiani <8658291+dpdani@users.noreply.github.com>
📚 Documentation preview 📚: https://cpython-previews--140375.org.readthedocs.build/