Description
For a very long time (predating Clang, I believe), NumPy has used `NPY_GCC_OPT_3` in a few places. This macro locally enables a high optimization level for functions we know should be optimized: tight, simple (usually 1-D) loops.
However, due to its age, the macro only applies to GCC (unless Clang picks it up?). It would be nice to generalize the macro to other compilers, probably using compiler-specific `#pragma` directives.
This may need some care, since different compilers have different ideas of what `-O3` means, IIRC. For example, the Intel compiler may enable unsafe fast-math where GCC does not.
The tasks are:
- Check how various compilers change the optimization level for a single function
- Add additional branches for those compilers to the `#define` (maybe renaming it)
- Check benchmarks of the functions that use it with the compilers in question
- Run the test suite
- Double-check the compiler documentation to be sure that no unsafe fast-math is enabled. We cannot trust our test suite on all accounts (e.g. floating-point error flags).
Some functions that currently use this may eventually be ported to universal intrinsics. But I expect this macro will stay useful where maximum performance is less important, or just as a stopgap, because it adds no complexity.
EDIT: I expect this is a fairly nice thing to investigate during a sprint, although it is best if an MSVC setup is available. (Clang likely supports the GCC attributes, but I am not sure. This is probably a useful reference: https://stackoverflow.com/questions/31373885/how-to-change-optimization-level-of-one-function.)