-
-
Notifications
You must be signed in to change notification settings - Fork 11.8k
MAINT: replace use of asanyarray with out=... to keep arrays
#29951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4f7fcd0 to
3e27ff0
Compare
asanyarray with out=... to keep arrays
mhvk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments to help review.
| x = um.subtract(arr, arrmean, out=...) | ||
| if issubclass(arr.dtype.type, (nt.floating, nt.integer)): | ||
| x = um.multiply(x, x, out=x) | ||
| x = um.square(x, out=x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, couldn't resist making a few more small improvements as I was here...
| q = np.quantile(x, .5) | ||
| assert_equal(q, 1.75) | ||
| assert_equal(type(q), np.float64) | ||
| assert isinstance(q, float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a side effect, the function now returns regular float. Since here one is dealing with object arrays, that arguably is more appropriate. But it really should not matter at all -- it is not like one could count on an np.float64, in most cases one gets a Fraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think I had skipped some cases like this when first adding out=..., but should have followed up on it. I think this is fine for object array input... It is arguably a bug-fix.
But it is also a very subtle small change that might break someone in theory!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really only in theory, I think, since even before one couldn't count on getting np.float64...
|
|
||
| # Convention is to return scalars instead of 0d arrays | ||
| if r.ndim == 0: | ||
| r = r[()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the change below takes care of this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, yeah with r potentially being scalar might be awkward, but I don't think this makes it less awkward and we use dtype.type(val) in enough places that I think we can rely on it.
(It seemed slightly awkward for object dtype, but it works even there.)
3e27ff0 to
86c9d84
Compare
seberg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks. The change in quantile for object input is very subtle, but I think we should do it and it's technically fixing bugs for decimal or fractional inputs.
I'll wait a bit in case someone has thoughts, but I think we can put this in.
| q = np.quantile(x, .5) | ||
| assert_equal(q, 1.75) | ||
| assert_equal(type(q), np.float64) | ||
| assert isinstance(q, float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think I had skipped some cases like this when first adding out=..., but should have followed up on it. I think this is fine for object array input... It is arguably a bug-fix.
But it is also a very subtle small change that might break someone in theory!
| diff_b_a = subtract(b, a) | ||
| # asanyarray is a stop-gap until gh-13105 | ||
| lerp_interpolation = asanyarray(add(a, diff_b_a * t, out=out)) | ||
| diff_b_a = b - a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to do subtract(b, a, out=...) actually and stick to arrays?
(but doesn't matter, TBH.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My logic was that if both are scalars, this is faster. I had hoped more generally to go to just using the operators, but in other places it turned out annoying, and then I decided to keep the changes minimal... Anyway, would suggest to leave it, but can change back if desired.
|
|
||
| # Convention is to return scalars instead of 0d arrays | ||
| if r.ndim == 0: | ||
| r = r[()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, yeah with r potentially being scalar might be awkward, but I don't think this makes it less awkward and we use dtype.type(val) in enough places that I think we can rely on it.
(It seemed slightly awkward for object dtype, but it works even there.)
|
Let's get this in then, thanks Marten! The biggest change (I am not sure how big), is that quantiles propagate Fractions better with this which might change things in some niche situation (arguably a fix, but possibly a subtle change). |
|
FYI it seems that, as a side effect, this fixed some odd behavior we had in unyt I don't understand it myself without a deeper dig, but I thought that'd be interesting to you guys ! |
|
@neutrinoceros - nice to hear. Agreed it is not that obvious what change actually made the code work better. |
| """ | ||
| # promote back to an array if flattened | ||
| res = nx.asanyarray(nx.ceil(x, out=out)) | ||
| res = nx.ceil(x, out=... if out is None else out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is causing a failure in pandas
https://github.com/pandas-dev/pandas/actions/runs/19195436099/job/54875728980?pr=63046#step:5:63
Not clear to me this was intentional. This is complicated by the fact that this was already xfailed on the main branch due to pandas-dev/pandas#51082 (comment), see also the comment a few below this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not intentional, but we should allow out=... in pandas' __array_ufunc__ implementation maybe (it is the same as None except for reductions).
This allows fixing weird corner cases with object dtype in NumPy, so it would be nice to not stall on it forever.
(I don't mind reverting this or similar cases for now if you prefer.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nvm. We should just change fix to forward to trunc, that should be safe and good right now and not worry about this regression.
I'll have a look at the __array_ufunc__ in pandas so that this type of change doesn't break anything in the future.
With numpygh-28576, it has become possible to ensure that when calling a ufunc, the output is guaranteed to be an array. This PR uses that to replace np.asanyarray calls in some functions. I'm not sure this is complete -- these were just parts of code for which test failed if np.anyarray(scalar) is made to return a read-only array (see numpygh-29876). I do think there are all small improvements, though.
With numpygh-28576, it has become possible to ensure that when calling a ufunc, the output is guaranteed to be an array. This PR uses that to replace np.asanyarray calls in some functions. I'm not sure this is complete -- these were just parts of code for which test failed if np.anyarray(scalar) is made to return a read-only array (see numpygh-29876). I do think there are all small improvements, though.
With #28576, it has become possible to ensure that when calling a ufunc, the output is guaranteed to be an array.
This PR uses that to replace
np.asanyarraycalls in some functions. I'm not sure this is complete -- these were just parts of code for which test failed ifnp.anyarray(scalar)is made to return a read-only array (see #29876).I do think there are all small improvements, though.