Description
Describe the issue:
When using the new experimental StringDType with numpy.strings.center or numpy.char.center, if the width parameter is negative, it raises a MemoryError.
However, the same negative width works fine with Python str objects, numpy.str_, and np.dtype('<U') string arrays: they return the input unchanged, which matches Python's str.center behavior.
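For reference, here is a minimal pure-Python sketch of the built-in behavior being compared against. It uses only `str` methods (no NumPy), and the variable names are illustrative. A negative width is treated like any width shorter than the string, so the input comes back unchanged:

```python
# Built-in str padding methods with a negative width.
# Any width smaller than len(text) leaves the string untouched.
text = "test"

results = {
    "center": text.center(-1),
    "ljust": text.ljust(-1),
    "rjust": text.rjust(-1),
    "zfill": text.zfill(-1),
}

for name, value in results.items():
    print(f"{name}: {value!r}")  # each line shows 'test'
```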
Reproduce the code example:
import numpy as np
from numpy.dtypes import StringDType
a1 = 'test'
a2 = np.array('test', dtype = np.str_)
a3 = np.array('test', dtype = '<U4')
a4 = np.array('test', dtype=StringDType())
res1 = np.strings.center(a1, -1)
print(res1) # test
res2 = np.strings.center(a2, -1)
print(res2) # test
res3 = np.strings.center(a3, -1)
print(res3) # test
res4 = np.strings.center(a4, -1)
print(res4) # MemoryError: Failed to allocate string in _center
Error message:
Traceback (most recent call last):
File "xxx/test_string.py", line 45, in <module>
res4 = numpy.strings.center(a4, -1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "xxx/python3.12/site-packages/numpy/_core/strings.py", line 765, in center
return _center(a, width, fillchar)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
MemoryError: Failed to allocate string in _center
Python and NumPy Versions:
NumPy: 2.3.1
Python: 3.12.0 | packaged by Anaconda, Inc. | (main, Oct 2 2023, 17:29:18) [GCC 11.2.0]
Runtime Environment:
[{'numpy_version': '2.3.1',
'python': '3.12.0 | packaged by Anaconda, Inc. | (main, Oct 2 2023, '
'17:29:18) [GCC 11.2.0]',
'uname': uname_result(system='Linux', node='gpu-node5', release='5.4.0-100-generic', version='#113-Ubuntu SMP Thu Feb 3 18:43:29 UTC 2022', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM', 'AVX512_SPR']}},
{'architecture': 'SkylakeX',
'filepath': '/home/root/miniconda3/envs/pbt/lib/python3.12/site-packages/numpy.libs/libscipy_openblas64_-56d6093b.so',
'internal_api': 'openblas',
'num_threads': 64,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.29'}]
Context for the issue:
I found that this issue is not limited to center, but also occurs in other string/char functions such as ljust, rjust, and zfill when used with StringDType arrays and negative width values. This appears to be a broader issue with StringDType handling in these functions.
Although the documentation states that StringDType is supported, inconsistent or erroneous behavior (e.g., MemoryError) suggests there may be an underlying bug. Should we take any action to address this inconsistency and improve robustness for users?