b'...' means it's a byte string, and the default dtype for arrays of strings depends on the kind of strings: Unicode strings (Python 3 str is Unicode) get the dtype U, while Python 2 str or Python 3 bytes get the dtype S. You can find the explanation of dtypes in the NumPy documentation here:
Array-protocol type strings
The first character specifies the kind of data and the remaining characters specify the number of bytes per item, except for Unicode, where it is interpreted as the number of characters. The item size must correspond to an existing type, or an error will be raised. The supported kinds are:
- '?' boolean
- 'b' (signed) byte
- 'B' unsigned byte
- 'i' (signed) integer
- 'u' unsigned integer
- 'f' floating-point
- 'c' complex-floating point
- 'm' timedelta
- 'M' datetime
- 'O' (Python) objects
- 'S', 'a' zero-terminated bytes (not recommended)
- 'U' Unicode string
- 'V' raw data (void)
However in your first case you actually forced NumPy to convert it to bytes because you specified dtype='S'.
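As a quick illustration of these array-protocol type strings (a minimal sketch, using only built-in NumPy dtypes), the kind character plus an item size resolves to a concrete dtype:

```python
import numpy as np

# kind character + item size -> a concrete dtype
print(np.dtype('i4'))  # int32: signed integer, 4 bytes
print(np.dtype('f8'))  # float64: floating point, 8 bytes
print(np.dtype('U5'))  # <U5: Unicode string of up to 5 characters
print(np.dtype('S5'))  # |S5: byte string of up to 5 bytes

# for 'U' the number counts characters, not bytes (4 bytes per character)
print(np.dtype('U5').itemsize)  # 20
print(np.dtype('S5').itemsize)  # 5
```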
Since NumPy version 2.0 a new numpy.dtypes.StringDType is available.
Often, real-world string data does not have a predictable length. In these cases it is awkward to use fixed-width strings, since storing all the data without truncation requires knowing the length of the longest string one would like to store in the array before the array is created.
To support situations like this, NumPy provides numpy.dtypes.StringDType, which stores variable-width string data in a UTF-8 encoding in a NumPy array:
from numpy.dtypes import StringDType
data = ["this is a longer string", "short string"]
arr = np.array(data, dtype=StringDType())
arr
array(['this is a longer string', 'short string'], dtype=StringDType())
Note that unlike fixed-width strings, StringDType is not parameterized by the maximum length of an array element: arbitrarily long or short strings can live in the same array without needing to reserve storage for padding bytes in the short strings.
NumPy arrays are stored as contiguous blocks of memory. They usually have a single datatype (e.g. integers, floats or fixed-length strings) and then the bits in memory are interpreted as values with that datatype.
Creating an array with dtype=object is different. The memory taken by the array now is filled with pointers to Python objects which are being stored elsewhere in memory (much like a Python list is really just a list of pointers to objects, not the objects themselves).
Arithmetic operators such as * don't work with arrays such as ar1 which have a string_ datatype (there are special functions instead - see below). NumPy is just treating the bits in memory as characters and the * operator doesn't make sense here. However, the line
np.array(['avinash','jay'], dtype=object) * 2
works because now the array is an array of (pointers to) Python strings. The * operator is well defined for these Python string objects. New Python strings are created in memory and a new object array with references to the new strings is returned.
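To see the two behaviours side by side, here is a small sketch (the exact error message varies between NumPy versions, but a TypeError is raised either way):

```python
import numpy as np

ar1 = np.array(['avinash', 'jay'])                # fixed-width '<U7'
ar2 = np.array(['avinash', 'jay'], dtype=object)  # pointers to Python str

# * has no meaning for the raw character buffers of a fixed-width array
try:
    ar1 * 2
except TypeError as exc:
    print('fixed-width dtype raised:', type(exc).__name__)

# ...but it works element-wise on object arrays of Python strings
print(ar2 * 2)  # ['avinashavinash' 'jayjay']
```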
If you have an array with string_ or unicode_ dtype and want to repeat each string, you can use np.char.multiply:
In [52]: np.char.multiply(ar1, 2)
Out[52]: array(['avinashavinash', 'jayjay'], dtype='<U14')
NumPy has many other vectorised string methods too.
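A few of these vectorised methods, assuming the same ar1 as above (this sketch uses only np.char functions that exist in both NumPy 1.x and 2.x):

```python
import numpy as np

ar1 = np.array(['avinash', 'jay'])

print(np.char.upper(ar1))     # ['AVINASH' 'JAY']
print(np.char.add(ar1, '!'))  # ['avinash!' 'jay!']
print(np.char.str_len(ar1))   # [7 3]
```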
There are 3 main dtypes to store strings in NumPy:
- object: stores pointers to Python objects
- str: stores fixed-width strings
- numpy.dtypes.StringDType(): new in NumPy 2.0; stores variable-width strings
str consumes more memory than object; StringDType is better
Depending on the length of the fixed-width strings and the size of the array, the ratio differs, but as long as the longest string in the array is longer than 2 characters, str consumes more memory (they are equal when the longest string is exactly 2 characters long). For example, in the following example, str consumes almost 8 times more memory.
On the other hand, the new (in numpy>=2.0) numpy.dtypes.StringDType stores variable width strings, so consumes much less memory.
import numpy as np
from pympler.asizeof import asizeof
ar1 = np.array(['this is a string', 'string']*1000, dtype=object)
ar2 = np.array(['this is a string', 'string']*1000, dtype=str)
ar3 = np.array(['this is a string', 'string']*1000, dtype=np.dtypes.StringDType())
asizeof(ar2) / asizeof(ar1) # 7.944444444444445
asizeof(ar3) / asizeof(ar1) # 1.992063492063492
For numpy 1.x, str is slower than object
For numpy>=2.0.0, str is faster than object
NumPy 2.0 introduced a new numpy.strings API with much more performant ufuncs for string operations. A simple test (on numpy 2.2.0) below shows that vectorized string operations on arrays of str or StringDType dtype are much faster than the same operations on an object-dtype array.
import timeit
t1 = min(timeit.repeat(lambda: ar1*2, number=1000))
t2a = min(timeit.repeat(lambda: np.strings.multiply(ar2, 2), number=1000))
t2b = min(timeit.repeat(lambda: np.strings.multiply(ar3, 2), number=1000))
print(t2a / t1) # 0.8786601958427778
print(t2b / t1) # 0.7311586933668037
t3 = min(timeit.repeat(lambda: np.array([s.count('i') for s in ar1]), number=1000))
t4a = min(timeit.repeat(lambda: np.strings.count(ar2, 'i'), number=1000))
t4b = min(timeit.repeat(lambda: np.strings.count(ar3, 'i'), number=1000))
print(t4a / t3) # 0.13328748153237377
print(t4b / t3) # 0.3365874412749679
For numpy<2.0.0 (tested on numpy 1.26.0)
NumPy 1.x's vectorized string methods are not optimized, so operating on the object array is often faster. For example, in the example in the OP where each string is repeated, a simple * (aka multiply()) is not only more concise but also over 10 times faster than char.multiply().
import timeit
setup = "import numpy as np; from __main__ import ar1, ar2"
t1 = min(timeit.repeat("ar1*2", setup, number=1000))
t2 = min(timeit.repeat("np.char.multiply(ar2, 2)", setup, number=1000))
t2 / t1 # 10.650433758517027
Even for functions that cannot readily be applied to the whole array, it is faster to loop over the object array and work on the Python strings than to use the vectorized char method on the str array.
For example, iterating over the object array and calling str.count() on each Python string is over 3 times faster than the vectorized char.count() on the str array.
f1 = lambda: np.array([s.count('i') for s in ar1])
f2 = lambda: np.char.count(ar2, 'i')
setup = "import numpy as np; from __main__ import ar1, ar2, f1, f2, f3"
t3 = min(timeit.repeat("f1()", setup, number=1000))
t4 = min(timeit.repeat("f2()", setup, number=1000))
t4 / t3 # 3.251369161574832
On a side note, when it comes to explicit loops, iterating over a list is faster than iterating over a NumPy array. So in the previous example, a further performance gain can be made by iterating over a list:
f3 = lambda: np.array([s.count('i') for s in ar1.tolist()])
# ^^^^^^^^^ <--- convert to list here
t5 = min(timeit.repeat("f3()", setup, number=1000))
t3 / t5 # 1.2623498005294627