NumPy arrays are stored as contiguous blocks of memory. They usually have a single datatype (e.g. integers, floats or fixed-length strings) and then the bits in memory are interpreted as values with that datatype.

Creating an array with dtype=object is different. The array's memory is now filled with pointers to Python objects that are stored elsewhere in memory (much like a Python list is really just a list of pointers to objects, not the objects themselves).
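A minimal sketch of that difference, assuming a 64-bit build where a pointer takes 8 bytes. The variable names are illustrative and do not come from the original question:

```python
import numpy as np

words = ['avinash', 'jay']

fixed = np.array(words)                # fixed-width unicode, 7 characters wide
boxed = np.array(words, dtype=object)  # one pointer per element

# Each fixed-width slot stores 7 characters inline at 4 bytes each.
print(fixed.dtype, fixed.itemsize)

# Each object slot stores only a pointer (8 bytes on a 64-bit build).
print(boxed.dtype, boxed.itemsize)

# The object array refers to the very same Python strings as the list.
print(boxed[0] is words[0])
```

The identity check is the telling part: the object array holds references to existing Python strings rather than copies of their characters.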

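Arithmetic operators such as * don't work with arrays such as ar1 (here, np.array(['avinash', 'jay'])), which have a fixed-width string datatype (string_ or unicode_; these aliases were removed in NumPy 2.0 in favour of bytes_ and str_). There are special functions instead, see below. NumPy just treats the bits in memory as characters, and the * operator doesn't make sense here. However, the line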

np.array(['avinash','jay'], dtype=object) * 2

works because now the array is an array of (pointers to) Python strings. The * operator is well defined for these Python string objects. New Python strings are created in memory and a new object array with references to the new strings is returned.
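The contrast can be sketched as follows, assuming ar1 holds the two strings shown in the output further down. The failure on the fixed-width array is caught rather than asserted, since the exact behaviour and error message may differ between NumPy versions:

```python
import numpy as np

ar1 = np.array(['avinash', 'jay'])  # assumed contents of ar1 from the question

# On a fixed-width string array, * is expected to fail with a TypeError
# because there is no matching ufunc loop.
try:
    print("fixed-width result:", ar1 * 2)
except TypeError as exc:
    print("fixed-width array raised:", type(exc).__name__)

# On an object array, * is delegated to each Python string's own operator.
obj = np.array(['avinash', 'jay'], dtype=object)
doubled = obj * 2
print(doubled, doubled.dtype)
```

The result is again an object array, so any further arithmetic keeps going through the Python-level operators of each element.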


If you have an array with a fixed-width string dtype (bytes_ or str_, formerly also spelled string_ and unicode_) and want to repeat each string, you can use np.char.multiply:

In [52]: np.char.multiply(ar1, 2)
Out[52]: array(['avinashavinash', 'jayjay'], 
      dtype='<U14')

NumPy has many other vectorised string methods too.
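A few of those methods, sketched on the same two strings (ar1 is assumed to match the array from the question; the result names are illustrative):

```python
import numpy as np

ar1 = np.array(['avinash', 'jay'])  # assumed contents of ar1 from the question

upper = np.char.upper(ar1)            # element-wise str.upper
greeting = np.char.add('hi ', ar1)    # element-wise concatenation
has_j = np.char.startswith(ar1, 'j')  # boolean mask
lengths = np.char.str_len(ar1)        # length of each string

print(upper, greeting, has_j, lengths)
```

Each call returns a new array, so the results can be combined with ordinary indexing, for example ar1[has_j].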

Answer from Alex Riley on Stack Overflow
2 of 2
4

There are 3 main dtypes to store strings in numpy:

  • object: Stores pointers to Python objects
  • str: Stores fixed-width strings
  • numpy.dtypes.StringDType(): New in numpy 2.0 and stores variable-width strings
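One practical consequence of the three storage models shows up on assignment. The sketch below is illustrative only: the names are made up, and StringDType is looked up defensively because it only exists in NumPy 2.0 and later:

```python
import numpy as np

texts = ['this is a string', 'string']
long_text = 'a considerably longer string'

fixed = np.array(texts, dtype=str)     # width is fixed at the longest input
boxed = np.array(texts, dtype=object)  # each slot is a pointer

fixed[1] = long_text   # silently truncated to the array's fixed width
boxed[1] = long_text   # stored whole

print(fixed.dtype, repr(fixed[1]))
print(boxed.dtype, repr(boxed[1]))

# StringDType is only available in NumPy >= 2.0.
StringDType = getattr(getattr(np, 'dtypes', None), 'StringDType', None)
if StringDType is not None:
    variable = np.array(texts, dtype=StringDType())
    variable[1] = long_text  # stored whole as well
    print(variable.dtype, repr(variable[1]))
```

The silent truncation is the usual surprise with the fixed-width dtype, and one reason to reach for object or StringDType when string lengths are not known up front.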

str consumes more memory than object; StringDType is better

The ratio depends on the length of the longest string and the size of the array, but as long as the longest string in the array is longer than 2 characters, str consumes more memory (they are equal when the longest string is 2 characters long, given 4 bytes per character and 8-byte pointers). In the example below, str consumes almost 8 times more memory.

On the other hand, the new (in numpy>=2.0) numpy.dtypes.StringDType stores variable-width strings, so it consumes much less memory than str (about twice as much as object in the measurement below).

import numpy as np
from pympler.asizeof import asizeof

ar1 = np.array(['this is a string', 'string']*1000, dtype=object)
ar2 = np.array(['this is a string', 'string']*1000, dtype=str)
ar3 = np.array(['this is a string', 'string']*1000, dtype=np.dtypes.StringDType())

asizeof(ar2) / asizeof(ar1)  # 7.944444444444445
asizeof(ar3) / asizeof(ar1)  # 1.992063492063492
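The break-even at two characters can be checked without pympler by comparing array buffers directly. This is a rough sketch, assuming a 64-bit build; it uses nbytes, which is a different measure from asizeof and, for the object array, excludes the Python strings themselves:

```python
import numpy as np

pointer_bytes = np.dtype(object).itemsize  # 8 on a 64-bit build

sizes = {}
for width in (1, 2, 3, 16):
    words = ['x' * width] * 1000
    fixed = np.array(words, dtype=str)
    boxed = np.array(words, dtype=object)
    # nbytes covers the array buffer only: 4 bytes per character for str,
    # one pointer per element for object.
    sizes[width] = (fixed.nbytes, boxed.nbytes)
    print(width, fixed.nbytes, boxed.nbytes)
```

The str buffer grows linearly with the longest string while the object buffer stays constant, which is where the roughly 8x ratio for a 16-character maximum comes from.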

For numpy 1.x, str is slower than object

For numpy>=2.0.0, str is faster than object

NumPy 2.0 introduced a new numpy.strings API with much more performant ufuncs for string operations. A simple test (on numpy 2.2.0) below shows that vectorized string operations on an array of str or StringDType dtype are faster than the same operations on an object dtype array (slightly for multiply, several times for count).

import timeit

t1 = min(timeit.repeat(lambda: ar1*2, number=1000))
t2a = min(timeit.repeat(lambda: np.strings.multiply(ar2, 2), number=1000))
t2b = min(timeit.repeat(lambda: np.strings.multiply(ar3, 2), number=1000))
print(t2a / t1)   # 0.8786601958427778
print(t2b / t1)   # 0.7311586933668037

t3 = min(timeit.repeat(lambda: np.array([s.count('i') for s in ar1]), number=1000))
t4a = min(timeit.repeat(lambda: np.strings.count(ar2, 'i'), number=1000))
t4b = min(timeit.repeat(lambda: np.strings.count(ar3, 'i'), number=1000))

print(t4a / t3)   # 0.13328748153237377
print(t4b / t3)   # 0.3365874412749679

For numpy<2.0.0 (tested on numpy 1.26.0)

NumPy 1.x's vectorized string methods are not optimized, so operating on the object array is often faster. For instance, in the OP's example, where each string is repeated, a simple * (aka multiply()) on the object array is not only more concise but also over 10 times faster than char.multiply().

import timeit
setup = "import numpy as np; from __main__ import ar1, ar2"
t1 = min(timeit.repeat("ar1*2", setup, number=1000))
t2 = min(timeit.repeat("np.char.multiply(ar2, 2)", setup, number=1000))
t2 / t1   # 10.650433758517027

Even for operations that cannot readily be applied to the array as a whole, it is faster to loop over the object array and work on the Python strings than to use the vectorized char methods on str arrays.

For example, iterating over the object array and calling str.count() on each Python string is over 3 times faster than the vectorized char.count() on the str array.

f1 = lambda: np.array([s.count('i') for s in ar1])
f2 = lambda: np.char.count(ar2, 'i')

# f3 is only defined further down, so it is not imported here
setup = "import numpy as np; from __main__ import ar1, ar2, f1, f2"
t3 = min(timeit.repeat("f1()", setup, number=1000))
t4 = min(timeit.repeat("f2()", setup, number=1000))

t4 / t3   # 3.251369161574832

On a side note, when it comes to an explicit loop, iterating over a list is faster than iterating over a NumPy array. So in the previous example, a further performance gain can be made by converting the array to a list first:

f3 = lambda: np.array([s.count('i') for s in ar1.tolist()])
#                                               ^^^^^^^^^  <--- convert to list here
setup = "import numpy as np; from __main__ import ar1, f3"
t5 = min(timeit.repeat("f3()", setup, number=1000))
t3 / t5   # 1.2623498005294627