🌐
W3Schools
w3schools.com › python › numpy › numpy_data_types.asp
NumPy Data Types
Create an array with data type string: import numpy as np arr = np.array([1, 2, 3, 4], dtype='S') print(arr) print(arr.dtype) Try it Yourself » · For i, u, f, S and U we can define size as well. Create an array with data type 4 bytes integer: import numpy as np arr = np.array([1, 2, 3, 4], dtype='i4') print(arr) print(arr.dtype) Try it Yourself » · If a type is given in which elements can't be casted then NumPy will raise a ValueError. ValueError: In Python ValueError is raised when the type of passed argument to a function is unexpected/incorrect.
🌐
W3Schools
w3schools.com › python › python_datatypes.asp
Python Data Types
Python Overview Python Built-in Functions Python String Methods Python List Methods Python Dictionary Methods Python Tuple Methods Python Set Methods Python File Methods Python Keywords Python Exceptions Python Glossary
🌐
NumPy
numpy.org › doc › stable › reference › arrays.dtypes.html
Data type objects (dtype) — NumPy v2.4 Manual
>>> dt = np.dtype(float) # Python-compatible floating-point number >>> dt = np.dtype(int) # Python-compatible integer >>> dt = np.dtype(object) # Python object ... All other types map to object_ for convenience. Code should expect that such types may map to a specific (new) dtype in the future. ... Any type object with a dtype attribute: The attribute will be accessed and used directly. The attribute must return something that is convertible into a dtype object. Several kinds of strings can be converted.
🌐
W3Schools
w3schools.com › python › pandas › ref_df_select_dtypes.asp
Pandas DataFrame select_dtypes() Method
The select_dtypes() method returns a new DataFrame that includes/excludes columns of the specified dtype(s).
🌐
W3Schools
w3schools.com › python › pandas › ref_df_convert_dtypes.asp
Pandas DataFrame convert_dtypes() Method
The convert_dtypes() method returns a new DataFrame where each column has been changed to the best possible data type.
🌐
W3Schools
w3schools.com › python › exercise.asp
W3Schools Python Exercise
More Python Exercises · Close menu · Go to w3schools.com · Sign in to track your progress · × · Sign in · × · Get Started3 q · Syntax3 q · Comments3 q · Variables4 q · Variable Names3 q · Multiple Variable Values3 q · Output Variable3 q · Global Variable3 q · Data Types8 q · Numbers4 q · Casting3 q · Strings4 q ·
🌐
Practical Business Python
pbpython.com › pandas_dtypes.html
Overview of Pandas Data Types - Practical Business Python
Customer Number int64 Customer Name object 2016 float64 2017 float64 Percent Growth object Jan Units object Month int64 Day int64 Year int64 Active object dtype: object · For another example of using lambda vs. a function, we can look at the process for fixing the Percent Growth column. ... def convert_percent(val): """ Convert the percentage string to an actual floating point percent - Remove % - Divide by 100 to make decimal """ new_val = val.replace('%', '') return float(new_val) / 100 df['Percent Growth'].apply(convert_percent)
Top answer
1 of 2
10

b'...' means it's a byte-string and the default dtype for arrays of strings depends on the kind of strings. Unicodes (python 3 strings are unicode) are U and Python 2 str or Python 3 bytes have the dtype S. You can find the explanation of dtypes in the NumPy documentation here

Array-protocol type strings

The first character specifies the kind of data and the remaining characters specify the number of bytes per item, except for Unicode, where it is interpreted as the number of characters. The item size must correspond to an existing type, or an error will be raised. The supported kinds are:

  • '?' boolean
  • 'b' (signed) byte
  • 'B' unsigned byte
  • 'i' (signed) integer
  • 'u' unsigned integer
  • 'f' floating-point
  • 'c' complex-floating point
  • 'm' timedelta
  • 'M' datetime
  • 'O' (Python) objects
  • 'S', 'a' zero-terminated bytes (not recommended)
  • 'U' Unicode string
  • 'V' raw data (void)

However in your first case you actually forced NumPy to convert it to bytes because you specified dtype='S'.

2 of 2
2

Since NumPy version 2.0 a new numpy.dtypes.StringDType is available.

Often, real-world string data does not have a predictable length. In these cases it is awkward to use fixed-width strings, since storing all the data without truncation requires knowing the length of the longest string one would like to store in the array before the array is created.

To support situations like this, NumPy provides numpy.dtypes.StringDType, which stores variable-width string data in a UTF-8 encoding in a NumPy array:

from numpy.dtypes import StringDType
data = ["this is a longer string", "short string"]
arr = np.array(data, dtype=StringDType())
arr
array(['this is a longer string', 'short string'], dtype=StringDType())

Note that unlike fixed-width strings, StringDType is not parameterized by the maximum length of an array element, arbitrarily long or short strings can live in the same array without needing to reserve storage for padding bytes in the short strings.

🌐
W3Schools
w3schools.com › python › pandas › ref_df_dtypes.asp
Pandas DataFrame dtypes Property
The dtypes property returns data type of each column in the DataFrame.
🌐
W3Schools
w3schools.com › python › pandas › ref_df_astype.asp
Pandas DataFrame astype() Method
You can cast the entire DataFrame to one specific data type, or you can use a Python Dictionary to specify a data type for each column, like this: { 'Duration': 'int64', 'Pulse' : 'float', 'Calories': 'int64' } ... The copy and errors parameters are keyword arguments. a Pandas DataFrame with the changes set according to the specified dtype(s).
Find elsewhere
🌐
W3Schools
w3schools.com › python › python_strings_methods.asp
Python - String Methods
Python has a set of built-in methods that you can use on strings. Note: All string methods return new values. They do not change the original string. ... If you want to use W3Schools services as an educational institution, team or enterprise, send us an e-mail: sales@w3schools.com
🌐
GeeksforGeeks
geeksforgeeks.org › data-type-object-dtype-numpy-python
Data type Object (dtype) in NumPy Python - GeeksforGeeks
August 11, 2021 - In the case of structured arrays, the dtype object will also be structured. ... # Python program for demonstrating # the use of fields import numpy as np # A structured data type containing a 16-character string (in field ‘name’) # and a sub-array of two 64-bit floating-point number (in field ‘grades’): dt = np.dtype([('name', np.unicode_, 16), ('grades', np.float64, (2,))]) # Data type of object with field grades print(dt['grades']) # Data type of object with field name print(dt['name'])
🌐
Park
park.is › notebooks › comparing-pandas-string-dtypes
An In-depth Comparison of Pandas String dtypes
May 27, 2023 - Checking the data types displays string for both columns. ... Print columns as Python lists to check whether each value has single qutoes around it. ... <class 'pandas.core.frame.DataFrame'> RangeIndex: 4 entries, 0 to 3 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 A 1 non-null string 1 B 3 non-null string dtypes: string(2) memory usage: 181.0 bytes
Top answer
1 of 4
208

The dtype object comes from NumPy, it describes the type of element in a ndarray. Every element in an ndarray must have the same size in bytes. For int64 and float64, they are 8 bytes. But for strings, the length of the string is not fixed. So instead of saving the bytes of strings in the ndarray directly, Pandas uses an object ndarray, which saves pointers to objects; because of this the dtype of this kind ndarray is object.

Here is an example:

  • the int64 array contains 4 int64 value.
  • the object array contains 4 pointers to 3 string objects.

2 of 4
78

@HYRY's answer is great. I just want to provide a little more context..

Arrays store data as contiguous, fixed-size memory blocks. The combination of these properties together is what makes arrays lightning fast for data access. For example, consider how your computer might store an array of 32-bit integers, [3,0,1].

If you ask your computer to fetch the 3rd element in the array, it'll start at the beginning and then jump across 64 bits to get to the 3rd element. Knowing exactly how many bits to jump across is what makes arrays fast.

Now consider the sequence of strings ['hello', 'i', 'am', 'a', 'banana']. Strings are objects that vary in size, so if you tried to store them in contiguous memory blocks, it'd end up looking like this.

Now your computer doesn't have a fast way to access a randomly requested element. The key to overcoming this is to use pointers. Basically, store each string in some random memory location, and fill the array with the memory address of each string. (Memory addresses are just integers.) So now, things look like this

Now, if you ask your computer to fetch the 3rd element, just as before, it can jump across 64 bits (assuming the memory addresses are 32-bit integers) and then make one extra step to go fetch the string.

The challenge for NumPy is that there's no guarantee the pointers are actually pointing to strings. That's why it reports the dtype as 'object'.

Shamelessly gonna plug my own course on NumPy where I originally discussed this.

🌐
GeeksforGeeks
geeksforgeeks.org › numpy › python-dtype-object-length-of-numpy-array-of-strings
Python | dtype object length of Numpy array of strings - GeeksforGeeks
March 14, 2019 - In numpy, if the underlying data type of the given object is string then the dtype of object is the length of the longest string in the array. This is so because we cannot create variable length string in numpy since numpy needs to know how ...
🌐
Python Course
python-course.eu › numerical-programming › numpy-data-objects-dtype.php
3. Numpy Data Objects, dtype | Numerical Programming
Some may have noticed that the strings in our previous array have been prefixed with a lower case "b". This means that we have created binary strings with the definition "('country', 'S20')". To get unicode strings we exchange this with the definition "('country', 'U20')". We will redefine our population table now: dt = np.dtype([('country', 'U20'), ('density', 'i4'), ('area', 'i4'), ('population', 'i4')]) population_table_2025 = np.array([ ('Netherlands', 544, 33720, 18_346_819), ('Belgium', 383, 30510, 11_700_000), ('United Kingdom', 287, 243610, 69_800_000), ('Germany', 241, 348560, 84_075_
🌐
Kaggle
kaggle.com › general › 188478
What is the difference between Pandas Object & String dtype
Checking your browser before accessing www.kaggle.com · Click here if you are not automatically redirected after 5 seconds