Brave Search

Pandas: change data type of Series to String

stackoverflow.com › questions › 22231592 › pandas-change-data-type-of-series-to-string

A new answer to reflect the most current practices: as of now (v1.2.4), neither astype('str') nor astype(str) work.

As per the documentation, a Series can be converted to the string datatype in the following ways:

df['id'] = df['id'].astype("string")

df['id'] = pandas.Series(df['id'], dtype="string")

df['id'] = pandas.Series(df['id'], dtype=pandas.StringDtype)

End to end example:

import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['John', 'Alice', 'Bob', 'John', 'Alice'],
    'Age': [25, 30, 35, 25, 30],
    'City': ['New York', 'London', 'Paris', 'New York', 'London'],
    'Salary': [50000, 60000, 70000, 50000, 60000],
    'Category': ['A', 'B', 'C', 'A', 'B']
}

df = pd.DataFrame(data)

# Print the DataFrame
print("Original DataFrame:")
print(df)
print("\nData types:")
print(df.dtypes)
cat_cols_ = None
# Apply the code to change data types
if not cat_cols_:
    # Get the columns with object data type
    object_columns = df.select_dtypes(include=['object']).columns.tolist()
    
    if len(object_columns) > 0:
        print(f"\nObject columns found, converting to string: {object_columns}")
        
        # Convert object columns to string type
        df[object_columns] = df[object_columns].astype('string')
    
    # Get the categorical columns (including string and category data types)
    cat_cols_ = df.select_dtypes(include=['category', 'string']).columns.tolist()

# Print the updated DataFrame and data types
print("\nUpdated DataFrame:")
print(df)
print("\nUpdated data types:")
print(df.dtypes)
print(f"\nCategorical columns (cat_cols_): {cat_cols_}")

Original DataFrame:
    Name  Age      City  Salary Category
0   John   25  New York   50000        A
1  Alice   30    London   60000        B
2    Bob   35     Paris   70000        C
3   John   25  New York   50000        A
4  Alice   30    London   60000        B

Data types:
Name        object
Age          int64
City        object
Salary       int64
Category    object
dtype: object

Object columns found, converting to string: ['Name', 'City', 'Category']

Updated DataFrame:
    Name  Age      City  Salary Category
0   John   25  New York   50000        A
1  Alice   30    London   60000        B
2    Bob   35     Paris   70000        C
3   John   25  New York   50000        A
4  Alice   30    London   60000        B

Updated data types:
Name        string[python]
Age                  int64
City        string[python]
Salary               int64
Category    string[python]
dtype: object

Categorical columns (cat_cols_): ['Name', 'City', 'Category']

Answer from rocksNwaves on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 22231592 › pandas-change-data-type-of-series-to-string

python - Pandas: change data type of Series to String - Stack Overflow

Top answer

1 of 11

251

A new answer to reflect the most current practices: as of now (v1.2.4), neither astype('str') nor astype(str) work.

As per the documentation, a Series can be converted to the string datatype in the following ways:

df['id'] = df['id'].astype("string")

df['id'] = pandas.Series(df['id'], dtype="string")

df['id'] = pandas.Series(df['id'], dtype=pandas.StringDtype)

End to end example:

import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['John', 'Alice', 'Bob', 'John', 'Alice'],
    'Age': [25, 30, 35, 25, 30],
    'City': ['New York', 'London', 'Paris', 'New York', 'London'],
    'Salary': [50000, 60000, 70000, 50000, 60000],
    'Category': ['A', 'B', 'C', 'A', 'B']
}

df = pd.DataFrame(data)

# Print the DataFrame
print("Original DataFrame:")
print(df)
print("\nData types:")
print(df.dtypes)
cat_cols_ = None
# Apply the code to change data types
if not cat_cols_:
    # Get the columns with object data type
    object_columns = df.select_dtypes(include=['object']).columns.tolist()
    
    if len(object_columns) > 0:
        print(f"\nObject columns found, converting to string: {object_columns}")
        
        # Convert object columns to string type
        df[object_columns] = df[object_columns].astype('string')
    
    # Get the categorical columns (including string and category data types)
    cat_cols_ = df.select_dtypes(include=['category', 'string']).columns.tolist()

# Print the updated DataFrame and data types
print("\nUpdated DataFrame:")
print(df)
print("\nUpdated data types:")
print(df.dtypes)
print(f"\nCategorical columns (cat_cols_): {cat_cols_}")

Original DataFrame:
    Name  Age      City  Salary Category
0   John   25  New York   50000        A
1  Alice   30    London   60000        B
2    Bob   35     Paris   70000        C
3   John   25  New York   50000        A
4  Alice   30    London   60000        B

Data types:
Name        object
Age          int64
City        object
Salary       int64
Category    object
dtype: object

Object columns found, converting to string: ['Name', 'City', 'Category']

Updated DataFrame:
    Name  Age      City  Salary Category
0   John   25  New York   50000        A
1  Alice   30    London   60000        B
2    Bob   35     Paris   70000        C
3   John   25  New York   50000        A
4  Alice   30    London   60000        B

Updated data types:
Name        string[python]
Age                  int64
City        string[python]
Salary               int64
Category    string[python]
dtype: object

Categorical columns (cat_cols_): ['Name', 'City', 'Category']

2 of 11

127

You can convert all elements of id to str using apply

df.id.apply(str)

0        123
1        512
2      zhub1
3    12354.3
4        129
5        753
6        295
7        610

Edit by OP:

I think the issue was related to the Python version (2.7.), this worked:

df['id'].astype(basestring)
0        123
1        512
2      zhub1
3    12354.3
4        129
5        753
6        295
7        610
Name: id, dtype: object

reddit.com › r/learnpython › str or astype(str)

r/learnpython on Reddit: str or astype(str)

February 15, 2024 -

Hi, I'm working on filtering a df and was advised to use the following condition:

condition = survey[column1].str.startswith("Pos") & survey[column2].astype(str).str.endswith(str(y))

It runs fine, I was just wondering why I would use .str in one case and .astype(str) in the other. Does it have to do with efficiency?

Videos