Since strings have variable length, pandas stores them as object dtype by default. If you want to store them as a fixed-width string type, you can do something like this.
df['column'] = df['column'].astype('|S80') #where the max length is set at 80 bytes,
or alternatively
df['column'] = df['column'].astype('|S') # which will by default set the length to the max len it encounters
Answer from Siraj S. on Stack Overflow
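To see what the `'|S'` codes mean, here is a minimal sketch that applies them to a plain NumPy array built from the column. The frame and its values are made up for illustration. How pandas itself reports the dtype after `astype('|S80')` can vary by version, so the sketch only inspects the NumPy side.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the original question
df = pd.DataFrame({"column": ["ab", "abcd"]})

# Build a NumPy unicode array from the column values
values = np.array(df["column"].tolist())

# 'S' without a length sizes each item to the longest value it encounters
auto_width = values.astype("S")

# 'S80' reserves 80 bytes per item regardless of content
fixed_width = values.astype("S80")

print(auto_width.dtype, auto_width.itemsize)
print(fixed_width.dtype, fixed_width.itemsize)

# Note: the items are bytes, not Python str
first_item = auto_width[0]
print(type(first_item), first_item)
```

Because the `S` dtype holds bytes rather than text, values have to be decoded before being passed to code that expects `str`.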
Did you try assigning it back to the column?
df['column'] = df['column'].astype('str')
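A small sketch of what assigning back does, using a made-up mixed column. After the conversion every value is a Python `str`, even though older pandas versions still report the column dtype as `object` (the reported dtype depends on the pandas version, so it is only printed here).

```python
import pandas as pd

# Hypothetical mixed-type column for illustration
df = pd.DataFrame({"column": [1, 2.5, "x"]})

# Convert and assign the result back to the column
df["column"] = df["column"].astype(str)

# Collect the Python types of the individual values
value_types = {type(v) for v in df["column"]}

print(df["column"].dtype)   # version dependent
print(value_types)
print(df["column"].tolist())
```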
Referring to this question, the pandas dataframe stores pointers to the strings, and hence the column is of type 'object'. As per the docs, you could try:
df['column_new'] = df['column'].str.split(',')
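For reference, a minimal sketch of what `.str.split` produces on a made-up column: each cell of the new column holds the pieces of the corresponding string.

```python
import pandas as pd

# Hypothetical comma-separated values
df = pd.DataFrame({"column": ["a,b", "c"]})

# The .str accessor works on string data even when the dtype is object
df["column_new"] = df["column"].str.split(",")

first_parts = list(df["column_new"].iloc[0])
second_parts = list(df["column_new"].iloc[1])
print(first_parts, second_parts)
```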
Trying to use the YouTube API to pull through some videos for data analysis and am currently using just two videos in a dataframe to play around with the functionality as I'm new to all of this.
I'm using another API to get the transcripts for each video but I need to input the video_id into that API to get transcripts for each video.
The only problem is that everything is stored as an object, and whenever I try .astype(str) or something like that, it still says the data is an object. That means I can't do anything with the data when a string is a required argument for the other API.
This is what I get when calling .info() on my dataframe:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 10 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   video_id              2 non-null      object
 1   publishedAt           2 non-null      object
 2   channelId             2 non-null      object
 3   title                 2 non-null      object
 4   description           2 non-null      object
 5   channelTitle          2 non-null      object
 6   tags                  2 non-null      object
 7   categoryId            2 non-null      object
 8   liveBroadcastContent  2 non-null      object
 9   defaultAudioLanguage  2 non-null      object
dtypes: object(10)
memory usage: 288.0+ bytes
Any help would be really appreciated or an explanation of how these issues are usually handled
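One thing worth checking in a situation like this is whether the individual values are already Python strings, since an `object` column usually holds plain `str` values. The sketch below assumes that is the case. `fetch_transcript` is a hypothetical stand-in for the transcript API call, and the IDs are made up.

```python
import pandas as pd

def fetch_transcript(video_id: str) -> str:
    """Hypothetical placeholder for the real transcript API call."""
    if not isinstance(video_id, str):
        raise TypeError("video_id must be a str")
    return f"transcript for {video_id}"

# Made-up IDs; the column dtype may be reported as object
df = pd.DataFrame({"video_id": ["abc123", "def456"]})

# Pass each value, not the whole column, to the function
transcripts = [fetch_transcript(vid) for vid in df["video_id"]]
print(transcripts)
```

Passing the whole Series (`df["video_id"]`) instead of one value at a time is a common cause of "string required" errors, independent of the column dtype.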
Use astype('string') instead of astype(str):
df['column'] = df['column'].astype('string')
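A short sketch of the effect on a made-up column, assuming pandas 1.0 or newer (where the dedicated `'string'` dtype was introduced). The column ends up with pandas' `StringDtype`, and single values are still ordinary `str` objects.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"column": ["abc", "de"]})

# Convert to the dedicated pandas string dtype and assign back
df["column"] = df["column"].astype("string")

is_string_dtype = isinstance(df["column"].dtype, pd.StringDtype)
first = df["column"].iloc[0]

print(df["column"].dtype)
print(is_string_dtype, type(first))
```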
You could read the excel specifying the dtype as str:
df = pd.read_excel("Excelfile.xlsx", dtype=str)
then use string replace in particulars column as below:
df['particulars'] = df['particulars'].str.replace('/', '')
Notice that the right-hand side is not wrapped in an extra df[...]. The .str.replace call returns a Series of the replaced strings, which can be assigned straight back to the column. Wrapping it as df[df['particulars'].str.replace('/', '')] would use those strings as column labels and would typically raise a KeyError.
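As a quick self-contained check of what `.str.replace` returns, here is a sketch on made-up data. The `regex=False` argument is an addition not in the answer above; it makes the slash a literal character on any pandas version.

```python
import pandas as pd

# Hypothetical values standing in for the Excel column
df = pd.DataFrame({"particulars": ["a/b", "c/d/e"]})

# .str.replace returns a new Series, one replaced string per row
replaced = df["particulars"].str.replace("/", "", regex=False)
is_series = isinstance(replaced, pd.Series)

# Assign the Series back to the column
df["particulars"] = replaced
print(df["particulars"].tolist())
```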
I tried many methods, but this is the only one that worked for me and converted the object column into text:
df['column'].astype('string')
I realized that object is not a problem; it is simply the dtype that pandas uses for strings or mixed types (https://pbpython.com/pandas_dtypes.html). More precisely:
| Pandas dtype | Python type | NumPy type | Usage |
|---|---|---|---|
| object | str or mixed | string_, unicode_, mixed types | Text or mixed numeric and non-numeric values |
| int64 | int | int_, int8, int16, int32, int64, uint8, uint16, uint32, uint64 | Integer numbers |
| float64 | float | float_, float16, float32, float64 | Floating point numbers |
In fact, if I print the type of a single cell, it gives me str:
type(UFC_db.at[0,'Referee'])
str
and as a string I can change it with another string
UFC_db.at[0,'Referee'] = 'XXXXXX'
UFC_db.at[0,'Referee']
'XXXXXX'
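To check a whole column rather than one cell, a sketch along these lines lists the Python types actually present. The frame below is made up for illustration; a genuinely mixed column shows more than one type name, while a text-only column shows just `str`.

```python
import pandas as pd

# Hypothetical data: one text-only column and one mixed column
df = pd.DataFrame({
    "Referee": ["A. Smith", "B. Jones", "C. Lee"],
    "mixed": ["x", 1, 2.5],
})

def type_names(col: pd.Series) -> list:
    """Return the sorted names of the Python types found in a column."""
    return sorted({type(v).__name__ for v in col})

referee_types = type_names(df["Referee"])
mixed_types = type_names(df["mixed"])
print(referee_types, mixed_types)
```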