As the error message says, an empty string ('') cannot be converted to a float.
You have to replace the empty strings with a value that can be parsed as a float first, for instance: df_tmp['rain_1h'] = df_tmp['rain_1h'].replace('', '0').astype(float)
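A minimal sketch of that fix, assuming a small made-up `rain_1h` column (the sample values are hypothetical). It also shows `pd.to_numeric` with `errors="coerce"` as an alternative, in case treating a blank as zero would misrepresent missing data:

```python
import pandas as pd

# Hypothetical sample data: rainfall readings stored as strings, one of them blank.
df_tmp = pd.DataFrame({"rain_1h": ["0.5", "", "1.25"]})

# Option 1: treat blanks as zero, then convert.
as_zero = df_tmp["rain_1h"].replace("", "0").astype(float)

# Option 2: turn anything unparseable into NaN instead of raising.
as_nan = pd.to_numeric(df_tmp["rain_1h"], errors="coerce")

print(as_zero.tolist())
print(as_nan.isna().tolist())
```

Which option fits depends on whether a blank really means "no rain" or "no measurement".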
I want to do some math on a dataframe but (I think) can't get one column/series into the necessary format. The column contains strings; some are '.123' while others are '0'. When I attempt the math on the column of strings by converting everything to an integer like so:
dfteam1['cloff'] = dfteam1.cloff.astype(int)
I get the following error:
ValueError: invalid literal for int() with base 10: '.123'
I think it's because .123 isn't an integer but a float, so I changed the code like so:
dfteam1['cloff'] = dfteam1.cloff.astype(float)
Now I get the following error:
ValueError: could not convert string to float:
I think it's because 0 isn't a float but an integer? Do I need to change all the 0 values to 0.00, or am I completely off base? All feedback is welcome.
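One way to test that hypothesis is to try the conversion on a small copy of the data. The sketch below assumes a made-up `cloff` Series containing a blank entry, which is what an error message ending right after "float:" usually points to. It shows that '0' parses as a float without trouble and then lists the raw values that do not parse:

```python
import pandas as pd

# Hypothetical stand-in for dfteam1.cloff: strings, including one blank entry.
cloff = pd.Series([".123", "0", ""])

# Both '0' and '.123' parse as floats, so neither of them is the problem.
zero_as_float = float("0")
fraction_as_float = float(".123")

# Coerce instead of raising, then pick out the raw values that failed to parse.
parsed = pd.to_numeric(cloff, errors="coerce")
bad_values = cloff[parsed.isna()]
print(bad_values.tolist())
```

Anything listed in `bad_values` needs to be replaced or dropped before calling `astype(float)`.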
I faced the same error while trying to create a heatmap. The following code solved my problem:
hp_train.corr(numeric_only=True)
You can check column dtypes using hp_train.dtypes. Subset the dataframe for only the desired columns before calling corr.
For example, if you only want the float64 columns:
dtype_df = hp_train.dtypes
float_cols = dtype_df.iloc[(dtype_df=='float64').values].index
hp_train[float_cols].corr()
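The same subsetting can also be written with `select_dtypes`. This is a sketch on a made-up `hp_train` frame (the column names and values are hypothetical), keeping every numeric column rather than only float64:

```python
import pandas as pd

# Hypothetical stand-in for hp_train: two numeric columns and one text column.
hp_train = pd.DataFrame({
    "area": [50.0, 80.0, 120.0],
    "price": [100.0, 150.0, 230.0],
    "sex": ["Male", "Female", "Male"],
})

# Keep only numeric columns, so corr() never sees the text column.
numeric_cols = hp_train.select_dtypes(include="number")
corr = numeric_cols.corr()
print(corr.columns.tolist())
```

The resulting `corr` frame can then be passed to whatever heatmap function is in use.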
It would be useful to see the output of the following:
print (heterozygosity_df.columns)
This looks like a dtype issue within pandas. I suspect heterozygosity_df['chrI'] is a column in the dataframe that contains a mix of strings and floats. pandas has stored it as strings, but you want to perform numerical operations on it. If so, the solution is simply:
print(heterozygosity_df.dtypes) # 'chrI' will probably show as object (pandas' dtype for strings) or category
heterozygosity_df['chrI'] = heterozygosity_df['chrI'].astype(float)
print(heterozygosity_df.dtypes)
If you need to change several columns at once, the syntax is:
heterozygosity_df = heterozygosity_df.astype({'chrI':'float', 'egColumn':'category'})
I suspect there will be other errors, e.g. the header may have been read in as the first row of data. For pandas to assign a string dtype to a column automatically, there must be at least one string value in that column.
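To illustrate the stray-header case, here is a sketch on made-up data (the values and the `egColumn` contents are hypothetical). It drops the rows whose 'chrI' entry does not parse as a number and then converts both columns in one `astype` call:

```python
import pandas as pd

# Hypothetical data: a repeated header string keeps 'chrI' from being numeric.
heterozygosity_df = pd.DataFrame({
    "chrI": ["chrI", "0.12", "0.30"],
    "egColumn": ["x", "a", "b"],
})

# Flag the rows whose 'chrI' value parses as a number, and keep only those.
is_number = pd.to_numeric(heterozygosity_df["chrI"], errors="coerce").notna()
cleaned = heterozygosity_df[is_number]

# Convert several columns in one call.
cleaned = cleaned.astype({"chrI": "float", "egColumn": "category"})
print(cleaned.dtypes)
```

Inspecting the rows where `is_number` is False before dropping them is a good way to confirm what the stray values actually are.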
From the comments, I see what's happening. The easy solution is:
replacement = {
"chrI": 1,
"chrII": 2,
"chrIII": 3,
"chrIV": 4,
....
}
heterozygosity_df['chr'] = heterozygosity_df['chr'].str.replace(replacement, regex=True)
From the comments: good, it works. This is what I would personally have done:
replacement = {
"chrI": 1,
"chrII": 2,
"chrIII": 3,
"chrIV": 4, # continue for all chromosomes
}
heterozygosity_df = pd.read_csv("file.tsv", sep="\t", header=None).set_axis(['chr', 'pos', 'het'], axis=1, copy=False)
heterozygosity_df['chr'] = heterozygosity_df['chr'].str.replace(replacement, regex=True).astype('int')
Normally I would expect .str to be needed, because 'chr' is an object column when it is imported. However, if it works, that's the only thing that counts.
Ah! I finally got it to work! I used your command but I had to change it up a little bit to work with my data. The final command I used is this:
replacement = {
"chrI": 1,
"chrII": 2,
"chrIII": 3,
"chrIV": 4,
"chrV": 5,
"chrVI": 6,
"chrVII": 7,
"chrVIII": 8,
"chrIX": 9,
"chrX": 10,
"chrXI": 11,
"chrXII": 12,
"chrXIII": 13,
"chrXIV": 14,
"chrXV": 15,
"chrXVI": 16,
"chrmt": 17
}
heterozygosity_df['chr'] = heterozygosity_df['chr'].replace(replacement, regex=False)
And with that I was able to generate plots showing all of the chromosomes! (I have to figure out how to change the axis labels from 1,2,3,4...10 back to the chromosome but that's a future problem). Thank you so much for all your help!!
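For the axis-label problem mentioned above, one option is to invert the dictionary so the numbers can be mapped back to chromosome names. The sketch below uses a shortened mapping and a made-up Series. It also uses `Series.map` for the forward lookup, which is an assumption swapped in here: like `replace(..., regex=False)`, it matches whole values only, so 'chrI' cannot match inside 'chrII'.

```python
import pandas as pd

# Shortened mapping for illustration; the full one is shown above.
replacement = {"chrI": 1, "chrII": 2, "chrIII": 3}

# Hypothetical stand-in for heterozygosity_df['chr'].
chrom = pd.Series(["chrI", "chrII", "chrIII", "chrII"])

# Exact, whole-value lookup of each chromosome name.
codes = chrom.map(replacement)

# Invert the mapping to turn the numbers back into names, e.g. for tick labels.
labels = {number: name for name, number in replacement.items()}
tick_labels = [labels[code] for code in sorted(labels)]
print(codes.tolist())
print(tick_labels)
```

The `tick_labels` list is ordered by chromosome number, so it lines up with ticks placed at 1, 2, 3 and so on.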