Obviously some of your lines don't contain valid float data; specifically, some lines have a text ID that can't be converted to float.
When you try it in the interactive prompt you are only testing the first line, so the best approach is to print the line where the error occurs. That tells you which line is wrong, e.g.
#!/usr/bin/python
from scipy import stats

# Read all lines up front so a bad line can be reported by its index.
with open('data2.txt', 'r') as fh:
    f = fh.readlines()
N = len(f) - 1
for i in range(0, N):
    w = f[i].split()
    l1 = w[1:8]
    l2 = w[8:15]
    try:
        list1 = [float(x) for x in l1]
        list2 = [float(x) for x in l2]
    except ValueError as e:
        # Report the offending line and skip it instead of reusing stale lists.
        print("error", e, "on line", i)
        continue
    result = stats.ttest_ind(list1, list2)
    print(result[1])
Answer from Anurag Uniyal on Stack Overflow
My error was very simple: the text file containing the data had a whitespace character (so not visible) on the last line.
In the grep output, I had "45 " (with a trailing space) instead of just "45".
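Invisible characters of this kind are easier to spot with repr(), which prints spaces, tabs and newlines explicitly. The following is only a minimal sketch, assuming one value per line; the helper name find_bad_values and the sample lines are made up for illustration.

```python
def find_bad_values(lines):
    """Return (line_number, repr_of_line) for every line float() rejects."""
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            float(line)
        except ValueError:
            # repr() makes spaces, tabs and newlines visible.
            bad.append((number, repr(line)))
    return bad


# Hypothetical sample: the last line holds only whitespace.
sample = ["45\n", "12.5\n", " \n"]
for number, shown in find_bad_values(sample):
    print("line", number, "is not a float:", shown)
```

Printing the repr of a suspicious line is usually enough to see whether the problem is whitespace, quotes, or stray text.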
Could not convert string to float
Python pandas - can't convert string to float (I think b/c of multiple data types in column...)
How Do I Convert String To Float?
Value error in Python: Could not convert string to float: Male
I have a list of strings and I would like to get the average of the list, so I need to convert the strings in the list to float, but I keep getting this:
ValueError: could not convert string to float: '"104.8"\n'
How do I solve it? Thanks
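The error message shows that each string still carries literal double quotes and a trailing newline. One possible fix, sketched here under the assumption that every item looks like the one in the error message, is to strip both before calling float(). The helper name to_float and the sample values are made up.

```python
def to_float(text):
    # Remove surrounding whitespace (including the newline), then the quotes.
    cleaned = text.strip().strip('"').strip()
    return float(cleaned)


# Made-up sample in the same shape as the value from the error message.
raw = ['"104.8"\n', '"99.2"\n', '"101.0"\n']
numbers = [to_float(item) for item in raw]
average = sum(numbers) / len(numbers)
print(average)
```

If the strings come from a quoted CSV file, reading the file with the csv module may remove the quotes at the source instead.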
I want to do some math on a dataframe but (I think) can't get one column/series into the necessary format. The column contains strings; some are '.123' while others are '0'. When I attempt the math on the column of strings by converting everything to an integer like so:
dfteam1['cloff'] = dfteam1.cloff.astype(int)
I get the following error
ValueError: invalid literal for int() with base 10: '.123'
I think it's b/c .123 isn't an integer but a float, so I change the code like so:
dfteam1['cloff'] = dfteam1.cloff.astype(float)
now I get the following error
ValueError: could not convert string to float:
I think it's b/c 0 isn't a float but an integer? Do I need to change all the 0 values to 0.00 or am I completely off base? All feedback is welcome.
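The '0' entries should not be the cause, since float('0') is valid. The second error shows nothing after the colon, which suggests (an assumption, the post does not confirm it) that some cell holds an empty or whitespace-only string. A minimal sketch for locating such cells with pd.to_numeric, using a made-up stand-in for the cloff column:

```python
import pandas as pd

# Made-up stand-in for the 'cloff' column, including one empty string.
cloff = pd.Series([".123", "0", "", ".456"])

# errors="coerce" turns anything unparseable into NaN instead of raising.
converted = pd.to_numeric(cloff, errors="coerce")

# Cells that were present as text but did not parse as numbers.
bad_rows = cloff[converted.isna() & cloff.notna()]
print(bad_rows.map(repr))
print(converted)
```

Once the offending cells are known, they can be fixed in the source data or left as NaN, depending on what the math needs.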
The following would be useful,
print (heterozygosity_df.columns)
This looks like a typing issue within pandas. I suspect heterozygosity_df['chrI'] is a column in the dataframe and that it holds a mix of strings and floats. pandas has typed the column as "string", but you want to perform numerical operations on it. Thus the solution is simply
print(heterozygosity_df.dtypes) # this should state 'chrI' is a "category" or "string"
heterozygosity_df['chrI'] = heterozygosity_df['chrI'].astype(float)
print(heterozygosity_df.dtypes)
If you have multiple changes the syntax is
heterozygosity_df = heterozygosity_df.astype({'chrI':'float', 'egColumn':'category'})
I suspect there will be other errors, e.g. the header may have been read as the first row of the data. Again, for pandas to automatically type a column as "string", there must be at least one string value in that column.
From the comments: I see what's happening. The easy solution is ...
replacement = {
    "chrI": 1,
    "chrII": 2,
    "chrIII": 3,
    "chrIV": 4,
    ....
}
heterozygosity_df['chr'] = heterozygosity_df['chr'].str.replace(replacement, regex=True)
From the comments: good, it works. This is what I would personally have done:
replacement = {
    "chrI": 1,
    "chrII": 2,
    "chrIII": 3,
    "chrIV": 4,  # continue for all chromosomes
}
heterozygosity_df = pd.read_csv("file.tsv", sep="\t", header=None).set_axis(['chr', 'pos', 'het'], axis=1, copy=False)
heterozygosity_df['chr'] = heterozygosity_df['chr'].str.replace(replacement, regex=True).astype('int')
Normally .str should be in place, because when 'chr' is imported it's an object column. However, if it works, that's the only thing that counts.
Ah! I finally got it to work! I used your command but I had to change it up a little bit to work with my data. The final command I used is this:
replacement = {
    "chrI": 1,
    "chrII": 2,
    "chrIII": 3,
    "chrIV": 4,
    "chrV": 5,
    "chrVI": 6,
    "chrVII": 7,
    "chrVIII": 8,
    "chrIX": 9,
    "chrX": 10,
    "chrXI": 11,
    "chrXII": 12,
    "chrXIII": 13,
    "chrXIV": 14,
    "chrXV": 15,
    "chrXVI": 16,
    "chrmt": 17
}
heterozygosity_df['chr'] = heterozygosity_df['chr'].replace(replacement, regex=False)
And with that I was able to generate plots showing all of the chromosomes! (I have to figure out how to change the axis labels from 1,2,3,4...10 back to the chromosome but that's a future problem). Thank you so much for all your help!!
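For the axis-label question, one option is to invert the same dictionary so that each numeric code maps back to its chromosome name. This is only a sketch with a shortened mapping and made-up data; the names tick_positions and tick_labels are hypothetical, and how they are passed to the plotting library depends on the plot being drawn.

```python
import pandas as pd

replacement = {"chrI": 1, "chrII": 2, "chrIII": 3}  # shortened for the example

# Made-up data in the same shape as the 'chr' column.
chrom = pd.Series(["chrI", "chrII", "chrII", "chrIII"])

# regex=False compares whole cell values against the dictionary keys.
codes = chrom.replace(replacement, regex=False)

# Invert the dictionary to get from the numeric code back to the name.
names = {code: name for name, code in replacement.items()}
tick_positions = sorted(names)
tick_labels = [names[pos] for pos in tick_positions]
print(codes.tolist())
print(tick_labels)
```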
I faced the same error while trying to create a heatmap. The following code solved my problem.
hp_train.corr(numeric_only=True)
You can check column dtypes using hp_train.dtypes. Subset the dataframe for only the desired columns before calling corr.
For example if you only want float64 cols
dtype_df = hp_train.dtypes
float_cols = dtype_df.iloc[(dtype_df=='float64').values].index
hp_train[float_cols].corr()
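As an alternative to filtering on dtypes by hand, DataFrame.select_dtypes can pick out the numeric columns in one call. A minimal sketch, using a made-up frame in place of hp_train (the column names are invented):

```python
import pandas as pd

# Made-up frame standing in for hp_train: two numeric columns, one text column.
hp_train = pd.DataFrame({
    "area": [50.0, 80.0, 120.0],
    "price": [100.0, 150.0, 240.0],
    "city": ["A", "B", "C"],
})

# Keep only numeric columns, then compute the correlation matrix.
numeric_part = hp_train.select_dtypes(include="number")
corr = numeric_part.corr()
print(corr)
```

Unlike the float64 filter, include="number" also keeps integer columns, which may or may not be what you want.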