Evidently some of your lines don't contain valid float data; specifically, some lines have a text id that can't be converted to float.
When you try it at the interactive prompt you are only testing the first line, so the best approach is to print the line where the error occurs, which will show you the bad line, e.g.
#!/usr/bin/python
import os, sys
from scipy import stats
import numpy as np
f = open('data2.txt', 'r').readlines()
N = len(f) - 1
for i in range(0, N):
    w = f[i].split()
    l1 = w[1:8]
    l2 = w[8:15]
    try:
        list1 = [float(x) for x in l1]
        list2 = [float(x) for x in l2]
    except ValueError as e:
        # report the offending line and skip it instead of reusing stale data
        print("error", e, "on line", i)
        continue
    result = stats.ttest_ind(list1, list2)
    print(result[1])
Answer from Anurag Uniyal on Stack Overflow
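The same idea can be applied to the whole file in one pass before running any statistics. The following is a minimal sketch, not the answer's own code: the function name find_bad_fields and its first/last parameters are hypothetical, and the column range 1 to 15 is only assumed from the slices used in the answer above.

```python
def find_bad_fields(lines, first=1, last=15):
    """Return (line_number, field_index, field) for every field float() rejects.

    Only the whitespace-separated fields in positions first..last-1 are checked,
    so a leading text id in position 0 is ignored.
    """
    problems = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()[first:last]
        for idx, field in enumerate(fields, start=first):
            try:
                float(field)
            except ValueError:
                problems.append((line_no, idx, field))
    return problems


sample = ["id1 1.0 2.0", "id2 3.0 abc", "id3 4.5 6"]
for line_no, idx, field in find_bad_fields(sample, first=1, last=3):
    # repr() makes hidden characters in the bad field visible
    print("line", line_no, "field", idx, "is not a float:", repr(field))
```

Collecting every problem first, rather than stopping at the first exception, shows whether the bad data is one stray line or a systematic formatting issue.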
My error was very simple: the text file containing the data had a whitespace character (so not visible) on the last line.
In the grep output, the value appeared as 45 followed by an invisible whitespace character instead of just 45.
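Invisible characters like this are easiest to spot with repr(). Python's float() already ignores leading and trailing whitespace, but it rejects an empty or whitespace-only string and any other invisible character. The sketch below assumes one value per line; the name parse_floats is hypothetical.

```python
def parse_floats(lines):
    """Convert each non-blank line to float, reporting bad lines with repr()."""
    values = []
    for line_no, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            # blank or whitespace-only line, e.g. a stray space at the end of the file
            continue
        try:
            values.append(float(text))
        except ValueError:
            # repr() shows characters that are invisible when the line is printed
            raise ValueError(f"line {line_no}: cannot parse {raw!r}") from None
    return values


print(parse_floats(["45\n", "1.5 \n", " \n"]))
```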
I have a dollar value from a statement, -52.23, and I'm simply trying to convert it to a float so I can use it in some calculations, but I keep getting this error, and after some research I still don't understand why.
I have the following code going through a statement and pulling out each value. The values are in a csv file, and the format looks like this:
07/06/2022, 07/07/2022, AMZN, Shopping, Sale, -52.23
Loop to pull the values:
with open(file, mode='r') as f:
    csv_reader = csv.reader(f)
    for row in csv_reader:
        date = row[0]
        name = row[2]
        amt = float(row[5])
        category = 'misc'
        transaction = (date, name, category, amt)
        print(transaction)
Current printed result (when doing amt = row[5]):
('07/06/22', 'AMZN', 'misc', '-52.23')
Update: The header of the csv file was the issue; once I skipped that row it worked.
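Skipping the header row, as the update describes, can be sketched as follows. This is an illustration under assumptions: the header names in the sample are made up, the function name read_transactions is hypothetical, and skipinitialspace=True is added only because the sample row has a space after each comma.

```python
import csv
import io


def read_transactions(f):
    """Read (date, name, category, amount) tuples, skipping the header row."""
    reader = csv.reader(f, skipinitialspace=True)
    next(reader, None)  # discard the header; None avoids StopIteration on an empty file
    transactions = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        transactions.append((row[0], row[2], 'misc', float(row[5])))
    return transactions


# Hypothetical file contents: an assumed header plus the row from the question
sample = io.StringIO(
    "Transaction Date, Post Date, Description, Category, Type, Amount\n"
    "07/06/2022, 07/07/2022, AMZN, Shopping, Sale, -52.23\n"
)
print(read_transactions(sample))
```

csv.DictReader is another option when the header names are known, since it consumes the header itself and exposes each row by column name.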
Though not the best solution, I had some success by converting the array into a pandas DataFrame and cleaning it there.
Code snippet:
# convert X into a DataFrame
X_pd = pd.DataFrame(data=X)
# replace every cell containing the Unicode replacement character (U+FFFD) with 0
X_replace = X_pd.replace('�', 0, regex=True)
# convert it back to a numpy array
X_np = X_replace.values
# set the dtype to float
X_fa = X_np.astype(float)
Input:
array([['85', '0', '0', '1980', '0', '0'],
       ['233', '54', '27', '-1', '0', '0'],
       ['���', '�', '�����', '�', '��', '���']], dtype='<U5')
Output:
array([[ 8.50e+01,  0.00e+00,  0.00e+00,  1.98e+03,  0.00e+00,  0.00e+00],
       [ 2.33e+02,  5.40e+01,  2.70e+01, -1.00e+00,  0.00e+00,  0.00e+00],
       [ 0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00]])
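A different technique from the one in this answer is pd.to_numeric with errors='coerce', which turns any cell that cannot be parsed into NaN rather than matching one specific bad character. The sketch below is only an illustration: the helper name to_float_array and the fill value are hypothetical, and the sample array mirrors the input shown above with the replacement characters written as \ufffd escapes.

```python
import numpy as np
import pandas as pd


def to_float_array(X, fill=0.0):
    """Convert a 2-D array of strings to float, replacing unparseable cells with fill."""
    df = pd.DataFrame(X)
    # errors='coerce' maps every value that is not a valid number to NaN
    numeric = df.apply(pd.to_numeric, errors='coerce')
    return numeric.fillna(fill).to_numpy(dtype=float)


X = np.array([
    ['85', '0', '0', '1980', '0', '0'],
    ['233', '54', '27', '-1', '0', '0'],
    ['\ufffd\ufffd\ufffd', '\ufffd', '\ufffd\ufffd\ufffd\ufffd\ufffd',
     '\ufffd', '\ufffd\ufffd', '\ufffd\ufffd\ufffd'],
])
print(to_float_array(X))
```

Filling with 0 hides the fact that data was missing; keeping the NaN values, or dropping the affected rows, may be more appropriate depending on the analysis.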
Let's try using a pandas DataFrame and converting the strings into numeric classes:
from sklearn import preprocessing
def convert(data):
    number = preprocessing.LabelEncoder()
    data['column_name'] = number.fit_transform(data['column_name'])
    data = data.fillna(-999)  # fill holes with default value
    return data
Call the above convert() function like this: test = convert(test)
Try converting the timestamp column to int:
df['timestamp'] = df['timestamp'].astype('datetime64[ns]').astype('int64')
Adding parse_dates=[<columns>] to pd.read_csv makes pandas automatically convert strings that look like dates into actual datetime objects:
df = pd.read_csv('C:/Users/Desktop/labeling/fCCC.csv', parse_dates=['timestamp'])
df['timestamp'] = df['timestamp'].astype('int64')
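Casting a datetime column straight to an integer yields a count in the column's internal time unit, which may not be the unit you expect. A unit-explicit alternative is to subtract the epoch and divide by a Timedelta. The sketch below assumes timezone-naive timestamps; the helper name to_epoch_seconds and the sample values are hypothetical.

```python
import pandas as pd


def to_epoch_seconds(series):
    """Parse date-like strings and return whole seconds since 1970-01-01."""
    parsed = pd.to_datetime(series)
    # Dividing by an explicit Timedelta fixes the unit regardless of internal resolution
    return (parsed - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)


df = pd.DataFrame({"timestamp": ["1970-01-01 00:00:10", "1970-01-02 00:00:00"]})
df["timestamp"] = to_epoch_seconds(df["timestamp"])
print(df["timestamp"].tolist())  # [10, 86400]
```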