You have to encode the strings before calling fit(). As already mentioned, fit() does not accept strings, but this is easy to solve.

There are several classes that can be used:

  • LabelEncoder: maps each distinct string to an integer (0, 1, 2, ...)
  • OneHotEncoder: uses the One-of-K scheme to turn each distinct string into its own binary (0/1) column

Personally, I posted almost the same question on Stack Overflow some time ago. I wanted a scalable solution, but didn't get any answer. I chose OneHotEncoder, which binarizes all the strings. It is quite effective, but if you have many distinct strings the matrix grows very quickly and needs a lot of memory.

Answer from RPresle on Stack Overflow
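A minimal sketch of the two encoders named in the answer above, assuming scikit-learn and NumPy are installed. The colors array is made-up data for illustration only. Note that scikit-learn documents LabelEncoder as meant for target labels; OrdinalEncoder is its counterpart for feature columns.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical categorical column (not from the original question).
colors = np.array(["RED", "YELLOW", "GREEN", "YELLOW"])

# LabelEncoder: one integer per distinct string, assigned in sorted order.
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(colors)

# OneHotEncoder: one binary column per distinct string.
# It expects 2-D input, hence the reshape; the result is sparse by default,
# which helps when there are many distinct strings.
onehot_encoder = OneHotEncoder(handle_unknown="ignore")
onehot = onehot_encoder.fit_transform(colors.reshape(-1, 1))

print(label_encoder.classes_)
print(labels)
print(onehot.toarray())
```

Keeping the one-hot result sparse is one way to limit the memory growth the answer warns about.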
🌐
Stack Exchange
datascience.stackexchange.com › questions › 110669 › could-not-convert-string-to-float-yellow
machine learning - could not convert string to float: 'YELLOW' - Data Science Stack Exchange
ML models only understand numeric vectors. Because the data has categorical features, you need to encode them before applying the model. That is why you encounter ValueError: could not convert string to float: 'YELLOW'.
🌐
Reddit
reddit.com › r/learnpython › can not convert a string to a float when using naive bayes from sklearn.
r/learnpython on Reddit: Can not convert a string to a float when using Naive Bayes from Sklearn.
February 27, 2022 -

So I am trying to use the Naive Bayes model from Sklearn to run some data. It is a machine learning model. Although every time I run it, it says I can't convert a string to a float. Here's my code:

# Import necessary libraries.
from sklearn.naive_bayes import MultinomialNB # Import the Naive Bayes model.
from sklearn.model_selection import train_test_split # Import the train_test_split function.
from sklearn.metrics import accuracy_score # For testing the accuracy of the model
import pandas as pd # Import pandas

print('NAIVE_BAYES.PY IS RUNNING')

df = pd.read_csv('C:\\Users\\gjohn\\Documents\\code\\machineLearning\\trading_bot\\train_test.csv') # Reads in the filtered posts.
classes = df['class'] # Gets the labels from the dataframe.
df.drop('class', axis=1) # Drops the class column from the dataframe.

# Split the data into training and testing data.
train_x, test_x, train_y, test_y = train_test_split(df, classes, test_size=0.2)

# Create the Naive Bayes model.
model = MultinomialNB()
# Train the model.
model.fit(train_x, train_y)
# Test the model.
y_predict = model.predict(test_x)
# Calculate the accuracy of the model.
accuracy = accuracy_score(test_y, y_predict)
print(f'Accuracy: {accuracy}')

And here is the error:

Traceback (most recent call last):
  File "c:\Users\gjohn\Documents\code\machineLearning\trading_bot\naive_bayes.py", line 22, in <module>
    model.fit(train_x, train_y)
  File "C:\Users\gjohn\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\naive_bayes.py", line 663, in fit
    X, y = self._check_X_y(X, y)
  File "C:\Users\gjohn\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\naive_bayes.py", line 523, in _check_X_y
    return self._validate_data(X, y, accept_sparse="csr", reset=reset)
  File "C:\Users\gjohn\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\base.py", line 572, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "C:\Users\gjohn\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\utils\validation.py", line 956, in check_X_y
    X = check_array(
  File "C:\Users\gjohn\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\utils\validation.py", line 738, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File "C:\Users\gjohn\AppData\Local\Programs\Python\Python39\lib\site-packages\numpy\core\_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\Users\gjohn\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\generic.py", line 1993, in __array__
    return np.asarray(self._values, dtype=dtype)
  File "C:\Users\gjohn\AppData\Local\Programs\Python\Python39\lib\site-packages\numpy\core\_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: 'Tranzylvania'

And every time I run it, it is a different word. For example this time it was "Tranzylvania", but the next time I run it, there will be something else. Can anyone please help me? I'm stumped.
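One possible fix, sketched under assumptions: the CSV is taken to hold one free-text column (called "text" here, a hypothetical name) plus the "class" column, and the rows below are invented stand-ins for train_test.csv. The word in the error probably changes between runs because train_test_split shuffles the rows when no random_state is given, so a different string is hit first. Also note that df.drop('class', axis=1) returns a new DataFrame rather than modifying df, so the original code leaves the label column inside the features.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for train_test.csv.
df = pd.DataFrame({
    "text": [
        "buy the stock now", "sell everything today",
        "buy more shares", "sell the stock fast",
        "buy buy buy", "sell sell sell",
        "time to buy", "time to sell",
    ],
    "class": ["buy", "sell", "buy", "sell", "buy", "sell", "buy", "sell"],
})

classes = df["class"]   # labels
texts = df["text"]      # features: only the text column, so 'class' is left out

# random_state makes the split (and any error message) reproducible.
train_x, test_x, train_y, test_y = train_test_split(
    texts, classes, test_size=0.25, random_state=0, stratify=classes
)

# CountVectorizer turns each post into word counts, which MultinomialNB accepts.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_x, train_y)

y_predict = model.predict(test_x)
accuracy = accuracy_score(test_y, y_predict)
print(f"Accuracy: {accuracy}")
```

Putting the vectorizer inside the pipeline means the same word-to-column mapping learned on the training data is reused at prediction time.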

🌐
Codedamn
codedamn.com › news › programming
Fix ValueError: could not convert string to float
November 7, 2022 -
input_string = ''
try:
    converted = float(input_string)
except ValueError:
    converted = 0
print(converted) # 0 (as except block is executed)

This method has a huge advantage in that all three previous approaches could be used in the except block and can handle any case with ease.
🌐
GeeksforGeeks
geeksforgeeks.org › machine learning › valueerror-could-not-convert-string-to-float-decision-tree
ValueError: could not convert string to float decision tree - GeeksforGeeks
July 23, 2025 - When you encounter the ValueError: could not convert string to float error in decision trees, it means the model is trying to process text data, which it can't. Decision trees work with numerical data, so you need to convert categorical data ...
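As a rough illustration of that conversion step, the sketch below one-hot encodes a single categorical column and passes the numeric column through unchanged before fitting a decision tree. The DataFrame and its column names (color, size_cm, label) are invented for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data with one string column and one numeric column.
df = pd.DataFrame({
    "color": ["RED", "YELLOW", "GREEN", "YELLOW", "RED", "GREEN"],
    "size_cm": [10.0, 12.5, 9.0, 13.0, 10.5, 8.5],
    "label": [0, 1, 0, 1, 0, 0],
})
X = df[["color", "size_cm"]]
y = df["label"]

# Encode only the string column; leave the numeric column as it is.
preprocess = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), ["color"])],
    remainder="passthrough",
)

model = Pipeline([
    ("preprocess", preprocess),
    ("tree", DecisionTreeClassifier(random_state=0)),
])
model.fit(X, y)          # no "could not convert string to float" here
predictions = model.predict(X)
print(predictions)
```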
🌐
Statology
statology.org › home › how to fix in pandas: could not convert string to float
How to Fix in Pandas: could not convert string to float
July 16, 2022 -
#attempt to convert 'revenue' from string to float
df['revenue'] = df['revenue'].astype(float)

ValueError: could not convert string to float: '$400.42'
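A small sketch of one way around the '$400.42' failure: strip the currency symbol (and, as an extra assumption, thousands separators) before calling astype(float). The sample values are made up.

```python
import pandas as pd

# Hypothetical revenue column stored as strings.
df = pd.DataFrame({"revenue": ["$400.42", "$1,200.00", "$35.10"]})

# regex=False treats "$" as a literal character rather than a regex anchor.
df["revenue"] = (
    df["revenue"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)

print(df["revenue"].tolist())
```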
🌐
GitHub
github.com › scikit-learn › scikit-learn › discussions › 19624
ValueError: could not convert string to float: 'OLIFE' · scikit-learn/scikit-learn · Discussion #19624
Hi, I'm trying to calibrate a logistic regression classifier and I get the error ValueError: could not convert string to float: 'OLIFE'. I did one-hot encode my categorical values using a pipeline, and it works fine when I test my model, but when I calibrate it, it doesn't work even though I'm passing the pipeline model into CalibratedClassifierCV. Kindly assist, please, as I'm new to machine learning.

import pandas as pd
import numpy as np
from pandas_profiling import ProfileReport
from sklearn.preprocessing import OneHotEncoder,LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import make
Author   scikit-learn
🌐
Medium
louwersj.medium.com › solved-valueerror-could-not-convert-string-to-float-afbc5c3828e7
Solved: ValueError: could not convert string to float: ‘’ | by Johan Louwers | Medium
September 23, 2022 - This commonly is the case if your dataset is not clean and “avg_use” contains values that are not able to be converted to a float. Even though a dataset should be clean, obviously, reality shows that datasets are not always as clean as we ...
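One way to find such unclean values, sketched with pandas: pd.to_numeric with errors="coerce" turns anything unparseable into NaN so the offending rows can be inspected instead of crashing the conversion. The avg_use column name comes from the snippet above; the values are invented.

```python
import pandas as pd

# Hypothetical column containing an empty string and a placeholder value.
df = pd.DataFrame({"avg_use": ["12.5", "", "7", "n/a"]})

# Unparseable entries become NaN instead of raising ValueError.
converted = pd.to_numeric(df["avg_use"], errors="coerce")

# Keep a view of the rows that failed, with their original string values.
bad_rows = df[converted.isna()]
print(bad_rows)

# Store the numeric version; what to do with the NaNs (drop, impute) is a separate choice.
df["avg_use"] = converted
```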
Top answer
1 of 2
1

I assume you are using text data as your input matrix X. The first point is that you have to include your preprocessing step, just as you would without a calibrated classifier. As you already know, you can use a Pipeline like so:

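from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

linear_svc = LinearSVC()  # base estimator; assumed here, not defined in the original answer
calibrated_svc = CalibratedClassifierCV(linear_svc,
                                        method='sigmoid',
                                        cv=3)

model = Pipeline([('tfidf', TfidfVectorizer()), ('clf', calibrated_svc)]).fit(X, y)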

Another option, if you are interested in getting probabilities from your SVM, is to set the parameter probability=True on the SVC class. SVC with a linear kernel behaves much like LinearSVC, although the two are not identical implementations:

from sklearn.svm import SVC
model = Pipeline([('tfidf', TfidfVectorizer()), ('clf', SVC(probability=True, kernel='linear'))]).fit(X, y)

This fits a logistic regression (Platt scaling) on top of the SVM's scores.

Both options are feasible if you are only interested in obtaining probabilities, but if you also care about how well calibrated those probabilities are, the first option is better.

2 of 2
2

For any kind of machine learning task or NLP task (which is what you are doing), you need to convert string/text values to numeric values. The model cannot understand or work with string values. It only understands numeric values.

So, for example, if you are doing a machine learning task with categorical columns, you would use classes like OneHotEncoder or LabelEncoder to convert string values to numeric ones.

In your case, you are working on an NLP task, which uses free text rather than categorical strings. So you need to convert the text into numeric values first and then fit the preferred algorithm. There are many ways to encode text as numbers, such as Bag of Words, TF-IDF, word2vec, etc. You can read about them by searching on Google.

🌐
Edureka Community
edureka.co › home › community › categories › machine learning › valueerror could not convert string to float in...
ValueError could not convert string to float in Machine learning | Edureka Community
April 14, 2020 - Hi Guys, I am trying to filter my dataset using constant variable method, but it shows me the below ... NE 37010-5101' How can I solve this error?
🌐
GitHub
github.com › scikit-learn › scikit-learn › issues › 19625
ValueError: could not convert string to float: 'OLIFE' · Issue #19625 · scikit-learn/scikit-learn
November 20, 2020 - Hi, I'm trying to calibrate a logistic regression classifier and I get the error ValueError: could not convert string to float: 'OLIFE'. I did one-hot encode my categorical values using a pipeline, it wor...
Author   Solly7
🌐
Reddit
reddit.com › r/learnpython › sklearn problem - valueerror: could not convert string to float: normal
r/learnpython on Reddit: Sklearn Problem - ValueError: could not convert string to float: Normal
July 20, 2017 -

I get the following error when I run my script: "ValueError: could not convert string to float: Normal"

Any help would be greatly appreciated

from sklearn.linear_model import LogisticRegression #logistic regression
from sklearn import svm #support vector Machine
from sklearn.ensemble import RandomForestClassifier #Random Forest
from sklearn.neighbors import KNeighborsClassifier #KNN
from sklearn.naive_bayes import GaussianNB #Naive bayes
from sklearn.tree import DecisionTreeClassifier #Decision Tree
from sklearn.model_selection import train_test_split #training and testing data split
from sklearn import metrics #accuracy measure
from sklearn.metrics import confusion_matrix #for confusion matrix

train,test=train_test_split(train_csv, test_size=0.3, random_state=0)
train_X=train[train.columns[1:]]
train_Y=train[train.columns[:1]]
test_X=test[test.columns[1:]]
test_Y=test[test.columns[:1]]
X=train_csv[train_csv.columns[1:]]
Y=train_csv['SalePrice']

# Radial Support Vector Machines(rbf-SVM)

model=svm.SVC(kernel='rbf',C=1,gamma=0.1)
model.fit(train_X, train_Y)
prediction1=model.predict(test_X)
print('Accuracy for rbf SVM is ', metrics.accuracy_score(prediction1,test_Y))
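A value like "Normal" suggests that at least one feature column holds category names. A possible approach, sketched here with invented data, is to expand such columns with pd.get_dummies before splitting. The column names LotArea and SaleCondition are assumptions for illustration; only SalePrice appears in the post.

```python
import pandas as pd

# Hypothetical stand-in for train_csv.
train_csv = pd.DataFrame({
    "SalePrice": [200000, 150000, 180000, 320000],
    "LotArea": [8450, 9600, 11250, 14260],
    "SaleCondition": ["Normal", "Abnorml", "Normal", "Partial"],
})

# Separate the target, then turn every string column into 0/1 indicator columns.
Y = train_csv["SalePrice"]
X = pd.get_dummies(train_csv.drop(columns=["SalePrice"]))

print(X.columns.tolist())
print(X.dtypes)
```

After this step X contains no string columns, so it can be passed to train_test_split and then to fit() as in the post.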

🌐
GitHub
github.com › DanilZherebtsov › verstack › issues › 30
could not convert string to float: 'x' - using FeatureSelector · Issue #30 · DanilZherebtsov/verstack
November 7, 2022 -
    842 """
    843 first_call = not hasattr(self, "n_samples_seen_")
--> 844 X = self._validate_data(
    845     X,
    846     accept_sparse=("csr", "csc"),
    847     dtype=FLOAT_DTYPES,
    848     force_all_finite="allow-nan",
    849     reset=first_call,
    850 )
    851 n_features = X.shape[1]
    853 if sample_weight is not None:

File C:\Anaconda3\envs\python_310\lib\site-packages\sklearn\base.py:577, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    575     raise ValueError("Validation should be done on X, y or both.")
    576 elif not no_val_X and no_val_y:
--> 577     X = check_array(X, input_name="X", **check_p
Author   balgad
🌐
Stack Exchange
datascience.stackexchange.com › questions › 114774 › when-i-run-random-forest-classification-model-then-at-every-rows-of-my-train-dat
python - when I run Random Forest classification model then at every rows of my train data set show this error (ValueError: could not convert string to float:) - Data Science Stack Exchange
''' from sklearn.ensemble import RandomForestClassifier forest = RandomForestClassifier(n_estimators = 500, max_depth = None, min_samples_split=2, min_samples_leaf =1, bootstrap = True, random_state=0) forest = forest.fit(X_train, y_train) print(forest.score(X_test, y_test)) ''' ... The error message is not lying to you :) It cannot convert the string "one favourite christmas gifts year love" to a float.
🌐
Saturn Cloud
saturncloud.io › blog › how-to-handle-the-pandas-valueerror-could-not-convert-string-to-float
How to Handle the pandas ValueError could not convert string to float | Saturn Cloud Blog
October 19, 2023 - The pandas ValueError occurs when you use the float() function to convert a string to a float, but the string contains characters that cannot be interpreted as a float.
🌐
GeeksforGeeks
geeksforgeeks.org › pandas › how-to-handle-pandas-value-error-could-not-convert-string-to-float
How To Handle Pandas Value Error : Could Not Convert String To Float - GeeksforGeeks
July 23, 2025 - When the string in the dataframe contains inappropriate characters that cause problems in converting the string to a float type, the replace() method is a good and easy way to remove those characters from the string.
🌐
GitHub
github.com › scikit-learn › scikit-learn › issues › 14297
Value error could not convert string to float: in clf.score() for LogisticRegression · Issue #14297 · scikit-learn/scikit-learn
July 9, 2019 -
~/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    565     # make sure we actually converted to numeric:
    566     if dtype_numeric and array.dtype.kind == "O":
--> 567         array = array.astype(np.float64)
    568     if not allow_nd and array.ndim >= 3:
    569         raise ValueError("Found array with dim %d.
Author   asis-shukla
🌐
GitHub
github.com › scikit-learn-contrib › imbalanced-learn › issues › 193
ValueError: could not convert string to float: 'aaa' · Issue #193 · scikit-learn-contrib/imbalanced-learn
October 13, 2016 - Opened by simonm3 on Nov 23, 2016. I have imbalanced classes with 10,000 1s and 10m 0s. I want to undersample before I convert category columns to dummies to save memory.
Author   simonm3