Since the question was not very clear to begin with and attempts to explain it were going in vain, I decided to download the dataset and do it for myself. So just to make sure we are working with the same dataset iris.head() will give you or something similar, a few names might be changed and a few values, but overall strucure will be the same.
Now the first four columns are features and the fifth one is target/output.
Now you will need your X and Y as numpy arrays, to do that use
X = iris[ ['sepal length:','sepal Width:','petal length','petal width']].values
Y = iris[['Target']].values
Now since Y is categorical Data, You will need to one hot encode it using sklearn's LabelEncoder and scale the input X to do that use
label_encoder = LabelEncoder()
Y = label_encoder.fit_transform(Y)
X = StandardScaler().fit_transform(X)
To keep with the norm of separate train and test data, split the dataset using
X_train , X_test, y_train, y_test = train_test_split(X,Y)
Now just train it on your model using X_train and y_train
clf = SVC(C=1.0, kernel='rbf').fit(X_train,y_train)
After this you can use the test data to evaluate the model and tune the value of C as you wish.
Edit Just in case you don't know where the functions are here are the import statements
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
Answer from anand_v.singh on Stack Overflowpython - Loading a Dataset for Linear SVM Classification from a CSV file - Stack Overflow
machine learning - How to train svm in scikit learn from training data present in a csv file - Stack Overflow
java - Converting CSV file to LIBSVM compatible data file using python - Stack Overflow
python - load data from csv into Scikit learn SVM - Stack Overflow
Videos
First, I think you have an error in your CSV in the first row:
25.3, 12.4, 2.35, 4.89. 1, 2.35, 5.65, 7, 6.24, 5.52, M
I just assumed it should be 4.89, 1, and not 4.89. 1.
Second, I recommend you to use pandas to read that CSV, and then do this:
import pandas as pd
data = pd.read_csv('prueba.csv', header=None, usecols=[i for i in range(11)])
# the usecols=[i for i in range(11)] will create a list of numbers for your columns
# that line will make a dataframe called data, which will contain your data.
l = [i for i in range(10)]
X_train = data[l]
y_train = data[10]
This is the most easy way to have ready your data for any machine learning algorithm in scikit-learn.
import pandas as pd
df = pd.read_csv(/path/to/csv, header=None, index_col=False)
x = df.iloc[:,:-1].values
y = df.iloc[:,-1:].values
You can use csv2libsvm.py to convert csv to libsvm data
python csv2libsvm.py iris.csv libsvm.data 4 True
where 4 means target index, and True means csv has a header.
Finally, you can get libsvm.data as
0 1:5.1 2:3.5 3:1.4 4:0.2
0 1:4.9 2:3.0 3:1.4 4:0.2
0 1:4.7 2:3.2 3:1.3 4:0.2
0 1:4.6 2:3.1 3:1.5 4:0.2
...
from iris.csv
150,4,setosa,versicolor,virginica
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
4.7,3.2,1.3,0.2,0
4.6,3.1,1.5,0.2,0
...
csv2libsvm.py does not work with Python3, and also it does not support label targets (string targets), I have slightly modified it. Now It should work with Python3 as well as wıth the label targets. I am very new to Python, so my code may do not follow the best practices, but I hope it is good enough to help someone.
#!/usr/bin/env python
"""
Convert CSV file to libsvm format. Works only with numeric variables.
Put -1 as label index (argv[3]) if there are no labels in your file.
Expecting no headers. If present, headers can be skipped with argv[4] == 1.
"""
import sys
import csv
import operator
from collections import defaultdict
def construct_line(label, line, labels_dict):
new_line = []
if label.isnumeric():
if float(label) == 0.0:
label = "0"
else:
if label in labels_dict:
new_line.append(labels_dict.get(label))
else:
label_id = str(len(labels_dict))
labels_dict[label] = label_id
new_line.append(label_id)
for i, item in enumerate(line):
if item == '' or float(item) == 0.0:
continue
elif item=='NaN':
item="0.0"
new_item = "%s:%s" % (i + 1, item)
new_line.append(new_item)
new_line = " ".join(new_line)
new_line += "\n"
return new_line
# ---
input_file = sys.argv[1]
try:
output_file = sys.argv[2]
except IndexError:
output_file = input_file+".out"
try:
label_index = int( sys.argv[3] )
except IndexError:
label_index = 0
try:
skip_headers = sys.argv[4]
except IndexError:
skip_headers = 0
i = open(input_file, 'rt')
o = open(output_file, 'wb')
reader = csv.reader(i)
if skip_headers:
headers = reader.__next__()
labels_dict = {}
for line in reader:
if label_index == -1:
label = '1'
else:
label = line.pop(label_index)
new_line = construct_line(label, line, labels_dict)
o.write(new_line.encode('utf-8'))