This link should help: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#/Q3:_Data_preparation

It mentions that the data is stored in a sparse array/matrix form. Essentially, only the non-zero values are stored, and any missing value is taken to be zero. For your questions:

a) The index merely serves to distinguish between the features/parameters. In terms of a hyperspace, it designates each component. E.g., in 3-D (3 features), indices 1, 2, 3 would correspond to the x, y, z coordinates.

b) The correspondence is purely mathematical: when constructing the hyperplane, these values serve as coordinates.

c) If you skip an index in between, that feature is assigned the default value of zero.

In short, +1 1:0.7 2:1 3:1 translates to:

Assign to class +1, the point (0.7,1,1).
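
To make the mapping concrete, here is a minimal sketch of reading one such line into a point. The helper name parse_libsvm_line and the n_features argument are made up for illustration; LIBSVM itself does this parsing internally.

```python
def parse_libsvm_line(line, n_features):
    """Turn one LIBSVM-format line into (label, dense_point)."""
    tokens = line.split()
    label = float(tokens[0])
    point = [0.0] * n_features                 # every feature defaults to zero
    for token in tokens[1:]:
        index, value = token.split(":")
        point[int(index) - 1] = float(value)   # file indices are 1-based
    return label, point


print(parse_libsvm_line("+1 1:0.7 2:1 3:1", 3))   # all three features given
print(parse_libsvm_line("+1 1:0.7 3:1", 3))       # feature 2 skipped
```

In the second call, feature 2 is not listed, so its coordinate stays at 0.0.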

Answer from Govind Gopakumar on Stack Exchange


The LIBSVM data format is given by:

<label> <index1>:<value1> <index2>:<value2> ...
...
...

As you can see, this forms a matrix with LineCount rows and (IndexCount + 1) columns; more precisely, a sparse matrix. If you specify a value for each index, you have a dense matrix, but if you only specify a few indices, like <label> <5:value> <8:value>, only indices 5 and 8 (and, of course, the label) will have a custom value; all other values are set to 0. This is just for notational simplicity or to save space, since datasets can be huge.
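
The dense-versus-sparse relationship can be sketched in a few lines of Python. The function to_libsvm_line is a hypothetical helper written for this answer, not part of LIBSVM, and the sample rows are invented.

```python
def to_libsvm_line(label, features):
    """Format one dense row as a sparse LIBSVM line (zeros are omitted)."""
    # ':g' keeps the text short but rounds to 6 significant digits.
    pairs = [
        f"{index}:{value:g}"
        for index, value in enumerate(features, start=1)  # indices start at 1
        if value != 0
    ]
    return " ".join([f"{label:g}"] + pairs)


dense_rows = [(1, [0, 0, 0, 0, 2.5, 0, 0, 4]), (-1, [1, 0, 0, 0, 0, 0, 0, 0])]
for label, features in dense_rows:
    print(to_libsvm_line(label, features))
```

Each eight-column row shrinks to the label plus only the non-zero index:value pairs, which is where the space saving comes from.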

For the meaning of the tags, I quote the README file:

<label> is the target value of the training data. For classification, it should be an integer which identifies a class (multi-class classification is supported). For regression, it's any real number. For one-class SVM, it's not used so can be any number. <index> is an integer starting from 1, <value> is a real number. The indices must be in an ascending order.
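
These rules are easy to check mechanically. The following is only a sketch with a made-up function name; it assumes a single numeric label per line, ignores the precomputed-kernel case, and treats repeated indices as invalid, which is an assumption of this sketch rather than something the quote states.

```python
def check_libsvm_line(line):
    """Raise ValueError if a line breaks the index rules quoted above."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    float(tokens[0])                      # label must be numeric
    previous = 0
    for token in tokens[1:]:
        index_text, value_text = token.split(":")
        index = int(index_text)
        float(value_text)                 # value must be a real number
        if index < 1:
            raise ValueError(f"index {index} is below 1")
        if index <= previous:             # assumption: duplicates are rejected too
            raise ValueError(f"index {index} is not in ascending order")
        previous = index


for candidate in ["1 1:0.5 3:2", "1 3:2 1:0.5", "1 0:7"]:
    try:
        check_libsvm_line(candidate)
        print(candidate, "-> ok")
    except ValueError as error:
        print(candidate, "->", error)
```

Only the first candidate passes; the second has its indices out of order and the third uses index 0.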

As you can see, the label is the data you want to predict. The index identifies a feature of your data, and the value is what that feature measured. A feature is simply an indicator that correlates with your target value, so that a better prediction can be made.

Totally fictional story time: Gabriel Luna (a totally fictional character) wants to predict his energy consumption for the next few days. He found out that the outside temperature from the day before is a good indicator for that, so he selects temperature as a feature with index 1. Important: indices always start at one; zero can sometimes cause strange LIBSVM behaviour. Then he notices, to his surprise, that the day of the week (Monday to Sunday, or 0 to 6) also affects his load, so he selects it as a second feature with index 2. A matrix row for LIBSVM now has the following format:

<myLoad_Value> <1:outsideTemperatureFromYesterday_Value> <2:dayOfTheWeek_Value>

Gabriel Luna (he is Batman at night) now records this data over a few weeks, which could look something like this (load in kWh, temperature in °C, day as mentioned above):

0.72 1:25 2:0
0.65 1:21 2:1
0.68 1:29 2:2
...

Notice that we could leave out 2:0 because of the sparse matrix format. This would be your training data for a LIBSVM model. Then we predict tomorrow's load as follows. You know today's temperature, say 23°C, and today is Tuesday, which is 1, so tomorrow is 2. So this is the line, or vector, to use with the model:

0 1:23 2:2

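Here, you can set the <label> value arbitrarily. It is not used for the prediction itself: labels in a test file are only used to calculate accuracy or errors, and the predicted value is written to the output file. I hope this helps.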
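
As a small illustration of how such a query line could be assembled, here is a sketch using a hypothetical helper, next_day_query. The feature layout (index 1 for the previous day's temperature, index 2 for the weekday with Monday = 0) is taken from the story above; everything else is an assumption.

```python
def next_day_query(temperature_today, weekday_today, placeholder_label=0):
    """Build the line used to predict tomorrow's load.

    Feature 1 is the previous day's temperature (today's, seen from tomorrow);
    feature 2 is tomorrow's weekday, with Monday = 0 ... Sunday = 6.
    """
    weekday_tomorrow = (weekday_today + 1) % 7   # Sunday (6) wraps to Monday (0)
    return f"{placeholder_label} 1:{temperature_today} 2:{weekday_tomorrow}"


print(next_day_query(23, 1))   # today is Tuesday (1) at 23 °C
```

For a Sunday the weekday wraps around to 0, in which case the 2:0 pair could also be dropped because of the sparse format.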
