"..approach classification problem through regression.." by "regression" I will assume you mean linear regression, and I will compare this approach to the "classification" approach of fitting a logistic regression model.

Before we do this, it is important to clarify the distinction between regression and classification models. Regression models predict a continuous variable, such as rainfall amount or sunlight intensity. They can also predict probabilities, such as the probability that an image contains a cat. A probability-predicting regression model can be used as part of a classifier by imposing a decision rule - for example, if the probability is 50% or more, decide it's a cat.

Logistic regression predicts probabilities, and is therefore a regression algorithm. However, it is commonly described as a classification method in the machine learning literature, because it can be (and is often) used to make classifiers. There are also "true" classification algorithms, such as SVM, which only predict an outcome and do not provide a probability. We won't discuss this kind of algorithm here.
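To make the decision-rule idea concrete, here is a trivial sketch; the 50% threshold and the cat example are from the paragraphs above, while the function name is just an illustration:

```python
def classify(prob_cat, threshold=0.5):
    """Decision rule on top of a probability-predicting model."""
    return "cat" if prob_cat >= threshold else "not-cat"

print(classify(0.93))  # cat
print(classify(0.12))  # not-cat
```

The regression part (estimating `prob_cat`) and the classification part (the threshold) are separate steps, which is exactly why the same fitted model can feed different decision rules.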

Linear vs. Logistic Regression on Classification Problems

As Andrew Ng explains it, with linear regression you fit a polynomial through the data - say, as in the example below, fitting a straight line through the {tumor size, tumor type} sample set:

Above, malignant tumors get $1$ and non-malignant ones get $0$, and the green line is our hypothesis $h(x)$. To make predictions we may say that for any given tumor size $x$, if $h(x)$ gets bigger than $0.5$ we predict a malignant tumor, otherwise we predict benign.

Looks like this way we could correctly predict every single training set sample, but now let's change the task a bit.

Intuitively it's clear that all tumors larger than a certain threshold are malignant. So let's add another sample with a huge tumor size, and run linear regression again:

Now our rule $h(x) > 0.5 \rightarrow malignant$ doesn't work anymore. To keep making correct predictions we would need to lower the threshold - but that is not how the algorithm should work.

We cannot change the hypothesis each time a new sample arrives. Instead, we should learn it off the training set data, and then (using the hypothesis we've learned) make correct predictions for the data we haven't seen before.
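The failure described above can be reproduced in a few lines - a toy sketch with made-up tumor sizes and a plain least-squares fit (not Ng's actual example):

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# made-up data: tumor sizes with labels 0 = benign, 1 = malignant
sizes  = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 1, 1]

w, b = fit_line(sizes, labels)
print(w * 4 + b > 0.5)   # True: the size-4 tumor is predicted malignant

# add one very large malignant tumor and refit
w, b = fit_line(sizes + [50], labels + [1])
print(w * 4 + b > 0.5)   # False: the same tumor now falls below the 0.5 threshold
```

The outlier flattens the fitted line, so a sample that was classified correctly before is now misclassified, even though the new sample carried no information about tumors of size 4.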

Hope this explains why linear regression is not the best fit for classification problems! Also, you might want to watch the "VI. Logistic Regression. Classification" video on ml-class.org, which explains the idea in more detail.


EDIT

probabilityislogic asked what a good classifier would do. In this particular example you would probably use logistic regression, which might learn a hypothesis of the standard sigmoid form $h(x) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x)}}$ (I'm just making this up):
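A from-scratch sketch of that idea, with made-up tumor sizes and plain gradient descent on the logistic loss (no library, no claim this matches any particular implementation); note the prediction near the boundary is not dragged around by the huge sample:

```python
import math

def sigmoid(z):
    z = max(min(z, 500.0), -500.0)   # guard against overflow
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.05, steps=30000):
    """Plain gradient descent on the logistic (cross-entropy) loss, one feature."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        errs = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errs, xs)) / n
        b -= lr * sum(errs) / n
    return w, b

sizes  = [1, 2, 3, 4, 5, 6, 50]      # includes the huge malignant tumor
labels = [0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(sizes, labels)

print(sigmoid(w * 4 + b) > 0.5)      # size 4: still predicted malignant
print(sigmoid(w * 3 + b) < 0.5)      # size 3: still predicted benign
```

Because the sigmoid saturates, the size-50 sample contributes almost nothing to the gradient once it is classified correctly, so the decision boundary stays between sizes 3 and 4.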

Note that both linear regression and logistic regression give you a straight line (or a higher-order polynomial), but those lines have a different meaning:

  • $h(x)$ for linear regression interpolates, or extrapolates, the output and predicts the value for an $x$ we haven't seen. It's simply like plugging in a new $x$ and getting a raw number, and is more suitable for tasks like predicting, say, car price based on {car size, car age}, etc.
  • $h(x)$ for logistic regression tells you the probability that $x$ belongs to the "positive" class. This is why it is called a regression algorithm - it estimates a continuous quantity, the probability. However, if you set a threshold on the probability, such as $h(x) > 0.5$, you obtain a classifier, and in many cases this is what is done with the output of a logistic regression model. This is equivalent to putting a line on the plot: all points sitting above the classifier line belong to one class, while the points below belong to the other class.

So, the bottom line is that in a classification scenario we use completely different reasoning and a completely different algorithm than in a regression scenario.

Answer from andreister on Stack Exchange
Towards Data Science — Regression for Classification | Hands on Experience
January 23, 2025 - Fundamentally, classification is about predicting a label and regression is about predicting a quantity. Why can't linear regression be used for classification? The main reason is that the predicted values are continuous, not probabilistic.
Turing — Scikit-Learn Cheatsheet: Methods For Classification and Regression
Common examples of regression tasks include stock market price prediction, estimation of regional sales for various products in a factory, demand prediction for a particular item based on past sales records, and so on. Classification is where we train a model to classify data into well-defined categories, based on previous data labels.
GeeksforGeeks — Classification vs Regression in Machine Learning
November 27, 2025 - Classification predicts categories or labels like spam/not spam, disease/no disease, etc. Regression predicts continuous values like price, temperature, sales, etc.
Upgrad — Regression vs Classification in Machine Learning: Difference Between Regression and Classification
November 24, 2025 - But the significant difference between the classification and regression approaches is that while in regression the output variable 'y' is numeric and continuous (it can take integer or floating-point values), in classification the output variable 'y' is discrete and categorical. So, if you are predicting variables such as salary, life expectancy, or churn probability, then these variables will be numeric and continuous. For example, suppose that a financial institution is interested in profiling its loan applicants in order to gauge the likelihood of their default.
Top answer
1 of 4
87

2 of 4
19

I can't think of an example in which classification is actually the ultimate goal. Almost always the real goal is to make accurate predictions, e.g., of probabilities. In that spirit, (logistic) regression is your friend.

IBM — Classification vs Regression
November 17, 2025 - But their goals differ: regression models predict continuous values (like house prices or patient blood pressure), while classification models predict discrete categories (such as whether an email is spam or not, or whether a tumor is malignant ...
GeeksforGeeks — Linear Regression in Machine Learning
The goal of linear regression is to find a straight line that minimizes the error (the difference) between the observed data points and the predicted values. This line helps us predict the dependent variable for new, unseen data.
Medium — Regression, Classification, and Clustering: Understanding Core Machine Learning Concepts (Muttineni Sai Rohith)
March 12, 2025 - Regression → Used for predicting continuous values (e.g., house prices, stock trends). Classification → Assigns predefined labels to data (e.g., spam detection, medical diagnosis).
Infor — Picking Algorithms: Regression vs. Classification
January 12, 2022 - Using petal length, petal width, sepal length, and sepal width, the classification algorithm will attempt to determine what species of Iris (which category, or class) the flower belongs to. As with regression, labeled data will first need to train the algorithm using known flower types before it can predict flower type based on a given set of features.
🌐
Medium
medium.com › @mangeshsalunke1309 › regression-vs-classification-in-machine-learning-35859262eabd
Regression vs Classification in Machine Learning | by Mangesh Salunke | Medium
June 17, 2025 - Regression predicts continuous values, useful in fields like finance and healthcare where precise numbers are needed. Classification, on the other hand, assigns data to categories, making it essential for tasks like spam detection and medical ...
Reddit — r/datascience: Regression to Solve Classification problem: Good or Rubbish?
October 26, 2022 -

Hello DSs!

Thought I'd share this here and listen to different opinions. So I have/had a classification problem, and of course I had tried most of the classification algorithms (from Random Forest to AdaBoost, etc.). I wanted to try XGBoost when I had the idea to treat it as a regression problem. So I fit the dataset with the binary classes using the XGB regressor, then predicted the test set on it. Of course, the output was continuous (i.e. 0.11, 0.2, etc.), which can't be passed directly into the classification evaluation metrics.

I then set a normal benchmark (<0.5: 0, >=0.5: 1). After running the evaluation metrics on the new result, I was surprised to see an accuracy of 0.83, and even high precision and recall for the underrepresented class. Has anyone else tried this? Do you guys think it's rubbish?
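The benchmark step the poster describes (<0.5 → 0, >=0.5 → 1) can be sketched in a few lines; the regressor outputs here are made up for illustration:

```python
preds = [0.11, 0.2, 0.74, 0.55, 0.32]   # made-up continuous regressor outputs
true  = [0, 0, 1, 1, 1]                 # actual binary labels

# threshold the continuous outputs into classes: < 0.5 -> 0, >= 0.5 -> 1
classes = [1 if p >= 0.5 else 0 for p in preds]
accuracy = sum(c == t for c, t in zip(classes, true)) / len(true)
print(classes, accuracy)   # [0, 0, 1, 1, 0] 0.8
```

Once the outputs are thresholded like this, any standard classification metric (accuracy, precision, recall) can be applied to them.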

DeepLearning.AI — Machine Learning Specialization
April 11, 2022 - Learn the difference between supervised and unsupervised learning and regression and classification tasks. Build a linear regression model. Implement and understand the purpose of a cost function. Implement and understand how gradient descent is used to train a machine learning model. Build and train a regression model that takes multiple features as input (multiple linear regression). Implement and understand the cost function and gradient descent for multiple linear regression.
Reddit — r/learnmachinelearning: Why doesn't linear regression work on classification problems?
November 27, 2023 -

Hello. I'm a beginner in ML. I'm trying to really understand why linear regression doesn't work on classification problems.

I often see answers along the lines of "it predicts continuous values" or "it finds the best-fit line" or something similar.

This is quite difficult for me to intuitively grasp and I’ve been stuck trying to figure this out for more than 3 weeks now.

I'm working on the Titanic dataset and trying to use linear regression, but I do not even know how to make it work.

I understand that's not what linear regression is meant for, but I just want to really see and understand why that's so.

If possible explain like a total newb. No complex or tacit language

Top answer
1 of 10
44
So think of it this way: I ask you if something is a dog or cat. This is a classification problem. Linear regression returns a continuous value, so 0, 0.5, 0.2. I ask you to tell me if something is a dog or cat. You reply 0.2. This makes no sense. This is what you're doing when you apply linear regression to classification. What you can do is interpret these values as probabilities, and then designate 0 as absolute cat and 1 as absolute dog. Then anything greater than 0.5 is a dog, anything less is a cat. To do this we apply a function to 'squash' all continuous outputs of the linear regression to lie between 0 and 1. We do this with a logistic function. You now have logistic regression.
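The 'squashing' step can be sketched in a few lines; the raw outputs below are made up, and the logistic (sigmoid) function is the standard one:

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

raw = [-3.0, 0.0, 0.4, 2.5]           # raw linear-regression-style outputs
probs = [sigmoid(z) for z in raw]      # all now between 0 and 1
labels = ["dog" if p > 0.5 else "cat" for p in probs]
print(labels)                          # ['cat', 'cat', 'dog', 'dog']
```

The threshold at 0.5 corresponds to a raw output of exactly 0, since sigmoid(0) = 0.5.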
2 of 10
16
Well, technically, the most popular classification approach, logistic regression, is linear regression. Or, more technically, vector-valued linear regression, except at the end you apply a transform (sigmoid for binary, softmax for n-ary classification) that converts the raw values to "probabilities" that sum to 1 across all the target classes. This last requirement is the only thing that distinguishes the two approaches. You can in fact apply the mean-squared error directly to the predicted probabilities too (just as linear regression applies it to raw outputs), and this is called the Brier loss. For more modern problems with neural networks, it has been noticed that the Brier loss can often match the performance of the more standard softmax cross-entropy classification, but it converges much slower. Therefore, it hasn't found much practical value.
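For reference, the binary Brier loss mentioned above is just the mean squared error between predicted probabilities and the 0/1 labels - a minimal sketch:

```python
def brier_loss(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

print(brier_loss([0.9, 0.2, 0.6], [1, 0, 1]))   # approximately 0.07
print(brier_loss([1.0, 0.0], [1, 0]))           # 0.0 for perfect predictions
```

Unlike cross-entropy, this loss is bounded (each term is at most 1), which is part of why its gradients are weaker for badly misclassified examples.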
Medium — Classification and Regression Problems in Machine Learning | EnjoyAlgorithms
November 25, 2021 - Classification and Regression deal with the problem of mapping a function from input to output. In classification the output is discrete, but in regression the output is continuous.
Nixus — Classification vs Regression in Machine Learning
March 27, 2023 - Regression algorithms are used to find the correlations between the data, i.e. the dependent and independent variables.
🌐
Udacity
udacity.com › blog › 2025 › 02 › regression-vs-classification-key-differences-and-when-to-use-each.html
Regression vs Classification - Key Differences and When to Use Each | Udacity
February 27, 2025 - Forecasting stock values: Predicting stock prices is a complex task, but regression can be used to model the relationship between historical stock data, market trends, economic indicators, and other relevant factors. While not perfectly accurate, these models can provide insights into potential future stock values. ... Classification is a core machine learning task that focuses on assigning data points to predefined categories or classes.
TensorFlow — Neural Network Playground
playground.tensorflow.org - an interactive playground offering both classification and regression datasets.
scikit-learn — 1.1. Linear Models (scikit-learn 1.8.0 documentation)
This classifier first converts binary targets to {-1, 1} and then treats the problem as a regression task, optimizing the same objective as above. The predicted class corresponds to the sign of the regressor’s prediction. For multiclass classification, the problem is treated as multi-output regression, and the predicted class corresponds to the output with the highest value.
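A toy sketch of the {-1, 1} trick those docs describe - using plain least squares rather than the actual ridge-penalized estimator scikit-learn implements:

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

xs = [1, 2, 3, 4, 5, 6]
targets = [-1, -1, -1, 1, 1, 1]         # binary classes encoded as {-1, +1}
w, b = fit_line(xs, targets)

def predict_class(x):
    # the predicted class is the sign of the regressor's prediction
    return 1 if w * x + b >= 0 else -1

print(predict_class(4), predict_class(3))   # 1 -1
```

With the symmetric {-1, +1} encoding, the natural decision threshold on the raw regression output is 0 rather than 0.5, which is exactly the sign rule the documentation describes.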
Coursera — Supervised Machine Learning: Regression and Classification
January 28, 2025 - Build & train supervised machine learning models for prediction & binary classification tasks, including linear regression & logistic regression
Rating: 4.9 - 32.3K votes