How do I determine the difference between regression and classification in machine learning?
terminology - Regression vs. Classification: Is there a clear, generally accepted definition? - Cross Validated
I am currently doing a course by Andrew Ng, and I don't understand the difference between regression and classification. So I looked at the notes, which say:
In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.
I don't even understand what they mean by a "continuous function" in regression.
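A toy sketch may make the quoted definitions concrete (plain Python; the functions and numbers are invented purely for illustration): a regression model maps inputs to a point on a continuum, while a classification model maps inputs to one of a fixed set of labels.

```python
# Toy illustration of the quoted definitions (made-up models, not fitted ones).

def predict_price(sqft):
    """Regression: the output can be any real number.

    The mapping sqft -> price is a continuous function of the input:
    nudge sqft a little and the prediction moves a little too.
    """
    return 50.0 + 0.3 * sqft  # hypothetical house price in $1000s

def predict_is_spam(word_count):
    """Classification: the output is one of a discrete set of categories."""
    return "spam" if word_count > 100 else "ham"

print(predict_price(1000))   # 350.0 -- any value on a continuum is possible
print(predict_is_spam(150))  # 'spam' -- only 'spam' or 'ham' are possible
```

Here "continuous function" just means the regression output varies smoothly with the input, rather than jumping between category labels.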
Classification denotes an action. It's what you do with the result of an analysis in which there are one or more outcome variables and one or more input (predictor; covariate) variables. If there is a single outcome variable, the discreteness of the variable does not matter. For example, binary logistic regression is for binary Y and is a direct (continuous) probability model that was not intended to be used for classification. The action of classification involves making choices and the use of decision rules. In most cases it represents a premature decision made by an analyst who is not blessed with knowledge about the consequences of the decision (i.e., does not possess the utility/loss/cost function needed to make a good decision).
One can use any predictive method to do classification even if that was not the intent of the method. For example one can use arbitrary thresholds on predicted values to do classification from ordinary regression for continuous Y, or ordinal or binary regression for ordered or binary Y.
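The point above can be sketched in a few lines of plain Python (the logistic coefficients here are made up, standing in for a pre-fitted model): the model itself outputs a continuous probability, and classification is a separate decision rule layered on top of it.

```python
import math

def predicted_probability(x, intercept=-2.0, slope=1.0):
    """Continuous output of a hypothetical, pre-fitted logistic regression."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

def classify(x, threshold=0.5):
    """The act of classification: an arbitrary threshold applied afterwards."""
    return 1 if predicted_probability(x) >= threshold else 0

p = predicted_probability(2.5)          # about 0.62: a fairly close call
print(p, classify(2.5), classify(2.5, threshold=0.8))
```

Note that the same fitted model yields different classifications depending on the threshold, which is exactly why the threshold belongs to the decision (and its cost function), not to the model.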
Many in machine learning think of classification as a good default mode; it is not, as detailed in my blog post. Among other things, classification hides close calls and lulls users into making decisions at the boundaries (e.g., when a predicted probability is 0.5001) when a better approach would be "get more data first".
Most of the time, when you see "classifier" used in a sentence, the correct term is "prediction", because the output is considered to be continuous.
No, I don't think that definition is generally accepted. I would not regard Poisson regression as classification, as the thing you are generally interested in is the conditional parameters of the Poisson distribution that describes the target variable for given values of the attributes. Those parameters are generally continuous. You might then use the model to work out the most likely count, but that would be discretising the predictive distribution given by the model.
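A small sketch of that last step (plain Python; the fitted rate is a made-up number): the model's output is a continuous rate lambda, and reporting the single most likely count is a separate act of discretising the Poisson predictive distribution.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing count k under a Poisson(lam) distribution."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def most_likely_count(lam, k_max=100):
    """Discretisation: collapse the whole predictive distribution to its mode."""
    return max(range(k_max + 1), key=lambda k: poisson_pmf(k, lam))

lam = 3.7  # hypothetical conditional rate from a fitted Poisson regression
print(lam, most_likely_count(lam))  # mode is floor(lam) = 3
```

The continuous quantity lambda is what the regression actually estimates; the integer mode throws away most of the distributional information.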
Likewise, some here (e.g. Frank Harrell - see his answer to this question +1) view logistic regression purely as a probabilistic model, used to estimate a conditional probability, and not as a classification model (which is what you get by applying a threshold and discretising the continuous output of the model). I have a lot of sympathy with this view, except that in practical applications where you need to perform that discretisation, it still impacts the design and evaluation of the model and shouldn't be ignored. The optimal classification is not always obtained by estimating the probability of class membership and thresholding; sometimes it is better to classify the data directly. If that were not the case, [kernel] logistic regression would not perform worse than the Support Vector Machine, but on some applications it clearly does.
I'd probably say that classification is a problem where the target distribution is categorical, and the aim is to place each object into a category.