Generalized linear model: class of statistical models
In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable … (Wikipedia)
Wikipedia
Generalized linear model - Wikipedia
2 weeks ago - The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.
Quora
What is the difference between 'linear regression' and 'generalized linear regression'? - Quora
GLM is a superset: linear regression is the GLM with Gaussian family and identity link. For overdispersion, zero inflation, correlated data, or non-exponential-family errors, use extensions: quasi-likelihood, negative binomial, zero-inflated ...
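A minimal sketch of the first claim above, in plain Python with made-up data: fitting a GLM with a Gaussian family and identity link by iteratively reweighted least squares (IRLS) gives exactly the ordinary least-squares line, because every working weight is 1 and the working response is just y. The function names and data are illustrative assumptions, not any library's API.

```python
def ols(x, y):
    """Closed-form simple linear regression of y on (1, x): returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def weighted_ls(x, z, w):
    """Weighted least squares of z on (1, x) with weights w: returns (intercept, slope)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mz = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxz = sum(wi * (xi - mx) * (zi - mz) for wi, xi, zi in zip(w, x, z))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxz / sxx
    return mz - slope * mx, slope


def glm_gaussian_identity(x, y, iterations=5):
    """IRLS for the Gaussian family with identity link.

    With the identity link, eta = mu and d(mu)/d(eta) = 1, and the Gaussian
    variance function is V(mu) = 1. So the working response
    z = eta + (y - mu) equals y and every working weight is 1: each IRLS step
    is an ordinary least-squares fit of y on (1, x).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        eta = [b0 + b1 * xi for xi in x]
        mu = eta                                            # identity link
        z = [e + (yi - m) for e, yi, m in zip(eta, y, mu)]  # working response
        w = [1.0] * len(x)                                  # working weights
        b0, b1 = weighted_ls(x, z, w)
    return b0, b1


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(ols(x, y))
print(glm_gaussian_identity(x, y))
```

Both calls should return the same intercept and slope up to floating-point rounding; the GLM machinery only starts to matter once the link is not the identity or the variance depends on the mean.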
Discussions

Test to know when to use GLM over Linear Regression? - Cross Validated
Generalized linear models (GLMs) are more general than linear regression by construction. Nearly the same question was asked here: "When to use GLM instead of LM?" However, I'm not very satisfied with ...
stats.stackexchange.com
September 14, 2019
Linear model vs Generalised linear model vs Generalised mixed effect model - big confusion
r/rstats
March 18, 2021
What is the difference between the general linear model (GLM)and generalized linear model (GZLM)?
The general linear model requires ... linear model is an extension of the general linear model that allows the specification of models whose response variable follows different distributions. For example logistic regression (where the dependent variable is categorical) ...
researchgate.net
May 22, 2014
modeling - General Linear Model vs. Generalized Linear Model (with an identity link function?) - Cross Validated
Typically one uses a fixed scale only with models like logistic regression or Poisson regression, where the response is a count or indicator/frequency variable. In this case there is no analogue to the scale parameter in normal regression.
stats.stackexchange.com
Wikipedia
General linear model - Wikipedia
2 weeks ago - The general linear model (GLM) ... framework, both the t-test and the F-test can be applied. The general linear model is a generalization of multiple linear regression to the case of more than one dependent variable....
Top answer
Answer 1 of 3 (score 17)

As with many other cases in statistics, the goal of finding a single test to replace one's judgement is a bad one.

There are several sources of information you can and should use while deciding: the theoretical expectation of the distribution, prior empirical work on the topic, the properties of the data (e.g. is it truncated or zero-inflated?), and the residual distributions and other diagnostics after fitting models. But there is no single, general test (or even a set of tests) that will tell you what to do.

And there cannot be one. I recognise the intuitive appeal of having a decision tree to follow when making such a choice, especially in an area that is complex and new to you. But there are few hard boundaries in the areas you need to consider, and so this decision does not lend itself well to such a workflow. You need to use judgement, and developing that will take time and practice.

Answer 2 of 3 (score 14)

Another great answer from @mkt on this forum. Here are a few more pointers you might find useful.

GLMs include some widely used types of regression models:

  1. Binary Logistic Regression Models;
  2. Binomial Logistic Regression Models;
  3. Multinomial Logistic Regression Models;
  4. Ordinal Logistic Regression Models;
  5. Poisson Regression Models;
  6. Beta Regression Models;
  7. Gamma Regression Models.

As pointed out by @COOLSerdash in his comment, beta regression models share some features (such as a linear predictor, a link function and a dispersion parameter) with generalized linear models (GLMs; McCullagh and Nelder 1989), but are NOT special cases of the GLM framework. However, I included them in the above list because of their similarity with GLMs and their practical value.

A good place to start would be to familiarize yourself with each of these types of models and when it might be used.

Binary Logistic Regression Models

These types of models are used to model the relationship between a binary dependent variable Y and a set of independent variables X1, ..., Xp.

For example, Y could represent the survival status of patients at a local hospital, assessed 30 days after a surgical intervention for a particular disease, such that Y = 1 for a patient who survived and Y = 0 for a patient who died. Furthermore, if p = 2, then X1 could represent age (in years) and X2 could represent gender. For all subsequent examples, assume that p = 2 and that X1 and X2 have the same meaning as in this example.
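As a minimal sketch of how such a model can be fitted (hypothetical data; plain Python rather than any particular statistics package), the code below estimates logit P(Y = 1) = b0 + b1 * (age - mean age) + b2 * gender by Newton-Raphson, which for the logit link coincides with IRLS. Centring age is an assumption made only to keep the iterations well conditioned.

```python
import math


def solve(a, b):
    """Solve the square linear system a @ v = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (m[r][n] - sum(m[r][c] * v[c] for c in range(r + 1, n))) / m[r][r]
    return v


def fit_logistic(X, y, iterations=25):
    """Binary logistic regression (binomial family, logit link) by Newton-Raphson.

    X is a list of rows that already contain a leading 1 for the intercept.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        eta = [sum(b * v for b, v in zip(beta, row)) for row in X]
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]   # inverse logit
        w = [m * (1.0 - m) for m in mu]                  # Bernoulli variances
        info = [[sum(wi * row[j] * row[k] for wi, row in zip(w, X))
                 for k in range(p)] for j in range(p)]   # X'WX
        score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
                 for j in range(p)]                      # X'(y - mu)
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical data: age in years, gender coded 0/1, 30-day survival (1 = survived).
age = [45, 52, 60, 38, 70, 65, 55, 48, 72, 41, 58, 67]
gender = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1]
survived = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1]
mean_age = sum(age) / len(age)
X = [[1.0, a - mean_age, float(g)] for a, g in zip(age, gender)]

b0, b_age, b_gender = fit_logistic(X, survived)
print(b0, b_age, b_gender)
```

In practice one would use a library routine such as R's glm(..., family = binomial). At convergence the score X'(y - mu) is zero, so, for example, the fitted probabilities sum to the observed number of survivors.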

Binomial Logistic Regression Models

These types of models are used to model the relationship between a binomial dependent variable Y and a set of independent variables X1, ..., Xp.

For example, Y could represent the number of questions (out of 10) answered correctly by patients on a questionnaire eliciting their knowledge of the symptoms associated with their disease.
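For a binomial response, each observation contributes both a count of successes and a number of trials. A minimal sketch with made-up questionnaire scores: in an intercept-only binomial logistic model, the maximum-likelihood estimate of p is the pooled proportion of correct answers, and Newton-Raphson on the logit scale recovers it.

```python
import math


def fit_binomial_intercept(successes, trials, iterations=25):
    """Intercept-only binomial logistic regression, logit(p) = b0, by Newton-Raphson.

    Log-likelihood (up to a constant): sum(s_i * log(p) + (n_i - s_i) * log(1 - p)).
    """
    b0 = 0.0
    for _ in range(iterations):
        p = 1.0 / (1.0 + math.exp(-b0))
        score = sum(s - n * p for s, n in zip(successes, trials))
        information = sum(n * p * (1.0 - p) for n in trials)
        b0 += score / information
    return b0


correct = [7, 4, 9, 6, 5]          # hypothetical scores out of 10
trials = [10] * len(correct)
b0 = fit_binomial_intercept(correct, trials)
p_hat = 1.0 / (1.0 + math.exp(-b0))
print(b0, p_hat)                   # p_hat matches the pooled proportion 31/50
```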

Multinomial Logistic Regression Models

These types of models are used to model the relationship between a nominal dependent variable Y with more than 2 categories and a set of independent variables X1, ..., Xp.
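One common formulation of the multinomial model, assumed here, is the baseline-category logit: each non-baseline category k has its own linear predictor eta_k = log(P(Y = k) / P(Y = baseline)), and the category probabilities follow from a softmax. A small sketch of that last step, with hypothetical linear-predictor values:

```python
import math


def multinomial_probs(etas):
    """Category probabilities for a baseline-category (multinomial) logit model.

    etas holds one linear predictor per non-baseline category; the baseline
    category's linear predictor is fixed at 0.
    """
    scores = [0.0] + list(etas)              # baseline category first
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical linear predictors for a 3-category outcome, e.g. each computed
# as b0k + b1k * age + b2k * gender for categories k = 2 and 3.
probs = multinomial_probs([0.5, -1.0])
print(probs)
```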

Ordinal Logistic Regression Models

These types of models are used to model the relationship between an ordinal dependent variable Y and a set of independent variables X1, ..., Xp.

For example, Y could represent the degree of pain experienced by patients immediately after surgery, expressed on an ordinal scale from 1 to 5, where 1 stands for no pain and 5 stands for severe pain.
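For an ordinal outcome such as this 1 to 5 pain scale, a widely used choice is the cumulative logit (proportional odds) model. The sketch below assumes the parameterisation logit P(Y <= j) = theta_j - eta, the sign convention used, for example, by MASS::polr in R (other software writes theta_j + eta); the cut-points are hypothetical.

```python
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def ordinal_probs(cutpoints, eta):
    """Category probabilities under a cumulative logit (proportional odds) model.

    Assumes logit P(Y <= j) = theta_j - eta with increasing cut-points, so a
    larger eta moves probability towards higher categories (here: more pain).
    """
    cumulative = [logistic(theta - eta) for theta in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)   # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


# Hypothetical cut-points: 4 cut-points give 5 categories.
theta = [-2.0, -0.5, 1.0, 2.5]
print(ordinal_probs(theta, eta=0.0))
print(ordinal_probs(theta, eta=1.0))   # more mass on the severe-pain categories
```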

Poisson Regression Models

These types of models are used to model the relationship between a count dependent variable Y and a set of independent variables X1, ..., Xp.

For example, Y could represent the number of hospital days (out of 30) on which patients had to use pain-relieving medication following their surgery.
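A minimal IRLS sketch for Poisson regression with a log link, using made-up medication-day counts and gender as the only predictor. With a single binary predictor the model has one mean per group, so the fitted means should reproduce the two group averages (4 and 8.5 here), which makes the result easy to check.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Poisson regression with log link, log(mu) = b0 + b1 * x, fitted by IRLS.

    For the log link d(mu)/d(eta) = mu and the Poisson variance is V(mu) = mu,
    so the working weights are w = mu and the working response is
    z = eta + (y - mu) / mu.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0    # start from the overall mean
    for _ in range(iterations):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted least squares of z on (1, x).
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        mz = sum(wi * zi for wi, zi in zip(w, z)) / sw
        b1 = (sum(wi * (xi - mx) * (zi - mz) for wi, xi, zi in zip(w, x, z))
              / sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)))
        b0 = mz - b1 * mx
    return b0, b1


# Hypothetical data: days (out of 30) on pain medication, by gender (0/1).
gender = [0, 0, 0, 0, 1, 1, 1, 1]
days = [3, 5, 2, 6, 8, 7, 10, 9]
b0, b1 = fit_poisson(gender, days)
print(math.exp(b0), math.exp(b0 + b1))   # fitted mean days for gender 0 and 1
```

exp(b1) is then the ratio of expected counts between the two groups (8.5 / 4 = 2.125 for these data).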

Beta Regression Models

These types of models are used to model the relationship between a dependent variable Y expressed as a continuous proportion taking values in the open interval (0,1) and a set of independent variables X1, ..., Xp.

For example, if the disease in question is a brain disease, Y could represent the fraction of the brain area still affected by disease 30 days post-surgery relative to the total brain area for patients who survived the surgery.
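Beta regression is usually written in a mean/precision form rather than with the two standard shape parameters. The sketch below assumes that form (shape parameters a = mu * phi and b = (1 - mu) * phi, as used for example by the R package betareg) together with a logit link for the mean; the coefficients and the observed fraction are hypothetical.

```python
import math


def beta_logpdf(y, mu, phi):
    """Log-density of a beta distribution in the mean/precision parameterisation.

    Shapes a = mu * phi and b = (1 - mu) * phi give E(Y) = mu and
    Var(Y) = mu * (1 - mu) / (1 + phi).
    """
    if not 0.0 < y < 1.0:
        raise ValueError("beta regression needs 0 < y < 1")
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def inverse_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients: logit(mu) = -1.0 + 0.01 * age, for a 60-year-old patient.
mu = inverse_logit(-1.0 + 0.01 * 60)
print(mu, beta_logpdf(0.25, mu, phi=10.0))
```

Summing such log-densities over patients gives the log-likelihood that a beta regression maximises over the coefficients and phi.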

Gamma Regression Models

These types of models are used to model the relationship between a positive-valued, continuous dependent variable Y and a set of independent variables X1, ..., Xp.

For example, Y could represent the healthcare utilization costs of patients who survived up to the 30-day mark.
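A minimal sketch of a gamma GLM with a log link (a common choice for positive, right-skewed costs; the data and the gender predictor are hypothetical). With V(mu) = mu^2 and the log link, every IRLS working weight (d mu / d eta)^2 / V(mu) equals 1, so each iteration is an ordinary least-squares fit of the working response.

```python
import math


def fit_gamma_log(x, y, iterations=50):
    """Gamma regression with log link, log(mu) = b0 + b1 * x, fitted by IRLS.

    The working weights are all 1, so only the working response
    z = eta + (y - mu) / mu changes between iterations.
    """
    n = len(y)
    b0, b1 = math.log(sum(y) / n), 0.0      # start from the overall mean
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    for _ in range(iterations):
        eta = [b0 + b1 * xi for xi in x]
        z = [e + (yi - math.exp(e)) / math.exp(e) for e, yi in zip(eta, y)]
        mz = sum(z) / n
        b1 = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
        b0 = mz - b1 * mx
    return b0, b1


# Hypothetical 30-day healthcare costs by gender (0/1).
gender = [0, 0, 0, 0, 1, 1, 1, 1]
cost = [1200.0, 800.0, 1500.0, 900.0, 2000.0, 2600.0, 1800.0, 2400.0]
b0, b1 = fit_gamma_log(gender, cost)
print(math.exp(b0), math.exp(b0 + b1))   # fitted mean cost for gender 0 and 1
```

As in the Poisson case, a single binary predictor makes the fitted means equal the group averages (1100 and 2200 here); exp(b1) is the multiplicative effect of gender on expected cost.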

VitalFlux
GLM vs Linear Regression: Difference, Examples
December 7, 2023 - GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function. The link function is a function of the expected value of the response variable.
Great Learning
Generalized Linear Model | What does it mean?
October 14, 2024 - Unlike in linear regression models, the response variable need not be normally distributed. The response variable is assumed to follow a distribution from the exponential family (e.g. normal, binomial, Poisson, ...
Reddit
r/rstats on Reddit: Linear model vs Generalised linear model vs Generalised mixed effect model - big confusion
March 18, 2021 -

Hi, I am a PhD student and have been studying statistics with R for some months. I have done courses on basic statistics (linear and logistic regression, ANOVA, etc.) and now would like to go deeper.

For one study I am conducting I have been suggested to look at generalised and logistic mixed effect models. I have looked online and found a lot of info, but now I am very confused!

So far, I understood the following

  1. linear models can be used when the residuals follow a normal distribution and Generalised linear models when residuals do not follow normal distribution. Is this correct? I wasn't told anything about this during my courses on linear models. How do I run this in R?

  2. Mixed effect models are used when you have repeated measurements for the same subjects. But I am not sure about this. Additionally, I am not sure how to run this analysis in R.

I have found lots of books on these topics, but they go very deep and include lots of maths and formulae. What I would like to have is a book/course that explains when it is best to use each model, how to run the analysis and which assumptions to check, without going into formulae and other technical details.

Does anyone have something like this to recommend? If not, it would be great if someone could explain a bit and maybe give some examples of analyses.

Thanks!

Top answer
Answer 1 of 4 (score 9)
"linear models can be used when the residuals follow a normal distribution and Generalised linear models when residuals do not follow normal distribution. Is this correct?"

Yes, this is correct.

"I wasn't told anything about this during my courses on linear models. How do I run this in R?"

You can fit a linear model in R with the following:

    your_model = lm(YOUR.RESPONSE.VARIABLE ~ YOUR.PREDICTOR.VARIABLE, data = YOUR.DATA)

Then run the following to get the output:

    summary(your_model)

"2) Mixed effect models are used when you have repeated measurements for the same subjects. But I am not sure about this."

Yeah, this is correct: you fit a random effect, which takes your repeated measures into account. People typically use the lme4 package for this; the function call is lmer instead of lm.

"Does anyone have something like this to recommend?"

Here's a helpful guide: https://www.danielnettle.org.uk/wp-content/uploads/2019/07/funwithR3.0.pdf I have found this course PDF very helpful; I think sections 2 and 4 in particular would help you greatly.
Answer 2 of 4 (score 9)
"So far, I understood the following: 1) linear models can be used when the residuals follow a normal distribution and Generalised linear models when residuals do not follow normal distribution. Is this correct? I wasn't told anything about this during my courses on linear models. How do I run this in R?"

Linear models, in general, refer to ordinary least squares regression with residuals that follow a normal (Gaussian) distribution, yes. This can be done with lm() in R. Generalised linear models can be fitted to data whose errors are normal or follow other families of error distributions, with the proper link function. Linear models can be fitted with glm(family = gaussian), and other types with glm(family = binomial), glm(family = poisson), etc.

"2) Mixed effect models are used when you have repeated measurements for the same subjects. But I am not sure about this. Additionally, I am not sure how to run this analysis in R."

Mixed effects models can be used for repeated measures data, but also for any data that are clustered or have non-independent errors. This can be done with lme4::lmer() or nlme::lme(). They are called mixed effects models because they include both fixed effects and random effects.
Top answer
Answer 1 of 16 (score 50)
The general linear model requires that the response variable follows the normal distribution, whilst the generalized linear model is an extension of the general linear model that allows the specification of models whose response variable follows different distributions. For example, logistic regression (where the dependent variable is categorical) and Poisson regression (where the dependent variable is a count variable) are both generalized linear models. In addition, the expected value of the response variable is related to the linear model through a link function. In the case of the linear model, that would be the identity (the "=" part of the equation). For the generalized linear model, different link functions can be used, each denoting a different relationship between the linear model and the response variable (e.g. inverse, logit, log). For example, for Poisson regression the link function is the log. (You can think of the link function as a transformation of the expected value of the response variable.)
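To make the link-function idea concrete, the sketch below pairs each link named in the answer (identity, log, logit, inverse) with its inverse. The link g maps the mean mu to the linear predictor eta = g(mu), and the inverse link maps eta back to the mean; the dictionary layout is an illustrative assumption, not any library's API.

```python
import math

# Common GLM link functions g(mu) = eta, each paired with its inverse mu = g^{-1}(eta).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
}

# Poisson regression with a log link: a linear predictor of 1.2
# corresponds to an expected count of exp(1.2).
link, inverse_link = LINKS["log"]
print(inverse_link(1.2))
```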
Answer 2 of 16 (score 35)
I suggest a small modification to what George said. I.e., the general linear model assumes that the *errors* are normally distributed, or equivalently that the response variable is normally distributed *conditional* on the linear combination of explanatory variables. If you look at textbooks or articles on the generalized linear model, the authors will almost certainly talk about the distinction in terms of the link function and error distribution. E.g., OLS linear regression is a generalized linear model with an identity link function and normally distributed errors. Binary logistic regression, on the other hand, is a generalized linear model with a logit link function and a binomial error distribution (because the outcome variable has only two possible values). HTH.
The Analysis Factor
SPSS GLM or Regression? When to use each
August 9, 2023 - Regression models are just a subset of the General Linear Model, so you can use GLM procedures to run regressions.
Medium
Linear regression vs. Generalized linear models (GLM): What’s the difference? | by Anyi Guo | Medium
June 14, 2022 - Linear regression vs. Generalized linear models (GLM): What’s the difference? Linear Regression Definition Linear Regression is a modelling approach that assumes a linear relationship between an …
TIBCO
Generalized Linear Model (GLM) Introductory Overview - Extension of Multiple Regression to the General Linear Model
This extension gives the general linear model important advantages over the multiple and the so-called multivariate regression models, both of which are inherently univariate (single dependent variable) methods. One advantage is that multivariate tests of significance can be employed when responses on multiple dependent variables are correlated.
Top answer
Answer 1 of 2 (score 24)

A generalized linear model specifying an identity link function and a normal family distribution is exactly equivalent to a (general) linear model. If you're getting noticeably different results from each, you're doing something wrong.

Note that specifying an identity link is not the same thing as specifying a normal distribution. The distribution and the link function are two different components of the generalized linear model, and each can be chosen independently of the other (although certain links work better with certain distributions, so most software packages specify the choice of links allowed for each distribution).

Some software packages may report noticeably different $p$-values when the residual degrees of freedom are small, if they calculate these using the asymptotic normal and chi-square distributions for all generalized linear models. All software will report $p$-values based on Student's $t$- and Fisher's $F$-distributions for general linear models; these are more accurate for small residual degrees of freedom because they do not rely on asymptotics. Student's $t$- and Fisher's $F$-distributions are strictly valid for the normal family only, although some other software for generalized linear models may also use them as approximations when fitting other families with a scale parameter that is estimated from the data.

Answer 2 of 2 (score 5)

I would like to include my experience in this discussion. I have seen that a generalized linear model (specifying an identity link function and a normal family distribution) is identical to a general linear model only when you use the maximum likelihood estimate as scale parameter method. Otherwise if "fixed value = 1" is chosen as scale parameter method you get very different p values. My experience suggest that usually "fixed value = 1" should be avoided. I'm curious to know if someone knows when it is appropriate to choose fixed value = 1 as scale parameter method. Thanks in advance. Mark
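For a normal-family model the covariance of the estimates is phi (X'X)^{-1}, so fixing the scale phi at 1 instead of estimating it divides every standard error by sqrt(phi_hat). A small sketch with made-up data whose residual variance is far from 1 shows why "fixed value = 1" can give very different p-values; the function name is illustrative.

```python
import math


def slope_standard_errors(x, y):
    """Standard error of the slope in a Gaussian identity-link fit, two ways.

    Var(slope) = phi / Sxx. Estimating phi as RSS / (n - 2) reproduces the
    usual linear-model standard error; fixing phi = 1 is only right if the
    residual variance really is 1.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    phi_hat = rss / (n - 2)
    se_estimated = math.sqrt(phi_hat / sxx)
    se_fixed = math.sqrt(1.0 / sxx)
    return se_estimated, se_fixed, phi_hat


# Hypothetical data measured in units where the residual variance is large.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [120.0, 260.0, 310.0, 480.0, 470.0, 650.0]
se_est, se_fix, phi_hat = slope_standard_errors(x, y)
print(se_est, se_fix, phi_hat)
```

Here the fixed-scale standard error is far too small, so its p-values would be far too optimistic; a fixed scale of 1 is natural mainly for families such as binomial and Poisson, where the variance is determined by the mean.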

Reddit
r/rstats on Reddit: Do you consider Generalized linear models to be linear ?
May 29, 2021 -

I was recently having a debate with a Data Scientist (with little statistical training) about GLMs. He believes that GLMs (such as logistic regression) are linear. I have some statistical training and, as far as I have heard, many of my peers don't consider GLMs to be linear.

I started probing further to substantiate my claim and came across these quotes in a Quora answer.

Link to the post -> https://www.quora.com/Why-is-logistic-regression-considered-a-linear-model

For the benefit of others, this answer is at odds with what statisticians have meant by "linear model" ever since the term "generalized linear model" was introduced. The answer a statistician would give to this question is "logistic regression *is not* a linear model." A statistician calls a model "linear" if the mean of the response is a linear function of the parameter, and this is clearly violated for logistic regression. Logistic regression is a *generalized linear model*. Generalized linear models are, despite their name, not generally considered linear models. They have a linear component, but the model itself is nonlinear due to the nonlinearity introduced by the link function.

I think this group has a significant number of statisticians. Hence I wanted to ask you: do you guys consider GLMs to be linear? Do you agree with the quoted text above?
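A short derivation makes the quoted point concrete for logistic regression with a single predictor:

```latex
\mu(x) = E(Y \mid x) = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 x)\}},
\qquad
\frac{\partial \mu}{\partial \beta_1} = x\,\mu(x)\,\{1 - \mu(x)\}.
```

In the linear model $\mu = \beta_0 + \beta_1 x$ the same derivative is just $x$, free of the parameters. Here it depends on $\beta$ through $\mu(x)$, so the mean is not a linear function of the parameters; only the link-transformed mean, $\log\{\mu/(1-\mu)\} = \beta_0 + \beta_1 x$, is.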

Stack Exchange
definition - What's the difference between a general linear model and a generalized linear model? - Mathematics Stack Exchange
September 9, 2022 - A generalized linear model (GLM) is an extension of the usual linear model, both simple (one input) and multiple (multiple inputs), in which the residuals may follow not a normal distribution but any distribution in the exponential family. Therefore, there can be models that are only GLM (one output, non-normal error), only general (multiple outputs, normal error), or both (multiple outputs, non-normal error). Moreover, all of the above may have a single input (simple regression) or multiple ones (multivariate/multiple inputs).
ListenData
Modeling Myth : General linear model and generalized linear model mean the same thing
The Generalized Linear Model is a generalization of the general linear model. In the general linear model, the dependent variable must be linearly associated with the values of the independent variables.
Interpretable Machine Learning
8 GLM, GAM and more – Interpretable Machine Learning
This assumption excludes many cases: ... or a very skewed outcome with a few very high values (household income). The linear regression model can be extended to model all these types of outcomes. This extension is called Generalized Linear Models or GLMs for short...
Medium
General and Generalized Linear Models | Medium
May 18, 2022 - The general linear model requires that the dependent variable follows the normal distribution whilst the generalized linear model is an extension of the general linear model that allows the specification of models whose response variable follows ...
Top answer
Answer 1 of 2 (score 3)

A general linear model doesn't generalize the function of $X$.

Indeed assuming you mean $E(Y|X)$ where you have $Y$ (and independent errors) whether or not there's a transformed predictor doesn't change things -- either way it would still be called a linear model (the conditional mean is a linear function of the parameters).

That is to say, consider $\alpha+\beta \psi(X)$. Now let $X^* = \psi(X)$. Then in terms of this new variable (the one used in the estimation) we have $\alpha+\beta X^*$. So either a linear model or a general linear model will be able to incorporate a transformation, $\psi$, (of the independent variable or variables) without difficulty.

Instead, with a multivariate response (each observation point is a vector of values), a general linear model generalizes the covariance structure of the error term to include the possibility of correlated errors within the observation vector (that is, the components of $\mathbf{y}_i$ are correlated).

This feature allows us to place under one banner t-tests, ANOVA, regression, MANOVA, MANCOVA, multivariate regression and a number of other models/tools (while the multivariate techniques wouldn't necessarily be seen as covered by the term 'linear model', though the usage does vary).

[Between observations there is still independence; if you want instead to generalize to correlated errors between-observations, that would be generalized least squares as fcop pointed out in comments.]
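The substitution $X^* = \psi(X)$ described in this answer can be checked numerically. A minimal sketch, assuming $\psi = \log$ and noise-free made-up data from y = 2 + 3 log(x): ordinary least squares on the transformed predictor recovers both parameters, because the model is still linear in them.

```python
import math


def ols(xs, ys):
    """Simple least squares of ys on (1, xs): returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return my - slope * mx, slope


# Hypothetical noise-free data from y = 2 + 3 * log(x): nonlinear in x,
# but linear in the parameters alpha = 2 and beta = 3.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 + 3.0 * math.log(v) for v in x]

x_star = [math.log(v) for v in x]   # X* = psi(X) with psi = log
alpha, beta = ols(x_star, y)
print(alpha, beta)                  # 2 and 3 up to floating-point rounding
```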

Answer 2 of 2 (score 0)

In a linear model, we define the prediction (regression) function using a linear structure as follows: $y\approx E(y|x)=\omega_0 + \omega^\top x.$

In a generalized linear model, by contrast, we define the prediction (or discriminant) function as either a linear or a nonlinear function of the linear argument $\omega^\top x +\omega_0$.

That is, the hypothesis function for a generalized linear model is $h(x)=g(\omega^\top x +\omega_0)$, where $g$ may be a linear or nonlinear function (known as the activation function). When estimating the hypothesis function, we focus on estimating only the parameters $\omega$, since $g$ is specified in advance as required.
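A minimal sketch of that hypothesis function, with hypothetical weights: the same linear argument is passed through an identity $g$ (linear regression) or a sigmoid $g$ (logistic regression). In GLM terminology this $g$ plays the role of the inverse link function.

```python
import math


def hypothesis(w, w0, x, g):
    """h(x) = g(w^T x + w0): a linear argument passed through a function g."""
    return g(sum(wi * xi for wi, xi in zip(w, x)) + w0)


def identity(t):
    return t                               # linear regression


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))      # logistic regression


w, w0, x = [0.5, -1.0], 0.2, [2.0, 1.0]    # hypothetical weights and input
print(hypothesis(w, w0, x, identity))      # linear argument: 0.5*2 - 1*1 + 0.2 = 0.2
print(hypothesis(w, w0, x, sigmoid))       # sigmoid(0.2), a probability in (0, 1)
```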