value that appears most often in a set of data

In statistics, the mode is the value that appears most often in a set of data values. If X is a discrete random variable, the mode is the value x at which … Wikipedia
Wikipedia
en.wikipedia.org › wiki › Mode_(statistics)
Mode (statistics) - Wikipedia
January 26, 2005 - Like the statistical mean and median, the mode is a summary statistic about the central tendency of a random variable or a population. The numerical value of the mode is the same as that of the mean and median in a normal distribution, but it may be very different in highly skewed distributions.
Statistics Canada
www150.statcan.gc.ca › n1 › edu › power-pouvoir › ch11 › mode › 5214873-eng.htm
4.4.3 Calculating the mode
The mode can't be used as a measure of central tendency because there is more than one mode. It's a bimodal distribution.
CK-12 Foundation
ck12.org › all subjects › cbse math › mean deviation › what is the mode of a distribution?
Flexi answers - What is the mode of a distribution? | CK-12 Foundation
September 11, 2025 - In statistics, the mode is a measure of central tendency that refers to the value or values in a data set that appear most frequently. In other words, it is the value that occurs the highest number of times. For example, consider the following set of numbers: 1, 2, 2, 3, ...
Statistics4u
statistics4u.com › fundstat_eng › cc_mode.html
Mode of a Distribution
The mode is the value which occurs most frequently in a data set. The mode is only of interest for large data sets, as in small samples the mode strongly depends on random variations of the data. The term "mode" is contained in the expressions "unimodal", "bimodal", and "multimodal".
Top answer
1 of 4
18

Saying "the mode" implies that the distribution has one and only one. In general a distribution may have many modes, or (arguably) none.

If there's more than one mode you need to specify if you want all of them or just the global mode (if there is exactly one).

Assuming we restrict ourselves to unimodal distributions*, so we can speak of "the" mode, they're found in the same way as finding maxima of functions more generally.

*Note that that page says "as the term 'mode' has multiple meanings, so does the term 'unimodal'" and offers several definitions of mode. The choice of definition can change what exactly counts as a mode and whether there are 0, 1 or more of them, and it also alters the strategy for identifying them. Note particularly how general the "more general" phrasing of unimodality is in the opening paragraph: "unimodality means there is only a single highest value, somehow defined".

One definition offered on that page is:

A mode of a continuous probability distribution is a value at which the probability density function (pdf) attains its maximum value

So given a specific definition of the mode, you find it as you would find that particular definition of "highest value" when dealing with functions more generally (assuming that the distribution is unimodal under that definition).

There are a variety of strategies in mathematics for identifying such things, depending on circumstances. See the "Finding functional maxima and minima" section of the Wikipedia page on Maxima and minima, which gives a brief discussion.

For example, if things are sufficiently nice -- say we're dealing with a continuous random variable, where the density function has continuous first derivative -- you might proceed by trying to find where the derivative of the density function is zero, and checking which type of critical point it is (maximum, minimum, horizontal point of inflexion). If there's exactly one such point which is a local maximum, it should be the mode of a unimodal distribution.
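As a worked instance of this recipe, consider a gamma density with shape $k > 1$ and scale $\theta > 0$ (a distribution chosen here purely for illustration; it is not mentioned above). Because the logarithm is increasing, $\log f$ and $f$ have their maxima at the same point, and the log is easier to differentiate:

$$
\begin{aligned}
f(x) &= \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^{k}}, \quad x > 0, \\
\frac{d}{dx} \log f(x) &= \frac{k-1}{x} - \frac{1}{\theta} = 0 \;\Longrightarrow\; x^{*} = (k-1)\,\theta, \\
\frac{d^{2}}{dx^{2}} \log f(x) &= -\frac{k-1}{x^{2}} < 0 \quad \text{for } k > 1 .
\end{aligned}
$$

There is exactly one critical point and it is a local maximum, so under these assumptions the mode is $(k-1)\,\theta$.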

However, in general things are more complicated (e.g. the mode may not be a critical point), and the broader strategies for finding maxima of functions come in.

Sometimes, finding where derivatives are zero algebraically may be difficult or at least cumbersome, but it may still be possible to identify maxima in other ways. For example, it may be that one might invoke symmetry considerations in identifying the mode of a unimodal distribution. Or one might invoke some form of numerical algorithm on a computer, to find a mode numerically.
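One way such a numerical search might look is sketched below, assuming the density is known, unimodal on the search interval, and cheap to evaluate. Golden-section search is only one of many possible algorithms; the gamma log-density, its parameters, and the interval are illustrative assumptions rather than anything from the discussion above.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-6):
    """Locate the maximiser of a function f that is unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # reciprocal of the golden ratio
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

def gamma_log_pdf(x, shape=3.0, scale=2.0):
    """Log-density of a gamma distribution (illustrative target only)."""
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

# Search a bracket assumed to contain the single mode.
mode = golden_section_max(gamma_log_pdf, 1e-9, 50.0)
print(mode)
```

For this particular density the result can be compared with the closed form (shape - 1) * scale = 4. The method relies on unimodality within the bracket; with several local maxima it may settle on any one of them.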

Here are some cases that illustrate typical things that you need to check for - even when the function is unimodal and at least piecewise continuous.

So, for example, we must check endpoints (center diagram), points where the derivative changes sign (but may not be zero; first diagram), and points of discontinuity (third diagram).

In some cases, things may not be so neat as these three; you have to try to understand the characteristics of the particular function you're dealing with.
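Two familiar densities (chosen here for illustration, not taken from the diagrams) show why the first two of those checks matter. For the exponential density the derivative is never zero, so the mode sits at the endpoint; for the Laplace density the derivative changes sign at the mode without ever being zero there:

$$
\begin{aligned}
\text{Exponential:}\quad f(x) &= \lambda e^{-\lambda x},\; x \ge 0, &
f'(x) &= -\lambda^{2} e^{-\lambda x} < 0 \;\Rightarrow\; \text{mode at } x = 0, \\
\text{Laplace:}\quad f(x) &= \frac{1}{2b}\, e^{-|x-\mu|/b}, &
f'(x) &> 0 \text{ for } x < \mu,\quad f'(x) < 0 \text{ for } x > \mu \;\Rightarrow\; \text{mode at } x = \mu .
\end{aligned}
$$

In the Laplace case $f'(\mu)$ does not exist, so solving $f'(x) = 0$ alone would find nothing.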


I haven't touched on the multivariate case, where even when functions are quite "nice", just finding local maxima may be substantially more complex (e.g. the numerical methods for doing so can fail in a practical sense, even when they logically must succeed eventually).

2 of 4
25

This answer focuses entirely on mode estimation from a sample, with emphasis on one particular method. If there is any strong sense in which you already know the density, analytically or numerically, then the preferred answer is, in brief, to look for the single maximum or multiple maxima directly, as in the answer from @Glen_b.

"Half-sample modes" may be calculated using recursive selection of the half-sample with the shortest length. Although it has longer roots, an excellent presentation of this idea was given by Bickel and Frühwirth (2006).

The idea of estimating the mode as the midpoint of the shortest interval that contains a fixed number of observations goes back at least to Dalenius (1965). See also Robertson and Cryer (1974), Bickel (2002) and Bickel and Frühwirth (2006) on other estimators of the mode.

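The order statistics of a sample of $n$ values of $x$ are defined by $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n-1)} \le x_{(n)}$.

The half-sample mode is here defined using two rules.

Rule 1. If $n = 1$, the half-sample mode is $x_{(1)}$. If $n = 2$, the half-sample mode is $(x_{(1)} + x_{(2)}) / 2$. If $n = 3$, the half-sample mode is $(x_{(1)} + x_{(2)}) / 2$ if $x_{(1)}$ and $x_{(2)}$ are closer than $x_{(2)}$ and $x_{(3)}$, $(x_{(2)} + x_{(3)}) / 2$ if the opposite is true, and $x_{(2)}$ otherwise.

Rule 2. If $n \ge 4$, we apply recursive selection until left with $3$ or fewer values. First let $h_1 = \lfloor n / 2 \rfloor$. The shortest half of the data from rank $k$ to rank $k + h_1$ is identified to minimise $x_{(k + h_1)} - x_{(k)}$ over $k = 1, \cdots, n - h_1$. Then the shortest half of those $h_1 + 1$ values is identified using $h_2$, defined analogously, and so on. To finish, use Rule 1.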
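The two rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Stata hsmode code: at every stage the half is taken to contain 1 + floor(m/2) of the m values currently retained, and ties between equally short halves are broken by taking the middlemost candidate, at position ceil(t/2) of t. The function name `half_sample_mode` is hypothetical.

```python
import math

def half_sample_mode(values):
    """Half-sample mode by recursive selection of the shortest half (a sketch)."""
    x = sorted(values)
    if not x:
        raise ValueError("need at least one value")
    # Rule 2: shrink to the shortest half until 3 or fewer values remain.
    while len(x) > 3:
        m = len(x)
        h = m // 2  # assumed half-length at this stage
        lengths = [x[k + h] - x[k] for k in range(m - h)]
        shortest = min(lengths)
        # Exact equality detects ties; float data may warrant a tolerance.
        ties = [k for k, w in enumerate(lengths) if w == shortest]
        # Assumed tie-break: the middlemost candidate, position ceil(t/2).
        k = ties[math.ceil(len(ties) / 2) - 1]
        x = x[k:k + h + 1]
    # Rule 1: finish with 1, 2 or 3 values.
    if len(x) == 1:
        return x[0]
    if len(x) == 2:
        return (x[0] + x[1]) / 2
    left, right = x[1] - x[0], x[2] - x[1]
    if left < right:
        return (x[0] + x[1]) / 2
    if right < left:
        return (x[1] + x[2]) / 2
    return x[1]

print(half_sample_mode([1, 4, 5, 6, 20]))  # shortest half is 4, 5, 6
```

With the values 1, 4, 5, 6, 20 the shortest half is 4, 5, 6, whose two gaps are equal, so Rule 1 returns the middle value 5. Results for larger samples depend on how the half-length is defined at later stages, so other implementations may differ.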

The idea of identifying the shortest half is applied in the "shorth" named by J.W. Tukey and introduced in the Princeton robustness study of estimators of location by Andrews, Bickel, Hampel, Huber, Rogers and Tukey (1972, p.26) as the mean of the values in the shortest half. Rousseeuw (1984), building on a suggestion by Hampel (1975), pointed out that the midpoint of the shortest half is the least median of squares (LMS) estimator of location. See Rousseeuw (1984) and Rousseeuw and Leroy (1987) for applications of LMS and related ideas to regression and other problems. Note that this LMS midpoint is also called the shorth in some more recent literature (e.g. Maronna, Martin and Yohai 2006, p.48). Further, the shortest half itself is also sometimes called the shorth, as the title of Grübel (1988) indicates. For a Stata implementation and more detail, see shorth from SSC.
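The two summaries just mentioned can be sketched as follows, assuming that "half" means a window of 1 + floor(n/2) consecutive order statistics and that the first shortest window is used when there are ties. The function names are hypothetical and this is not the SSC shorth code.

```python
def shortest_half(values):
    """Return the shortest window of 1 + floor(n/2) sorted values."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one value")
    h = n // 2
    # min() keeps the first window among ties (an assumption of this sketch).
    k = min(range(n - h), key=lambda i: x[i + h] - x[i])
    return x[k:k + h + 1]

def shorth(values):
    """Mean of the values in the shortest half."""
    half = shortest_half(values)
    return sum(half) / len(half)

def lms_location(values):
    """Midpoint of the shortest half (LMS estimator of location)."""
    half = shortest_half(values)
    return (half[0] + half[-1]) / 2

data = [2, 4, 5, 6, 9, 20, 40]
print(shortest_half(data), shorth(data), lms_location(data))
```

For these seven values the shortest half is 2, 4, 5, 6, giving a mean of 4.25 and a midpoint of 4.0, both far from the sample mean, which the two large values pull upwards.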

Some broad-brush comments follow on advantages and disadvantages of half-sample modes, from the standpoint of practical data analysts as much as mathematical or theoretical statisticians. Whatever the project, it will always be wise to compare results with standard summary measures (e.g. medians or means, including geometric and harmonic means) and to relate results to graphs of distributions. Moreover, if your interest is in the existence or extent of bimodality or multimodality, it will be best to look directly at suitably smoothed estimates of the density function.

Mode estimation. By summarizing where the data are densest, the half-sample mode adds an automated estimator of the mode to the toolbox. More traditional estimates of the mode based on identifying peaks on histograms or even kernel density plots are sensitive to decisions about bin origin or width or kernel type and kernel half-width and more difficult to automate in any case. When applied to distributions that are unimodal and approximately symmetric, the half-sample mode will be close to the mean and median, but more resistant than the mean to outliers in either tail. When applied to distributions that are unimodal and asymmetric, the half-sample mode will typically be much nearer the mode identified by other methods than either the mean or the median.

Simplicity. The idea of the half-sample mode is fairly simple and easy to explain to students and researchers who do not regard themselves as statistical specialists.

Graphic interpretation. The half-sample mode can easily be related to standard displays of distributions such as kernel density plots, cumulative distribution and quantile plots, histograms and stem-and-leaf plots.

At the same time, note that

Not useful for all distributions. When applied to distributions that are approximately J-shaped, the half-sample mode will approximate the minimum of the data. When applied to distributions that are approximately U-shaped, the half-sample mode will be within whichever half of the distribution happens to have higher average density. Neither behaviour seems especially interesting or useful, but equally there is little call for single mode-like summaries for J-shaped or U-shaped distributions. For U shapes, bimodality makes the idea of a single mode moot, if not invalid.

Ties. The shortest half may not be uniquely defined. Even with measured data, rounding of reported values may frequently give rise to ties. What to do with two or more shortest halves has been little discussed in the literature. Note that tied halves may either overlap or be disjoint.

The procedure adopted in the Stata implementation hsmode given $t$ ties is to use the middlemost in order, except that that is in turn not uniquely defined unless $t$ is odd. The middlemost is arbitrarily taken to have position $\lceil t / 2 \rceil$ in order, counting upwards. This is thus the 1st of 2, the 2nd of 3 or 4, and so forth.
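The positions quoted ("the 1st of 2, the 2nd of 3 or 4") are what a ceiling rule produces. A tiny check, with a hypothetical helper name and written independently of the Stata code:

```python
import math

def middlemost_position(t):
    """1-based position of the middlemost of t tied candidates (assumed rule)."""
    return math.ceil(t / 2)

# Map each number of ties t to the position that would be selected.
positions = {t: middlemost_position(t) for t in range(1, 7)}
print(positions)
```

For an odd number of ties this is the true middle; for an even number it is the lower of the two central candidates.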

This tie-break rule has some quirky consequences. Thus with values , the rules yield as the half-sample mode, not as would be natural on all other grounds. Otherwise put, this problem can arise because for a window to be placed symmetrically the window length must be odd for odd and even for even , which is difficult to achieve given other desiderata, notably that window length should never decrease with sample size. We prefer to believe that this is a minor problem with datasets of reasonable size.

Rationale for window length. Why half is taken to mean $1 + \lfloor n / 2 \rfloor$ also does not appear to be discussed. Evidently we need a rule that yields a window length for both odd and even $n$; it is preferable that the rule be simple; and there is usually some slight arbitrariness in choosing a rule of this kind. It is also important that any rule behave reasonably for small $n$: even if a program is not deliberately invoked for very small sample sizes the procedure used should make sense for all possible sizes. Note that, given $n = 1$, the half-sample mode is just the single sample value, and, given $n = 2$, it is the average of the two sample values. A further detail about this rule is that it always defines a slight majority, thus enforcing democratic decisions about the data. However, there seems no strong reason not to use $\lceil n / 2 \rceil$ as an even simpler rule, except that if it makes much difference, then it is likely that your sample size or variable is unsuitable for the purpose.
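A quick tabulation shows how a "slight majority" rule behaves for small samples. The sketch assumes the window holds 1 + floor(n/2) values and, as the simpler alternative, ceil(n/2) values; both formulas are assumptions of this illustration, and the helper names are hypothetical.

```python
import math

def majority_half(n):
    """Window size that always exceeds n / 2 (assumed rule)."""
    return 1 + n // 2

def ceiling_half(n):
    """Simpler alternative window size (assumed rule)."""
    return math.ceil(n / 2)

# One row per sample size: (n, majority window, ceiling window).
rows = [(n, majority_half(n), ceiling_half(n)) for n in range(1, 9)]
for n, a, b in rows:
    print(n, a, b)
```

Under these assumptions the two rules agree for odd n and differ by one for even n, where the ceiling rule gives exactly half rather than a majority.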

Robertson and Cryer (1974, p.1014) reported 35 measurements of uric acid (in mg/100 ml): The Stata implementation hsmode reports a mode of 5.38. Robertson and Cryer's own estimates using a rather different procedure are . Compare with your favourite density estimation procedure.

Andrews, D.F., P.J. Bickel, F.R. Hampel, P.J. Huber, W.H. Rogers and J.W. Tukey. 1972. Robust estimates of location: survey and advances. Princeton, NJ: Princeton University Press.

Bickel, D.R. 2002. Robust estimators of the mode and skewness of continuous data. Computational Statistics & Data Analysis 39: 153-163.

Bickel, D.R. and R. Frühwirth. 2006. On a fast, robust estimator of the mode: comparisons to other estimators with applications. Computational Statistics & Data Analysis 50: 3500-3530.

Dalenius, T. 1965. The mode - A neglected statistical parameter. Journal, Royal Statistical Society A 128: 110-117.

Grübel, R. 1988. The length of the shorth. Annals of Statistics 16: 619-628.

Hampel, F.R. 1975. Beyond location parameters: robust concepts and methods. Bulletin, International Statistical Institute 46: 375-382.

Maronna, R.A., R.D. Martin and V.J. Yohai. 2006. Robust statistics: theory and methods. Chichester: John Wiley.

Robertson, T. and J.D. Cryer. 1974. An iterative procedure for estimating the mode. Journal, American Statistical Association 69: 1012-1016.

Rousseeuw, P.J. 1984. Least median of squares regression. Journal, American Statistical Association 79: 871-880.

Rousseeuw, P.J. and A.M. Leroy. 1987. Robust regression and outlier detection. New York: John Wiley.

This account is based on documentation for

Cox, N.J. 2007. HSMODE: Stata module to calculate half-sample modes, http://EconPapers.repec.org/RePEc:boc:bocode:s456818.

See also David R. Bickel's website for information on implementations in other software.

Investopedia
investopedia.com › terms › m › mode.asp
Mode: What It Is in Statistics and How to Calculate It
July 14, 2025 - In statistics, the mode is the most commonly observed value in a set of data. For the normal distribution, the mode is also the same value as the mean and median.
Alloprof
alloprof.qc.ca › en › students › vl › mathematics › the-mode-m1411
The Mode | Secondaire | Alloprof
The mode, or modal class, is a measure of central tendency that allows for a quick analysis of the most popular data value, or the most popular group of data, in a distribution.
Andrea Minini
andreaminini.net › math › mode-in-statistics
Mode Explained Simply (Statistics) - Andrea Minini
In a frequency distribution, the mode is the class with the highest frequency, referred to as the modal class. For example, in this table, the class 23-25 is the modal class because it has the highest frequency (14) among all classes.
Wolfram MathWorld
mathworld.wolfram.com › Mode.html
Mode -- from Wolfram MathWorld
August 14, 2002 - The mode of a set of observations is the most commonly occurring value. For example, for a data set (3, 7, 3, 9, 9, 3, 5, 1, 8, 5) (left histogram), the unique mode is 3. Similarly, for a data set (2, 4, 9, 6, 4, 6, 6, 2, 8, 2) (right histogram), ...
The Book of Statistical Proofs
statproofbook.github.io › P › norm-mode.html
Mode of the normal distribution | The Book of Statistical Proofs
January 9, 2020 - Theorem: Let $X$ be a random variable following a normal distribution: \[\label{eq:norm} X \sim \mathcal{N}(\mu, \sigma^2) \; .\] Then, the mode of $X$ is \[\label{eq:norm-mode} \mathrm{mode}(X) = \mu \; .\] Proof: The mode is the value which maximizes the probability density function: ...
Quora
quora.com › What-does-the-mode-tell-you-about-the-distribution-of-a-set-of-data
What does the mode tell you about the distribution of a set of data? - Quora
Answer: The mode is the value that appears most frequently in a data set. A set of data may have one mode, more than one mode, or no mode at all. Other popular measures of central tendency include the mean, or the average of a set, and the median, ...
Lumen Learning
courses.lumenlearning.com › introstats1 › chapter › skewness-and-the-mean-median-and-mode
Skewness and the Mean, Median, and Mode | Introduction to Statistics
Of the three statistics, the mean is the largest, while the mode is the smallest. Again, the mean reflects the skewing the most. To summarize, generally if the distribution of data is skewed to the left, the mean is less than the median, which is often less than the mode.
TechTarget
techtarget.com › searchdatacenter › definition › statistical-mean-median-mode-and-range
What are Mean, Median, Mode and Range?
The mode of a distribution with a discrete random variable is the value of the term that occurs the most often. It is not uncommon for a distribution with a discrete random variable to have more than one mode, especially if there are not many terms.
Laerd Statistics
statistics.laerd.com › statistical-guides › measures-central-tendency-mean-mode-median.php
Mean, Mode and Median - Measures of Central Tendency - When to use with Different Types of Variable and Skewed Distributions | Laerd Statistics
As we will find out later, taking the median would be a better measure of central tendency in this situation. Another time when we usually prefer the median over the mean (or mode) is when our data is skewed (i.e., the frequency distribution for our data is skewed).
Scribbr
scribbr.com › home › central tendency | understanding the mean, median & mode
Central Tendency | Understanding the Mean, Median & Mode
June 21, 2023 - Mode: the most frequent value. Median: the middle number in an ordered dataset. Mean: the sum of all values divided by the total number of values. In addition to central tendency, the variability and distribution of your dataset is important ...
OpenStax
openstax.org › books › introductory-business-statistics-2e › pages › 2-6-skewness-and-the-mean-median-and-mode
2.6 Skewness and the Mean, Median, and Mode - Introductory Business Statistics 2e | OpenStax
December 13, 2023 - Of the three statistics, the mean is the largest, while the mode is the smallest. Again, the mean reflects the skewing the most. The mean is affected by outliers that do not influence the median. Therefore, when the distribution of data is skewed to the left, the mean is often less than the median.
Statistics By Jim
statisticsbyjim.com › home › blog › mean, median, and mode: measures of central tendency
Mean, Median, and Mode: Measures of Central Tendency - Statistics By Jim
June 4, 2025 - Learn more about bimodal distributions. In the dataset below, the value 5 occurs most frequently, which makes it the mode. These data might represent a 5-point Likert scale. Typically, you use the mode with categorical, ordinal, and discrete data. In fact, the mode is the only measure of central tendency that you can use with categorical data, such as the most preferred flavor of ice cream.
Brilliant
brilliant.org › wiki › data-mode
Mode | Brilliant Math & Science Wiki
Mode is that observation in a frequency distribution which occurs the maximum times in the frequency distribution, or technically speaking, which has the highest frequency. Mode is derived from the French word La Mode which means fashion.