Perhaps more of a meta-statistics question than a statistics question, but I've been trying to understand the origins of the conventional symbols used in statistics and can't find any good sources. The two most common ways to distinguish a parameter from an estimator seem to be either using roughly equivalent Greek and Latin characters or adding a hat. I've seen both 'π' and 'p' used to represent population proportions (though 'p' is definitely more common in introductory courses), and I've seen 'π' used often as a function in Bayesian statistics. The hat seems to be the preferred way of denoting an estimator for new methods or 'non-canonical' statistics. Both 's' and 'σ' make a lot of sense, and 'μ' makes sense for population means, so where on earth did 'x̄' come from? Was 'm' already being used elsewhere? Did it come about before these conventions were established? I'm aware that 'X' is the go-to symbol for random variables and that a bar is generally used to denote means, but why? Why are there competing conventions, anyway?
First of all, to answer a question you didn't ask: $\mu$ is the Greek equivalent of the Latin $m$, which stands for mean.
Now for the question you did ask. If you have a random variable $X$, and let's assume $X$ is positive for simplicity, then you always have a mean $\mathbb{E}X$ (which could be infinite). The mean is computed mathematically, by integrating against the probability density function. Thus, both the variable $X$ and the mean $\mu=\mathbb{E}X$ are theoretical quantities: they describe the statistician's model of the quantity of interest.
On the other hand, the way experiments commonly work is that we collect a sequence of samples to try to nail down a more accurate model. The experiment as a whole can be thought of as a single random object, described mathematically by a probability distribution (or better yet, a measure) on an infinite sequence space. The actual measurements taken can be written as an infinite sequence $(X_i)_{i\in\mathbb N}$. Our model will usually posit that the measurements all have the same distribution ($X_i$ and $X_j$ have the same law for all $i$ and $j$) and that they are independent. In this case, the (strong) law of large numbers guarantees that the sample mean $$ \bar X_n = \frac{X_1+\cdots+X_n}{n}, $$ an a priori random quantity, will in fact converge (with probability $1$) to the theoretical mean $\mathbb{E}X_1$ as $n\to\infty$.
Thus, in the limit of a very large number of samples, there ceases to be a distinction between the theoretical mean of a single variable and the sample mean of the collected data as a whole.
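Here is a minimal sketch in Python (not part of the original argument; the exponential distribution and the value $\mu = 3$ are arbitrary choices for illustration) showing the sample mean $\bar X_n$ settling down to the theoretical mean as $n$ grows:

```python
# Illustrative sketch of the law of large numbers: the running sample mean
# of i.i.d. positive draws approaches the theoretical mean mu as n grows.
# The distribution (exponential) and mu = 3.0 are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(seed=0)

mu = 3.0                                              # theoretical mean E[X]
samples = rng.exponential(scale=mu, size=1_000_000)   # positive X with E[X] = mu

# Running sample means: xbar_n = (X_1 + ... + X_n) / n
running_means = np.cumsum(samples) / np.arange(1, samples.size + 1)

for n in (10, 1_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}: sample mean = {running_means[n - 1]:.4f}   (mu = {mu})")
```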
Suppose we have data $\{x_1, x_2, x_3, x_4\}$ with probabilities $\{p_1, p_2, p_3, p_4\}$.

Expected value: $$E(X) = x_1 p_1 + x_2 p_2 + x_3 p_3 + x_4 p_4$$

If the probabilities are all equal ($p_i = \tfrac14$), then $E(X) = \frac{\sum_i x_i}{4}$, which is exactly the mean (the plain average of the $x_i$).

If the probabilities are not equal, then the mean of the $x_i$ becomes their weighted average, with the $p_i$ as weights, and that is again $E(X)$.
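As a quick numerical check (the data and probabilities below are made up for illustration), the same point in Python: the weighted sum $\sum_i x_i p_i$ reduces to the plain average when the $p_i$ are all equal, and is a weighted average otherwise:

```python
# Illustrative only: made-up data and probabilities showing that
# E(X) = sum_i x_i * p_i equals the plain average when all p_i are equal,
# and is a weighted average of the x_i otherwise.
x = [2.0, 5.0, 7.0, 10.0]            # hypothetical data x1..x4

p_equal   = [0.25, 0.25, 0.25, 0.25]
p_unequal = [0.10, 0.20, 0.30, 0.40]

def expected_value(xs, ps):
    """E(X) = x1*p1 + x2*p2 + ..., as in the formula above."""
    return sum(xi * pi for xi, pi in zip(xs, ps))

print(expected_value(x, p_equal))    # 6.0 -- same as the plain mean
print(sum(x) / len(x))               # 6.0
print(expected_value(x, p_unequal))  # 7.3 -- the weighted average
```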