Short answer:
Intuitively, for the displacement operator, the exponential accumulates an infinite number of infinitesimal displacements, and this gives rise to an overall macroscopic finite displacement. The same principle holds for the rotation operator, i.e., accumulation of many small rotations.
Since the momentum operator generates a displacement via exponentiation, the momentum operator is called the generator of displacement.
Long Answer:
The exponentiation of an operator can be understood from first principles using the definition of differentiation,
$$\frac{d}{dx}\psi(x) = \lim_{h\to 0}\frac{\psi(x+h)-\psi(x)}{h}.$$
We then juggle the terms around to find a linearized approximation of $\psi(x+h)$,
\begin{align}
\lim_{h\to 0}\psi \left( {x + h} \right) & = \lim_{h\to 0} \left[\psi \left( x \right) + h\frac{d}{{dx}}\psi \left( x \right)\right], \\ \\
& = \lim_{h\to 0} \left[1 + h\frac{d}{{dx}}\right]\psi \left( x \right), \\ \\
& = {\mathcal{T}}_h\cdot \psi \left( x \right).
\end{align}
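The last line of the above equation tells us that if we would like to march forward (or backward) by a small quantity $h$, we need to multiply the function $\psi(x)$ by the operator
$$\mathcal{T}_h := 1 + h\frac{d}{dx}.$$
For small $h$, we can find the value of the function $\psi(x+2h)$ by applying $\mathcal{T}_h$ twice on the function $\psi(x)$,
$$\psi(x+2h) = \mathcal{T}_h\,\mathcal{T}_h\,\psi(x) = \mathcal{T}_h^2\,\psi(x).$$
We can continue further and evaluate the function at $x+Nh$, i.e., $\psi(x+Nh)$, by applying the $\mathcal{T}_h$ operator $N$ times on $\psi(x)$: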
\begin{align}
\psi(x+Nh) & = \mathcal{T}_h^N \psi(x), \\
& = \lim_{h\to 0}\left[1+h \frac{d}{dx}\right]^N\psi(x). \tag{1}\label{marchF}
\end{align}
If we define the total displacement as $a := Nh$, this reduces to the following substitution
$$h = \frac{a}{N},$$
so that, for fixed $a$, the limit $h\to 0$ becomes the limit $N\to\infty$. Plugging this into Eq. \eqref{marchF}, we arrive at
\begin{align}
\psi(x+a) & = \lim_{N\to \infty}\left[1+ \frac{a}{N} \frac{d}{dx}\right]^N\psi(x) \\ \\
& = \exp\left[a \frac{d}{dx}\right] \cdot \psi(x), \\ \\
& = T_a \cdot \psi(x).
\end{align}
In the second line, we have used the definition of the exponential function, $e^{x} = \lim_{N\to\infty}\left(1+\frac{x}{N}\right)^N$, which is one of the key points towards an understanding of how all these generators come about.
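As a rough numerical illustration of this limit (a minimal sketch, not part of the original derivation), the product of $N$ small steps $\left[1+\frac{a}{N}\frac{d}{dx}\right]^N$ can be applied to a polynomial, for which $d/dx$ acts exactly on the coefficient list. The test polynomial, the values of `a` and `x0`, and all function names are illustrative assumptions.

```python
def derivative(coeffs):
    """Differentiate a polynomial stored as [c0, c1, c2, ...] (c_k multiplies x**k)."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0.0]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))


def small_step(coeffs, h):
    """Apply T_h = 1 + h d/dx once."""
    d = derivative(coeffs)
    return [c + h * (d[k] if k < len(d) else 0.0) for k, c in enumerate(coeffs)]


def translate(coeffs, a, n_steps):
    """Apply (1 + (a/N) d/dx)^N with N = n_steps."""
    h = a / n_steps
    for _ in range(n_steps):
        coeffs = small_step(coeffs, h)
    return coeffs


psi = [1.0, -2.0, 0.5, 3.0]  # hypothetical test function: 1 - 2x + 0.5x^2 + 3x^3
a, x0 = 0.7, 0.3
exact = evaluate(psi, x0 + a)

# The error should shrink roughly like 1/N as the steps get finer.
step_counts = (10, 100, 1000, 10000)
errors = [abs(evaluate(translate(psi, a, n), x0) - exact) for n in step_counts]
for n, err in zip(step_counts, errors):
    print(f"N = {n:5d}   |T_h^N psi(x0) - psi(x0 + a)| = {err:.2e}")
```

For any finite $N$ the result is only approximate; the exact displacement is recovered in the limit $N\to\infty$, which is where the exponential comes from.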
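We can massage the translation operator a little further and express it through the momentum operator $\hat{p}_x = -i\hbar\frac{d}{dx}$:
\begin{align}
T_a & := \exp\left[a\frac{d}{dx}\right], \\ \\
& = \exp\left[i \frac{a}{\hbar}\cdot \left(-i \hbar \frac{d}{dx}\right)\right], \\ \\
& = \exp\left[i \frac{a}{\hbar}\cdot \hat{p}_x\right].
\end{align}
With this definition $T_a\psi(x) = \psi(x+a)$. Replacing $a \to -a$ gives the standard form $\exp\left[-i \frac{a}{\hbar}\cdot \hat{p}_x\right]$, which maps $\psi(x)$ to $\psi(x-a)$, i.e., it shifts the wave function by $a$ in the $+x$ direction.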
$\hat{T}(x) = e^{-ix\hat{p}_x/\hbar}$: the spatial displacement operator, which moves the wave function $\psi$ along the $x$ coordinate; $\hat{p}_x$ is the momentum operator, which generates the displacement.
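First let me say that using $x$ in the definition of $\hat{T}(x)$ is really confusing, because $x$ is also used as the spatial coordinate of the wave function $\psi(x)$. Therefore I prefer to write it as
$$\hat{T}(a) = e^{-ia\hat{p}_x/\hbar} \tag{1}$$
We need to show that this operator is displacing a wave function $\psi(x)$ by a distance $a$ in $x$-direction.
Recall the definition of the momentum operator
$$\hat{p}_x = -i\hbar\frac{\partial}{\partial x} \tag{2}$$
Plugging (2) into (1) we get
$$\hat{T}(a) = e^{-a\frac{\partial}{\partial x}} \tag{3}$$
Now we use the well-known expansion of the exponential function
$$e^{A} = 1 + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \ldots$$
with $A = -a\frac{\partial}{\partial x}$ to rewrite (3) and get:
$$\hat{T}(a) = 1 - a\frac{\partial}{\partial x} + \frac{a^2}{2!}\frac{\partial^2}{\partial x^2} - \frac{a^3}{3!}\frac{\partial^3}{\partial x^3} + \ldots$$
This is still an operator equation. We let the operators on the left and right side operate on an arbitrary wave function $\psi(x)$ and get:
$$\hat{T}(a)\psi(x) = \psi(x) - a\frac{\partial\psi(x)}{\partial x} + \frac{a^2}{2!}\frac{\partial^2\psi(x)}{\partial x^2} - \frac{a^3}{3!}\frac{\partial^3\psi(x)}{\partial x^3} + \ldots$$
Here we recognize the right side as the Taylor series expansion of $\psi(x-a)$. So we finally have:
$$\hat{T}(a)\psi(x) = \psi(x-a)$$
Now we have proven that the operator $\hat{T}(a)$ has displaced the wave function $\psi(x)$ by a distance $a$ in $x$-direction.
Showing that $\hat{R}_z(\theta) = e^{-i\theta\hat{L}_z/\hbar}$ rotates the wave function $\psi$ by an angle $\theta$ around the $z$-axis can be done in a similar way by using the definition of the angular momentum operator $\hat{L}_z = -i\hbar\frac{\partial}{\partial\phi}$.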
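The series argument can be checked numerically on a polynomial, where the expansion of $e^{-a\,\partial/\partial x}$ terminates after finitely many terms. This is only a sketch, assuming the standard convention $\hat{p}_x = -i\hbar\,\partial/\partial x$ so that $e^{-ia\hat{p}_x/\hbar} = e^{-a\,\partial/\partial x}$; the test polynomial and the helper names are made up for illustration.

```python
from math import factorial


def derivative(coeffs):
    """Differentiate a polynomial stored as [c0, c1, c2, ...] (c_k multiplies x**k)."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0.0]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))


def displace(coeffs, a):
    """Apply exp(-a d/dx) = sum_n (-a)^n / n! * d^n/dx^n to a polynomial."""
    result = [0.0] * len(coeffs)
    nth_derivative = list(coeffs)
    for n in range(len(coeffs)):  # all higher derivatives vanish identically
        weight = (-a) ** n / factorial(n)
        for k, c in enumerate(nth_derivative):
            result[k] += weight * c
        nth_derivative = derivative(nth_derivative)
    return result


psi = [2.0, 0.0, -1.0, 0.25, 1.5]  # hypothetical psi(x) = 2 - x^2 + 0.25x^3 + 1.5x^4
a = 0.4
shifted = displace(psi, a)

# Each row prints x, (T(a) psi)(x), and psi(x - a); the last two should agree.
sample_points = (-1.0, 0.0, 0.5, 2.0)
for x in sample_points:
    print(x, evaluate(shifted, x), evaluate(psi, x - a))
```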
There is no 'only if' because it is not true: \begin{align} e^{A+B} = e^A e^B \end{align} does not necessarily imply $[A,B] = 0$.
One can easily find an example of this using matrices. Here's one: \begin{align} A= \begin{pmatrix} 0 & 0 \\ 0 & 2\pi i \end{pmatrix}, B=\begin{pmatrix} 0 & 1 \\ 0 & 2 \pi i \end{pmatrix}. \end{align} $[A,B] \neq 0$ but $e^{A+ B} = e^A e^B = I$.
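The counterexample can be verified numerically. The sketch below uses a plain truncated Taylor series for the matrix exponential, which is an assumption made to keep the example dependency-free (it is adequate for these small matrices, not a general-purpose method); all helper names are illustrative.

```python
import cmath


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matadd(X, Y, sign=1):
    """Entrywise X + sign * Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def expm(M, terms=120):
    """Matrix exponential from a truncated Taylor series."""
    n = len(M)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, M)]
        result = matadd(result, term)
    return result


def max_abs(M):
    return max(abs(v) for row in M for v in row)


w = 2j * cmath.pi
A = [[0, 0], [0, w]]
B = [[0, 1], [0, w]]
identity = [[1, 0], [0, 1]]

commutator = matadd(matmul(A, B), matmul(B, A), sign=-1)
lhs = expm(matadd(A, B))        # e^{A+B}
rhs = matmul(expm(A), expm(B))  # e^A e^B

print("max |[A,B]|       =", max_abs(commutator))
print("max |e^{A+B} - I| =", max_abs(matadd(lhs, identity, sign=-1)))
print("max |e^A e^B - I| =", max_abs(matadd(rhs, identity, sign=-1)))
```

The commutator is clearly nonzero, while both exponentials agree with the identity up to rounding error.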
Edit: Let me help with the if part, using a differential equation as OP desires. Compute \begin{align} \frac{d}{dt}(e^{t(A+B)}e^{-tA}e^{-tB}), \end{align} and show that it is $0$ if $[A,B] = 0$.
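That implies that $e^{t(A+B)}e^{-tA}e^{-tB}$ is independent of $t$. Evaluating at $t = 0$ shows that this constant is $I$, so $e^{t(A+B)}e^{-tA}e^{-tB} = I$ for all $t$. Setting $t = 1$ gives $e^{(A+B)}e^{-A}e^{-B} = I$, and multiplying on the right by $e^{B}e^{A}$ yields $e^{A+B} = e^{B}e^{A} = e^{A}e^{B}$, where the last equality holds because $A$ and $B$ commute.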
QED.
Define the function $f(u)=e^{uA}e^{uB}$. If $A$ and $B$ commute, you can take the derivatives of $f(u)$ as follows:
$$\frac{df(u)}{du}=\frac{de^{uA}}{du}e^{uB}+e^{uA}\frac{de^{uB}}{du}=Ae^{uA} e^{uB}+e^{uA}Be^{uB}$$
$$\frac{d^2f(u)}{du^2}=A^2e^{uA} e^{uB}+2Ae^{uA}Be^{uB}+e^{uA}B^2e^{uB}$$
etc.
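Now take the Maclaurin series expansion, evaluating the derivatives at $u=0$ and using $AB=BA$ to combine the terms:
$$\begin{aligned} f(u)&=1+(A+B)u+\frac{1}{2!}(A^2+2AB+B^2)u^2+\ldots \\ &=1+(A+B)u+\frac{1}{2!}(A+B)^2 u^2+\frac{1}{3!}(A+B)^3 u^3+\ldots=e^{(A+B)u} \end{aligned}$$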
For $u=1$ you get the desired result:
$$f(1)=e^{A}e^{B}=e^{A+B}$$
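To see the role of the commutation assumption, here is a small numerical comparison (a sketch only; the matrices are arbitrary examples and the Taylor-series `expm` helper is an assumption chosen to avoid external libraries). For a commuting pair the two sides agree up to rounding error, while for a non-commuting pair they visibly differ.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matadd(X, Y, sign=1):
    """Entrywise X + sign * Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def expm(M, terms=60):
    """Matrix exponential from a truncated Taylor series (fine for small matrices)."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, M)]
        result = matadd(result, term)
    return result


def mismatch(X, Y):
    """max |e^{X+Y} - e^X e^Y| over matrix entries."""
    diff = matadd(expm(matadd(X, Y)), matmul(expm(X), expm(Y)), sign=-1)
    return max(abs(v) for row in diff for v in row)


# Commuting pair: both have the form (scalar * identity) + (multiple of the same nilpotent matrix).
A = [[1.0, 2.0], [0.0, 1.0]]
B = [[3.0, -1.0], [0.0, 3.0]]

# Non-commuting pair, for contrast.
C = [[0.0, 1.0], [0.0, 0.0]]
D = [[0.0, 0.0], [1.0, 0.0]]

print("commuting pair:     ", mismatch(A, B))
print("non-commuting pair: ", mismatch(C, D))
```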
Also check this:
Baker–Campbell–Hausdorff formula
Matrix exponential
Starting with:
$$U(t,t_i) = e^{\frac{-i}{\hbar }H(t-t_i)}$$
If $t_i=0$:
$$U(t,0) = e^{\frac{-i}{\hbar }Ht}$$
Using the identity: $\sum\limits_i \left|\lambda_i\right>\left<\lambda_i\right|=\mathbb{I}$
$$U(t,0) = \sum\limits_i e^{\frac{-i}{\hbar }Ht}\left|\lambda_i\right>\left<\lambda_i\right|$$
Since the exponential of an operator is (by Taylor expanding): $e^H=\mathbb{I}+H+\frac{1}{2}H^2+\dots$
And: $H\left| \lambda_i \right> =\lambda_i \left| \lambda_i \right>$
You should be able to see that:
$$U(t,0) = \sum\limits_i e^{\frac{-i}{\hbar }\lambda_it}\left|\lambda_i\right>\left<\lambda_i\right|$$
Without loss of generality, let's take the $|\lambda_i\rangle$ to be orthonormal. Notice that, by the spectral theorem, the Hamiltonian can be written as follows:
$$ H = \sum_i \lambda_i P_i, \qquad P_i = |\lambda_i\rangle\langle \lambda_i| $$
Each operator $P_i$ is a projector onto the subspace spanned by $|\lambda_i\rangle$. Notice, in particular, that
$$ P_i^2 = P_i, \qquad P_iP_j = P_jP_i = 0 \quad (i \neq j), $$
and a mathematical induction argument gives
$$ P_i^n = P_i $$
for all $n\geq 1$. Now, for notational simplicity let
$$ \mu = -\frac{i}{\hbar}t $$
Then we have
$$ U(t,0) = e^{\mu H} = \sum_{n=0}^\infty \frac{1}{n!}\mu^nH^n $$
but notice that using the properties of projection operators written above, we have
$$ H^n = \sum_{i_1, \dots, i_n}\lambda_{i_1}\cdots\lambda_{i_n}P_{i_1}\cdots P_{i_n} = \sum_i\lambda_i^nP_i $$
(for $n=0$ this is just the completeness relation $\sum_i P_i = \mathbb{I}$), and therefore
$$ U(t,0) = \sum_i\sum_n\frac{1}{n!}(\mu\lambda_i)^nP_i = \sum_ie^{\mu\lambda_i}P_i $$
as desired.
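A quick numerical sanity check of $e^{\mu H} = \sum_i e^{\mu\lambda_i}P_i$, as a sketch under stated assumptions: the Hamiltonian is the Pauli-$x$ matrix (a toy choice), units are such that $\hbar = 1$, the time `t` is arbitrary, and the power series is simply truncated.

```python
import cmath
from math import sqrt


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(M, terms=60):
    """Matrix exponential from a truncated Taylor series (fine for small matrices)."""
    n = len(M)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def projector(v):
    """P = |v><v| for a normalized vector v."""
    return [[vi * complex(vj).conjugate() for vj in v] for vi in v]


t = 0.8        # hypothetical time, in units where hbar = 1
mu = -1j * t   # mu = -i t / hbar
H = [[0.0, 1.0], [1.0, 0.0]]  # toy Hamiltonian: Pauli-x

# Orthonormal eigenpairs of H: +1 -> (1, 1)/sqrt(2) and -1 -> (1, -1)/sqrt(2).
eigenpairs = [(1.0, [1 / sqrt(2), 1 / sqrt(2)]), (-1.0, [1 / sqrt(2), -1 / sqrt(2)])]

# Spectral form: sum_i exp(mu * lambda_i) P_i
U_spectral = [[0j, 0j], [0j, 0j]]
for lam, v in eigenpairs:
    P = projector(v)
    phase = cmath.exp(mu * lam)
    U_spectral = [[U_spectral[i][j] + phase * P[i][j] for j in range(2)] for i in range(2)]

# Power-series form: sum_n (mu H)^n / n!
U_series = expm([[mu * h for h in row] for row in H])

difference = max(abs(U_series[i][j] - U_spectral[i][j]) for i in range(2) for j in range(2))
print("max |U_series - U_spectral| =", difference)
```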
There's no escaping Lie theory if you want to understand what is going on mathematically. I'll try to provide some intuitive pictures for what is going on in the footnotes, though I'm not sure if it will be what you are looking for.
On any (finite-dimensional, for simplicity) vector space, the group of unitary operators is the Lie group $\mathrm{U}(N)$, which is connected. Lie groups are manifolds, i.e. things that locally look like $\mathbb{R}^n$, and as such possess tangent spaces at every point spanned by the derivatives of their coordinates or, equivalently, by all possible directions of paths at that point. These directions form, at $g \in \mathrm{U}(N)$, the $N^2$-dimensional vector space $T_g \mathrm{U}(N)$.<sup>1</sup>
Canonically, we take the tangent space at the identity $\mathbf{1} \in \mathrm{U}(N)$ and call it the Lie algebra $\mathfrak{g} \cong T_\mathbf{1}\mathrm{U}(N)$, denoted $\mathfrak{u}(N)$ for the unitary group. Now, from tangent spaces, there is something called the exponential map to the manifold itself. It is a fact that, for compact groups, such as the unitary group, said map is surjective onto the connected component containing the identity.<sup>2</sup> It is a further fact that the unitary group is connected, meaning that it has no parts not connected to the identity, so the exponential map $\mathfrak{u}(N) \to \mathrm{U}(N)$ is surjective, and hence every unitary operator is the exponential of some Lie algebra element.<sup>3</sup> (The exponential map always covers a neighborhood of the identity, so we are in principle able to find exponential forms for operators close to the identity in other groups, too.)
So, the above (and the notes) answers your first three questions: We can always represent a unitary operator like that since $\mathrm{U}(N)$ is compact and connected, the exponential of an operator means "walking in the direction specified by that operator", and while $\mathcal{U}$ lies in the Lie group, $\mathcal{T}$ lies, as its generator, in the Lie algebra. One also says that $\mathcal{T}$ is the infinitesimal generator of $\mathcal{U}$, since, in $\mathrm{e}^{\alpha \mathcal{T}}$, we can see it as giving only the direction of the operation, while $\alpha$ tells us how far from the identity the generated exponential will lie.
The physical meaning is a difficult thing to tell generally - often, it will be that the $\mathcal{T}$ is a generator of a symmetry, and the unitary operator $\mathcal{U}$ is the finite version of that symmetry, for example, the Hamiltonian $H$ generates the time translation $U$, the angular momenta $L_i$ generate the rotations $\mathrm{SO}(3)$, and so on, and so forth — the generator is always the infinitesimal version of the exponentiated operator in the sense that
$$ \mathrm{e}^{\epsilon T} = 1 + \epsilon T + \mathcal{O}(\epsilon^2)$$
so the generated operator will, for small $\epsilon$, be displaced from the identity by almost exactly $\epsilon T$.
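The statement $\mathrm{e}^{\epsilon T} = 1 + \epsilon T + \mathcal{O}(\epsilon^2)$ can be probed numerically: the remainder divided by $\epsilon^2$ should stay bounded as $\epsilon \to 0$. The sketch below assumes, purely for illustration, that $T$ is the generator of rotations of the plane (for which $T^2 = -1$, so the ratio should approach $1/2$); the helper names are hypothetical.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(M, terms=30):
    """Matrix exponential from a truncated Taylor series (fine for small matrices)."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


T = [[0.0, -1.0], [1.0, 0.0]]  # illustrative generator: rotations of the plane


def remainder(eps):
    """max |e^{eps T} - (1 + eps T)| over matrix entries."""
    U = expm([[eps * entry for entry in row] for row in T])
    first_order = [[float(i == j) + eps * T[i][j] for j in range(2)] for i in range(2)]
    return max(abs(U[i][j] - first_order[i][j]) for i in range(2) for j in range(2))


# remainder / eps^2 should settle near a constant (here 1/2) as eps shrinks.
ratios = {eps: remainder(eps) / eps**2 for eps in (1e-1, 1e-2, 1e-3)}
for eps, ratio in ratios.items():
    print(f"eps = {eps:g}   remainder / eps^2 = {ratio:.6f}")
```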
<sup>1</sup> Think of the circle (which is $\mathrm{U}(1)$): At every point on the circle, you can draw the tangent to it - which is $\mathbb{R}$, a 1D vector space. The length of the tangent vector specifies "how fast" the path in that direction will be traversed.
<sup>2</sup> Think of the two-dimensional sphere (which is, sadly, not a Lie group, but illustrative for the exponential map). Take the tangent space at one point and imagine you are actually holding a sheet of paper next to a sphere. Now "crumple" the paper around the sphere. You will end up covering the whole sphere, and if the paper is large enough (it would have to be infinite to represent the tangent space), you can even wind it around the sphere multiple times, thus showing that the exponential map cannot be injective, but is easily seen to be surjective. A more precise notion of this crumpling would be to fix some measure of length on the sphere and map every vector in the algebra to a point on the sphere by walking into the direction indicated by the vector exactly as far as its length tells you.
<sup>3</sup> This is quite easy to understand - if there were some part of the group wholly disconnected from our group, or if our group had infinite volume (if it was non-compact), we could not hope to cover it wholly with only one sheet of paper, no matter how large.
Well, quantum mechanics is famous for not being intuitive for earthlings like us, but the following couple of facts might help:
Observables in quantum mechanics are Hermitian/self-adjoint operators.
The spectrum ${\rm Spec}(\hat{A}) \subseteq \mathbb{R}$ of a Hermitian/self-adjoint operator $\hat{A}$ belongs to the real axis $\mathbb{R}\subseteq \mathbb{C}$, cf. e.g. this Phys.SE post.
The spectrum ${\rm Spec}(\hat{U}) \subseteq \{z\in \mathbb{C} \mid |z|=1\}$ of a unitary operator belongs to the unit circle.
The function $z\mapsto e^{iz}$ maps the real axis to the unit circle.
Stone's theorem establishes roughly speaking a correspondence $\hat{U} = e^{i\hat{A}}$ between unitary and self-adjoint operators.
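In finite dimensions these facts are easy to check numerically. The following sketch (the Hermitian matrix is an arbitrary example, and the truncated Taylor series is an assumption made to avoid external libraries) verifies that $e^{ix}$ has unit modulus for real $x$ and that $\hat{U} = e^{i\hat{A}}$ satisfies $\hat{U}^\dagger\hat{U} = 1$ for a Hermitian $\hat{A}$.

```python
import cmath


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(M, terms=60):
    """Matrix exponential from a truncated Taylor series (fine for small matrices)."""
    n = len(M)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def dagger(M):
    """Conjugate transpose."""
    n = len(M)
    return [[complex(M[j][i]).conjugate() for j in range(n)] for i in range(n)]


# Scalar version: real axis -> unit circle.
moduli = [abs(cmath.exp(1j * x)) for x in (-3.0, 0.5, 10.0)]

# Matrix version: Hermitian A -> unitary U = exp(iA).
A = [[2.0, 1 - 1j], [1 + 1j, -1.0]]  # arbitrary example; equals its conjugate transpose
U = expm([[1j * a for a in row] for row in A])
UdU = matmul(dagger(U), U)
unitarity_error = max(abs(UdU[i][j] - (1.0 if i == j else 0.0)) for i in range(2) for j in range(2))

print("moduli of e^{ix}:", moduli)
print("max |U^dagger U - 1| =", unitarity_error)
```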