Probabilistic Inequalities
- Statistics 4.6
Hölder’s inequality
Hölder’s inequality states that for any random variables $X$ and $Y$ and for any $p, q > 1$ such that
\[ \frac{1}{p} + \frac{1}{q} = 1 \]
the following holds:
\[ \abs{\mathrm{E}[XY]} \leq \mathrm{E}\abs{XY} \leq \big(\mathrm{E}\abs{X}^p\big)^\frac{1}{p} \big(\mathrm{E}\abs{Y}^q\big)^\frac{1}{q} \]
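As a quick numerical sanity check (not part of the proof), we can estimate the three quantities by Monte Carlo simulation; the sketch below assumes NumPy, and the exponents and distributions are arbitrary illustrative choices.
```python
import numpy as np

# Monte Carlo check of |E[XY]| <= E|XY| <= (E|X|^p)^(1/p) (E|Y|^q)^(1/q)
# for one arbitrary choice of conjugate exponents and distributions.
rng = np.random.default_rng(0)
n = 1_000_000
p, q = 3.0, 1.5                                  # 1/p + 1/q = 1
X = rng.exponential(scale=1.0, size=n)
Y = rng.exponential(scale=2.0, size=n) - 1.0     # can take negative values

lhs = abs(np.mean(X * Y))
mid = np.mean(np.abs(X * Y))
rhs = np.mean(np.abs(X) ** p) ** (1 / p) * np.mean(np.abs(Y) ** q) ** (1 / q)
print(lhs <= mid <= rhs)                         # expected: True
```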
Lemma
Let $a$ and $b$ be non-negative real numbers, and let $p, q > 1$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$ as above. Then we have:
\[ \frac{1}{p} a^p + \frac{1}{q} b^q \geq a b \]
with equality if and only if $a^p = b^q$.
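This is Young’s inequality. For example, with $p = q = 2$ it reduces to the familiar bound
\[ \frac{a^2}{2} + \frac{b^2}{2} \geq ab \]
which is equivalent to $(a-b)^2 \geq 0$, with equality exactly when $a = b$, i.e. $a^2 = b^2$.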
Proof
Fix $b$, and consider the function
\[ f(a) = \frac{1}{p} a^p + \frac{1}{q} b^q - ab \]
The derivative is given by: \[ f'(a) = a^{p-1} - b \]
To minimize $f(a)$, we set $f'(a) = 0$ and get $a = b^{\frac{1}{p-1}} = b^{\frac{q}{p}}$. Substituting this back into $f(a)$ gives:
\[ f\left(b^{\frac{q}{p}}\right) = \frac{1}{p} b^q + \frac{1}{q} b^q - b^{\frac{q}{p}+1} = b^q - b^q = 0 \]
where we used $\frac{q}{p} + 1 = q$ and $\frac{1}{p} + \frac{1}{q} = 1$. Since $f''(a) = (p-1)a^{p-2} > 0$ for $a > 0$, this critical point is a minimum, so $f(a) \geq 0$ for all $a \geq 0$, with equality only at $a = b^{\frac{q}{p}}$, i.e. when $a^p = b^q$. This proves the lemma.
Proof
The first inequality follows from $-\abs{XY} \le XY \le \abs{XY}$. For the second one, we may assume that $\mathrm{E}\abs{X}^p$ and $\mathrm{E}\abs{Y}^q$ are both finite and nonzero, since otherwise the inequality holds trivially. Setting
\[ a = \frac{\abs{X}}{(\mathrm{E}\abs{X}^p)^{\frac{1}{p}}}, \quad b = \frac{\abs{Y}}{(\mathrm{E}\abs{Y}^q)^{\frac{1}{q}}} \]
and applying the lemma, we have:
\[ \frac{1}{p} \frac{\abs{X}^p}{\mathrm{E}\abs{X}^p} + \frac{1}{q} \frac{\abs{Y}^q}{\mathrm{E}\abs{Y}^q} \geq \frac{\abs{XY}}{(\mathrm{E}\abs{X}^p)^{\frac{1}{p}} (\mathrm{E}\abs{Y}^q)^{\frac{1}{q}}} \]
Taking expectation on both sides yields:
\[ 1 \geq \frac{\mathrm{E}\abs{XY}}{(\mathrm{E}\abs{X}^p)^{\frac{1}{p}} (\mathrm{E}\abs{Y}^q)^{\frac{1}{q}} } \]
and thus we obtain Hölder’s inequality.
Cauchy–Schwarz inequality
The Cauchy–Schwarz inequality is the special case of Hölder’s inequality with $p = q = 2$. It states that for any random variables $X$ and $Y$:
\[ \abs{\mathrm{E}[XY]} \leq \mathrm{E}\abs{XY} \leq \sqrt{\mathrm{E}\abs{X}^2} \sqrt{\mathrm{E}\abs{Y}^2} \]
which is often also written as:
\[ (\mathrm{E}[XY])^2 \leq \mathrm{E}\left[X^2\right] \mathrm{E}\left[Y^2\right] \]
Covariance and correlation
If $X$ and $Y$ have means $\mu_X, \mu_Y$ and variances $\sigma_X^2, \sigma_Y^2$, we can apply the Cauchy–Schwarz inequality to the centered variables $X - \mu_X$ and $Y - \mu_Y$:
\[ \abs{\mathrm{E}[(X - \mu_X)(Y - \mu_Y)]} \leq \sqrt{\mathrm{E}[(X - \mu_X)^2]} \sqrt{\mathrm{E}[(Y - \mu_Y)^2]} \]
This leads to the well-known properties of covariance and correlation:
\[ \abs{\text{Cov}(X, Y)} \leq \sigma_X \sigma_Y \]
and
\[ \abs{\text{Corr}(X, Y)} \leq 1 \]
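As an illustration (not part of the derivation), the sketch below estimates the covariance and correlation of two simulated variables and checks both bounds; the distributions and the NumPy dependence are assumptions of the example.
```python
import numpy as np

# Check |Cov(X, Y)| <= sigma_X * sigma_Y and |Corr(X, Y)| <= 1 on simulated data.
rng = np.random.default_rng(1)
n = 100_000
X = rng.normal(size=n)
Y = 0.7 * X + rng.normal(scale=0.5, size=n)      # correlated with X

cov = np.mean((X - X.mean()) * (Y - Y.mean()))
corr = cov / (X.std() * Y.std())
print(abs(cov) <= X.std() * Y.std())             # expected: True
print(abs(corr) <= 1)                            # expected: True
```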
Minkowski’s inequality
Minkowski’s inequality extends the triangle inequality to $L^p$ norms. It states that for any random variables $X$ and $Y$ and for any $p \geq 1$:
\[ (\mathrm{E}\abs{X+Y}^p)^{\frac{1}{p}} \leq (\mathrm{E}\abs{X}^p)^{\frac{1}{p}} + (\mathrm{E}\abs{Y}^p)^{\frac{1}{p}} \]
The proof is omitted. It can be done by writing $\abs{X+Y}^p \leq \abs{X+Y}^{p-1}\abs{X} + \abs{X+Y}^{p-1}\abs{Y}$, which follows from the pointwise triangle inequality $\abs{X+Y} \leq \abs{X} + \abs{Y}$, and then applying Hölder’s inequality to each term on the right.
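A quick numerical check of Minkowski’s inequality (again an illustrative sketch assuming NumPy, with an arbitrary exponent and arbitrary distributions):
```python
import numpy as np

# Check (E|X+Y|^p)^(1/p) <= (E|X|^p)^(1/p) + (E|Y|^p)^(1/p) for one p >= 1.
rng = np.random.default_rng(2)
n = 1_000_000
p = 2.5
X = rng.normal(size=n)
Y = rng.uniform(-1.0, 3.0, size=n)

lp_norm = lambda Z: np.mean(np.abs(Z) ** p) ** (1 / p)   # empirical L^p norm
print(lp_norm(X + Y) <= lp_norm(X) + lp_norm(Y))         # expected: True
```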
Jensen’s inequality
Jensen’s inequality states that for any convex function $g$ and any random variable $X$:
\[ \mathrm{E}[g(X)] \geq g(\mathrm{E}[X]) \]
Equality holds if and only if $g$ is affine (linear plus a constant) on the convex hull of the support of $X$. If $g$ is concave, the inequality is reversed:
\[ \mathrm{E}[g(X)] \leq g(\mathrm{E}[X]) \]
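For example, the convex function $g(x) = x^2$ gives $\mathrm{E}[X^2] \geq (\mathrm{E}[X])^2$, i.e. $\text{Var}(X) \geq 0$, while the concave function $g(x) = \ln x$ gives $\mathrm{E}[\ln X] \leq \ln \mathrm{E}[X]$ for positive $X$.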
Proof
Let $l(x)$ be a supporting line of $g$ at the point $x = \mathrm{E}[X]$ (the tangent line when $g$ is differentiable there). By convexity, $g(x) \geq l(x)$ for all $x$, and $l(\mathrm{E}[X]) = g(\mathrm{E}[X])$. Taking expectations on both sides gives:
\[ \mathrm{E}[g(X)] \geq \mathrm{E}[l(X)] = l(\mathrm{E}[X]) = g(\mathrm{E}[X]) \]
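Numerically, Jensen’s inequality is easy to illustrate for a specific convex function; the sketch below uses $g(x) = e^x$ and a standard normal sample (both arbitrary choices) and assumes NumPy.
```python
import numpy as np

# Jensen's inequality for the convex function exp: E[exp(X)] >= exp(E[X]).
rng = np.random.default_rng(3)
X = rng.normal(loc=0.0, scale=1.0, size=1_000_000)
print(np.mean(np.exp(X)) >= np.exp(np.mean(X)))   # expected: True
# For a standard normal, E[exp(X)] = exp(1/2) ≈ 1.65, while exp(E[X]) = 1.
```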
Inequality for means
Jensen’s inequality can be used to prove an inequality between the three means: arithmetic mean, geometric mean, and harmonic mean. We define the arithmetic mean of a positive random variable $X$ as:
\[ \mathrm{AM}[X] = \mathrm{E}[X] \]
the geometric mean as:
\[ \mathrm{GM}[X] = \exp\left(\mathrm{E}[\ln X]\right) \]
and the harmonic mean as: \[ \mathrm{HM}[X] = \frac{1}{\mathrm{E}[1/X]} \]
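For example, if $X$ takes the values $1$ and $4$ with probability $\frac{1}{2}$ each, then
\[ \mathrm{AM}[X] = \frac{1 + 4}{2} = 2.5, \quad \mathrm{GM}[X] = \exp\left(\tfrac{1}{2}\ln 1 + \tfrac{1}{2}\ln 4\right) = 2, \quad \mathrm{HM}[X] = \frac{1}{\tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot \tfrac{1}{4}} = 1.6 \]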
Then $\mathrm{AM}[X] \geq \mathrm{GM}[X] \geq \mathrm{HM}[X]$ for any positive random variable $X$. More generally, for any $p \in \mathbb{R}$ with $p \neq 0$, we define the power mean as:
\[ M_p[X] = \left(\mathrm{E}[X^p]\right)^{\frac{1}{p}} = \exp\left(\frac{1}{p} \ln \mathrm{E}[X^p]\right) \]
We can check that $M_1[X] = \mathrm{AM}[X]$ and $M_{-1}[X] = \mathrm{HM}[X]$, and, defining $M_0[X]$ as the limit for $p \to 0$, that $M_0[X] = \mathrm{GM}[X]$. Moreover, $M_p[X] \to \max X$ as $p \to \infty$ and $M_p[X] \to \min X$ as $p \to -\infty$; these limits are verified below. With these conventions, we have the powerful theorem that $M_p[X]$ is a non-decreasing function of $p$ for any positive random variable $X$:
\[ \pdv{}{p} M_p[X] \geq 0 \]
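The sketch below illustrates this monotonicity on a simulated positive sample (the log-normal distribution and the grid of $p$ values are arbitrary choices; NumPy is assumed). The sample power means are non-decreasing in $p$, and for large $\abs{p}$ they approach the sample minimum and maximum.
```python
import numpy as np

# Empirical power means M_p for a positive sample, checked to be non-decreasing in p.
rng = np.random.default_rng(4)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)

def power_mean(x, p):
    if p == 0:                                   # geometric mean (the limit p -> 0)
        return np.exp(np.mean(np.log(x)))
    return np.mean(x ** p) ** (1 / p)

ps = [-20, -5, -1, -0.5, 0, 0.5, 1, 5, 20]
means = [power_mean(x, p) for p in ps]
print(all(a <= b for a, b in zip(means, means[1:])))   # expected: True
print(x.min(), means[0], means[-1], x.max())           # M_p nears min/max for large |p|
```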
Special values of $p$
Let’s check that $M_0[X] = \mathrm{GM}[X]$, i.e. that the limit of $M_p[X]$ as $p \to 0$ is indeed the geometric mean. Since the numerator and the denominator below both tend to $0$ as $p \to 0$, L’Hôpital’s rule gives:
\[ \lim_{p \to 0} \ln M_p[X] = \lim_{p \to 0} \frac{\ln \mathrm{E}[X^p]}{p} = \lim_{p \to 0} \frac{\mathrm{E}[ X^p \ln X]}{\mathrm{E}[X^p]} = \mathrm{E}[\ln X] \]
Thus, we have:
\[ \lim_{p \to 0} M_p[X] = \exp\left(\mathrm{E}[\ln X]\right) = \mathrm{GM}[X] \]
Let’s also check that $M_\infty[X] = \max X$, i.e. that the limit of $M_p[X]$ as $p \to \infty$ is indeed the maximum. We can rewrite the power mean as:
\[ M_p[X] = \max X \exp\left(\frac{1}{p} \ln \mathrm{E}\left[\left(\frac{X}{\max X}\right)^p\right]\right) \]
Again by L’Hôpital’s rule, we have:
\[ \begin{align*} \lim_{p \to \infty} \ln \frac{M_p[X]}{\max X} &= \lim_{p \to \infty} \frac{\ln \mathrm{E}\left[\left(\frac{X}{\max X}\right)^p\right]}{p} \nl &= \lim_{p \to \infty} \frac{\mathrm{E}\left[\left(\frac{X}{\max X}\right)^p \ln \frac{X}{\max X}\right]}{\mathrm{E}\left[\left(\frac{X}{\max X}\right)^p\right]} \nl &= 0 \end{align*} \]
since for large $p$ both expectations are dominated by the values of $X$ close to $\max X$, where $\ln \frac{X}{\max X}$ is close to $\ln 1 = 0$.
Thus, we have:
\[ \lim_{p \to \infty} M_p[X] = \max X \]
Similarly, we can check that $M_{-\infty}[X] = \min X$ is indeed the minimum.
Proof
First, we have the following reflection property of the power means: if the inequality $M_p[X] \geq M_q[X]$ holds for every positive random variable $X$, then so does
\[ M_{-p}[X] \leq M_{-q}[X] \]
This is proved by substituting $X$ with $1/X$ in the definition of the power mean, which shows that $M_{-r}[X] = 1/M_r[1/X]$. Now, we can prove the inequality involving the geometric mean:
\[ M_{-p}[X] \leq M_0[X] \leq M_p[X] \]
for any $p > 0$. The proof follows from Jensen’s inequality applied to the concave function $g(x) = \ln x$:
\[ \mathrm{E}[\ln X] \leq \ln \mathrm{E}[X] \]
Substituting $X$ with $X^p$, dividing by $p$, and exponentiating gives:
\[ \exp \mathrm{E} [\ln X] \leq \mathrm{E}[X^p]^{\frac{1}{p}} \]
and substituting $X$ with $1/X$ in this last inequality and taking reciprocals gives:
\[ \exp \mathrm{E} [\ln X] \geq \mathrm{E}[X^{-p}]^{-\frac{1}{p}} \]
and the inequality $M_{-p}[X] \leq M_0[X] \leq M_p[X]$ is proved. It remains to show that $M_p[X]$ is non-decreasing for $p > 0$; combined with the reflection property and the inequality through $M_0[X]$ above, the theorem then follows for all $p$. For $0 < p \leq q$, define $r = q/p \geq 1$ and apply Jensen’s inequality to the convex function $g(x) = x^r$:
\[ \mathrm{E}[X]^r \leq \mathrm{E}[X^r] \]
Substituting $X = Y^p$ (that is, $Y = X^{1/p}$) and raising both sides to the power $\frac{1}{q}$ gives:
\[ \mathrm{E}[Y^p]^\frac{1}{p} \leq \mathrm{E}[Y^q]^\frac{1}{q} \]
Thus, we have:
\[ p<q \implies M_p[X] \leq M_q[X] \]
By building probability theory on a measure-theoretic foundation, these properties can be treated more rigorously; for example, $\max X$ and $\min X$ above are then properly interpreted as the essential supremum and infimum of $X$.