Basic Probability Theory
- Statistics 1.2

23 Jun 2025 in Mathematics on Statistics · 10 min

Axiomatic Foundations of Probability
- Sigma-Algebra
- Probability Function
Theorems

Axiomatic Foundations of Probability

When an experiment is performed, the realization of the experiment is an outcome in the sample space. If the experiment is repeated, the outcomes may vary. The probability of an event is a measure of how likely that event is to occur.

Sigma-Algebra

A collection of subsets, of a sample space $S$ is called a sigma-algebra(or Borel field), denoted by $\mathscr{B}$, if it satisfies the following properties:

$\emptyset \in \mathscr{B}$
$A \in \mathscr{B} \implies A^\complement \in \mathscr{B}$
$A_i \in \mathscr{B} \; (i \in I) \implies \bigcup_{i \in I} A_i \in \mathscr{B}$

By the second and third properties and the De Morgan’s laws, we can also conclude that

$A_i \in \mathscr{B} \; (i \in I) \implies \bigcap_{i \in I} A_i \in \mathscr{B}$.

Associated with sample space $S$, we can have many different sigma-algebras. For example, the collection of the two sets ${\emptyset, S}$ is a sigma-algebra, called the trivial sigma-algebra. The largest sigma-algebra associated with $S$ is the collection of all subsets of $S$, called the power set of $S$, denoted by $\mathcal{P}(S)$.

Probability Function

A probability function is a function $P: \mathscr{B} \to \mathbb{R}$ that satisfies the following properties:

$\forall A\in \mathscr{B}: \;P(A) \ge 0$
$P(S) = 1$
If $\set{A_i}_{i \in I} \subseteq \mathscr{B}$ are pairwise disjoint, then \[ P\left(\bigcup_{i \in I} A_i\right) = \sum_{i \in I} P(A_i) \]

The three properties above are called the Kolmogorov axioms of probability. Any funcion $P$ that satisfies these axioms is called a probability function.

Theorems

1

Let $S=\set{s_1, \cdots, s_n}$ be a finite sample space and $\mathscr{B}$ be any sigma-algebra on $S$. Let $p_1, \cdots, p_n$ be non-negative real numbers such that $\sum_{i=1}^n p_i = 1$. Then there exists a unique probability function $P: \mathscr{B} \to \mathbb{R}$ such that \[ P(A) = \sum_{\set{i \mid s_i \in A}} p_i \] This also remains true if $S$ is countably infinite.

2

If $P$ is a probability function on a sigma-algebra $\mathscr{B}$, then for any $A \in \mathscr{B}$, we have:

$P(\emptyset) = 0$
$P(A) \le 1$
$P(A^\complement) = 1 - P(A)$

3

If $P$ is a probability function on a sigma-algebra $\mathscr{B}$, then for any $A, B \in \mathscr{B}$, we have:

$P(B\cap A^\complement) = P(B) - P(A\cap B)$
$P(A\cup B) = P(A) + P(B) - P(A\cap B)$
$A \subseteq B \implies P(A) \le P(B)$

4

If $P$ is a probability function on a sample space $S$ with a sigma-algebra $\mathscr{B}$, then for any partition $\set{C_i}_{i \in I}$ of $S$ and for any events $\set{A_i}_{i \in I} \subseteq \mathscr{B}$, we have: 1. \[ P(A) = \sum_{i \in I} P(A\cap C_i) \]

Boole’s Inequality: \[ P\left (\bigcup_{i \in I} A_i\right) \leq \sum_{i \in I} P(A_i) \]

Proof of 1

Since $\set{C_i}_{i \in I}$ is a partition of $S$, \[ A = A \cup S = A \cup \left(\bigcup_{i \in I} C_i\right) = \bigcup_{i \in I} (A\cap C_i) \] Thus, by the third property of the probability function, \[ P(A) = P\left(\bigcup_{i \in I} (A\cap C_i)\right) = \sum_{i \in I} P(A\cap C_i) \]

Proof of 2

Let $I = \set{1, 2, \cdots, n}$ and here $I$ is countable. Let’s construct a disjoint collection as follows: \[ A_1^\ast = A_1, \quad A_i^\ast = A_i \setminus \bigcup_{j=1}^{i-1} A_j \] Then, we have \[ P\left(\bigcup_{i=1}^n A_i\right) = P\left(\bigcup_{i=1}^n A_i^\ast\right) = \sum_{i=1}^n P(A_i^\ast) \] Since $A_i^\ast \subseteq A_i$, we have $P(A_i^\ast) \leq P(A_i)$. Thus, \[ P\left(\bigcup_{i=1}^n A_i\right) \leq \sum_{i=1}^n P(A_i) \]

5(Bonferroni’s Inequality)

\[ P\left(\bigcup_{i=1}^n A_i\right) \leq \sum_{i=1}^n P(A_i) - (n-1) \]

Proof

We can use the Boole’s Inequality to prove this. \[ P\left( \bigcup_{i=1}^n A_i^\complement \right) \leq \sum_{i=1}^n P(A_i^\complement) \] Using the theorem 2, we have \[ 1 - P\left( \bigcup_{i=1}^n A_i \right) \leq n - \sum_{i=1}^n P(A_i) \]

6(Inclusion-Exclusion Principle)

\[ P\left( \bigcup_{i=1}^n A_i \right) = \sum_{k=1}^n (-1)^{k-1} \sum_{\substack{ I\subseteq \set{1,\cdots,n} \nl\abs{I}=k}} P\left(\bigcap_{i \in I} A_i\right) \]

It can be proved by induction on $n$.

Basic Probability Theory
- Statistics 1.2

Axiomatic Foundations of Probability

Sigma-Algebra

Probability Function

Theorems

1

2

3

4

5(Bonferroni’s Inequality)

6(Inclusion-Exclusion Principle)

Jiho's Blog

Error

Axiomatic Foundations of Probability

Sigma-Algebra

Probability Function

Theorems

1

2

3

4

5(Bonferroni’s Inequality)

6(Inclusion-Exclusion Principle)

Templates (for web app):

Error