The axiom of choice was formulated in 1904 by Ernst Zermelo in order to formalize his proof of the well-ordering theorem.
Well-ordering theorem: Every set is well-orderable.
Let $S$ be a set.
Let $\mathcal P(S)$ be the power set of $S$.
By the Axiom of Choice, there is a choice function $c$ defined on $\mathcal P(S) \setminus \{\emptyset\}$, the set of non-empty subsets of $S$.
We will use $c$ and the Principle of Transfinite Induction to define a bijection between $S$ and some ordinal.
Intuitively, we start by pairing $c(S)$ with $0$, and then keep extending the bijection by pairing $c(S \setminus X)$ with the least ordinal $\alpha$ not yet used, where $X$ is the set of elements already paired.
Basis for the Induction
$\alpha = 0$
Let $s_0 =c(S)$.
Inductive Step
Suppose $s_\beta$ has been defined for all $\beta < \alpha$.
If $S \setminus \{s_\beta: \beta < \alpha\}$ is empty, we stop.
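Otherwise, define:$$s_\alpha := c(S \setminus \{s_\beta: \beta < \alpha\})$$
The process eventually stops: the $s_\alpha$ are pairwise distinct by construction, so if it never stopped, $\alpha \mapsto s_\alpha$ would be an injection of the class of all ordinals into the set $S$, which is impossible because the ordinals do not form a set.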
Now, we can impose a well-ordering on $S$ by embedding it via $s_\alpha \mapsto \alpha$ into the ordinal $\beta = \displaystyle {\bigcup_{s_\alpha \mathop \in S} (\alpha + 1)}$, the least ordinal strictly greater than every index $\alpha$ used, and using the well-ordering of $\beta$.
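The recursion above can be illustrated on a finite set, where ordinary iteration takes the place of transfinite induction. The following is only a sketch under that assumption: the names `enumerate_by_choice` and `longest_first` are hypothetical, and `longest_first` is just one arbitrary choice function on non-empty sets of strings.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List


def enumerate_by_choice(
    S: Iterable[Hashable],
    choose: Callable[[FrozenSet[Hashable]], Hashable],
) -> List[Hashable]:
    """Finite analogue of s_alpha := c(S minus {s_beta : beta < alpha})."""
    remaining = set(S)
    sequence: List[Hashable] = []
    while remaining:  # stop once every element of S has been used
        s = choose(frozenset(remaining))
        if s not in remaining:
            raise ValueError("choose must return an element of its argument")
        sequence.append(s)  # s now has index len(sequence) - 1
        remaining.remove(s)
    return sequence


def longest_first(subset: FrozenSet[str]) -> str:
    # Hypothetical choice function: longest word, ties broken alphabetically.
    return min(subset, key=lambda w: (-len(w), w))


S = {"pear", "fig", "banana", "kiwi"}
order = enumerate_by_choice(S, longest_first)
rank = {s: i for i, s in enumerate(order)}  # the map s_i -> i
print(order)  # ['banana', 'kiwi', 'pear', 'fig']
```

Comparing elements by `rank` then orders $S$ exactly as the indices are ordered, which is the finite counterpart of pulling back the well-ordering of the ordinal.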
The Well-Ordering Theorem holds iff the Axiom of Choice holds.
Assume the Well-Ordering Theorem holds.
Let $\mathcal F$ be an arbitrary collection of non-empty sets.
By assumption all sets can be well-ordered.
Hence the set $\bigcup \mathcal F$ of all elements of the sets in $\mathcal F$ is well-ordered by some ordering $<$.
By definition, in a well-ordered set, every non-empty subset has a unique least element.
Also, note that each set in $\mathcal F$ is a subset of $\bigcup \mathcal F$.
Thus, we may define the choice function $c: \mathcal F \to \bigcup \mathcal F$ by:$$\forall X \in \mathcal F: c(X) := \min\nolimits_{<} X$$that is, $c(X)$ is the least element of $X$ under $<$, so that $c(X) \in X$.
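As a finite illustration of this construction, the sketch below builds a choice function from a fixed ordering of the union. It assumes the well-ordering is given by position in a list; the names `choice_from_well_order`, `listing` and `family` are hypothetical and serve only as an example.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable


def choice_from_well_order(
    family: Iterable[FrozenSet[Hashable]],
    rank: Callable[[Hashable], int],
) -> Dict[FrozenSet[Hashable], Hashable]:
    """Map each set X in the family to its least element under rank."""
    chosen: Dict[FrozenSet[Hashable], Hashable] = {}
    for X in family:
        if not X:
            raise ValueError("every member of the family must be non-empty")
        chosen[X] = min(X, key=rank)  # least element of X under the ordering
    return chosen


# Assumed ordering of the union: earlier position in the list means smaller.
listing = [3, 1, 4, 5, 9, 2, 6]
family = [frozenset({1, 2}), frozenset({9, 4, 6}), frozenset({5})]
c = choice_from_well_order(family, listing.index)
print(c[frozenset({9, 4, 6})])  # 4
```

The choice is determined entirely by the ordering, so no further arbitrary selection is needed once the ordering is fixed.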