What Happens When a Magnet Attracts Metal?

When you witness a magnet pulling a piece of iron, you are seeing one of the most profound "glitches" in classical physics. To understand why that piece of metal moves, we must dismantle our classical intuition and rebuild it using the tools of Dirac, Pauli, and Heisenberg.

1. The Paradox: Why the Lorentz Force is "Lazy"

In classical electromagnetism, the force exerted on a charge $q$ moving with velocity $\vec{v}$ in a magnetic field $\vec{B}$ is given by the Lorentz force law:

{\vec{F}}_{m a g} = q (\vec{v} \times \vec{B})

By definition, this force is always perpendicular to the velocity vector ( $\vec{F} \cdot \vec{v} = 0$ ). Since power is defined as the rate of doing work ( $P = \vec{F} \cdot \vec{v}$ ), the magnetic force performs zero work.
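The zero-work statement can be spot-checked numerically. The following is a minimal sketch; the charge, velocity, and field values are arbitrary illustrative choices, and the helper names are hypothetical.

```python
# Check that the magnetic Lorentz force F = q (v x B) delivers no power.
# All numerical values are arbitrary illustrative assumptions.

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def magnetic_force(q, v, B):
    """Magnetic part of the Lorentz force, F = q (v x B)."""
    return tuple(q * component for component in cross(v, B))

q = -1.602e-19               # electron charge in coulombs
v = (2.0e5, -1.0e5, 3.0e4)   # velocity in m/s
B = (0.1, 0.5, -0.2)         # magnetic field in tesla

F = magnetic_force(q, v, B)
power = dot(F, v)            # P = F . v, zero up to floating-point rounding
```

Whatever values are chosen, `power` vanishes to rounding accuracy, because `cross(v, B)` is perpendicular to `v` by construction.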

Yet, when a magnet attracts a piece of metal, the metal clearly gains kinetic energy. If the magnetic force cannot do work, what is actually pulling the metal? The answer lies in the fact that magnetism is not merely a classical force, but a quantum relativistic correction.

2. The Dead End: The Bohr-van Leeuwen Theorem

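Before we find the solution, we must show that classical physics cannot produce magnetism: in classical statistical mechanics, the thermal-equilibrium magnetization of any system of charges is exactly zero. This is the Bohr-van Leeuwen Theorem.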

Consider a system of $N$ electrons. The magnetic moment $\vec{μ}$ is defined by the orbital motion of charges:

\vec{μ} = \frac{q}{2} \vec{r} \times \vec{v} = \frac{q}{2 m} (\vec{r} \times m \vec{v}) = \frac{q}{2 m} \vec{L}

Crucially, the total magnetization in the $z$ -direction is a linear function of the generalized velocities ${\dot{q}}_{i}$ :

M_{z} = \sum_{i = 1}^{3 N} a_{i} (q_{1}, \dots, q_{3 N}) {\dot{q}}_{i}

In an electromagnetic field, the Hamiltonian $H$ (total energy) is given by:

H = \sum_{i = 1}^{N} \frac{({\vec{p}}_{i} - q \vec{A} ({\vec{r}}_{i}))^{2}}{2 m} + q V (q_{1}, \dots, q_{3 N})

Using Hamilton’s equations of motion, we have ${\dot{q}}_{i} = \frac{\partial H}{\partial p_{i}}$ . The thermal average of the magnetization $⟨ M_{z} ⟩$ is calculated via the canonical partition function:

⟨ M_{z} ⟩ = \frac{\int d q_{1} \dots d q_{3 N} \int d p_{1} \dots d p_{3 N} M_{z} e^{- H / k_{B} T}}{\int d q_{1} \dots d q_{3 N} \int d p_{1} \dots d p_{3 N} e^{- H / k_{B} T}}

Let us focus on the momentum integral for a single term $a_{i} {\dot{q}}_{i}$ . Since $a_{i}$ depends only on coordinates, we can pull it out and integrate over the momentum $p_{i}$ :

\int_{- \infty}^{+ \infty} d p_{i} {\dot{q}}_{i} e^{- H / k_{B} T} = \int_{- \infty}^{+ \infty} d p_{i} \frac{\partial H}{\partial p_{i}} e^{- H / k_{B} T}

This integral is mathematically equivalent to integrating a total derivative. By recognizing that $\frac{\partial H}{\partial p_{i}} e^{- β H} = - \frac{1}{β} \frac{\partial}{\partial p_{i}} (e^{- β H})$ (where $β = 1 / k_{B} T$ ), we have:

\int_{H (p = - \infty)}^{H (p = + \infty)} d H e^{- H / k_{B} T} = {[- k_{B} T e^{- H / k_{B} T}]}_{p_{i} = - \infty}^{p_{i} = + \infty}

Since the kinetic energy term $\frac{(p - q A)^{2}}{2 m}$ goes to infinity as momentum $p \to \pm \infty$ , the Boltzmann factor $e^{- H / k_{B} T}$ vanishes at the boundaries. Thus:

{[- k_{B} T e^{- H / k_{B} T}]}_{- \infty}^{+ \infty} = 0 - 0 = 0

Result: $⟨ M ⟩ = 0$ . The physics is merciless: The vector potential $\vec{A}$ shifts the momentum, but since we integrate over all possible momenta from $- \infty$ to $+ \infty$ , this shift contributes nothing to the total integral. Classical physics predicts that magnets cannot exist.
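The vanishing momentum integral can be illustrated for a single degree of freedom. This is a rough sketch in arbitrary units ($m = β = 1$), with a constant shift `qA` standing in for the vector potential; the function name and grid parameters are assumptions made for illustration.

```python
import math

def momentum_integral(qA, m=1.0, beta=1.0, p_max=40.0, n=8001):
    """Riemann-sum estimate of the integral over p of (dH/dp) * exp(-beta * H)
    for H = (p - qA)^2 / (2m), one degree of freedom with a constant shift qA."""
    dp = 2.0 * p_max / (n - 1)
    total = 0.0
    for k in range(n):
        p = -p_max + k * dp
        velocity = (p - qA) / m                                  # dH/dp
        boltzmann = math.exp(-beta * (p - qA) ** 2 / (2.0 * m))  # Boltzmann factor
        total += velocity * boltzmann * dp
    return total

# The shift qA slides the integrand along the p-axis but never changes the result.
results = {qA: momentum_integral(qA) for qA in (0.0, 0.7, -2.3)}
```

Each entry of `results` is zero to numerical precision, regardless of the shift: the vector potential drops out once all momenta are integrated over.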

3. Beyond the Schrödinger Equation

The Schrödinger equation is built upon the non-relativistic energy-momentum relation $E = {\vec{p}}^{2} / 2 m$ . If we attempt to use the relativistic relation $E^{2} = c^{2} {\vec{p}}^{2} + m^{2} c^{4}$ by substituting quantum operators $E \to i ℏ \partial_{t}$ and $\vec{p} \to - i ℏ \nabla$ , we obtain the Klein-Gordon Equation:

\frac{1}{c^{2}} \frac{\partial^{2} ψ}{\partial t^{2}} - \nabla^{2} ψ + {(\frac{m c}{ℏ})}^{2} ψ = 0

This equation can be written more compactly using the d'Alembertian operator, $◻ = \frac{1}{c^{2}} \frac{\partial^{2}}{\partial t^{2}} - \nabla^{2}$ , which is the Minkowski spacetime generalization of the Laplacian operator:

(◻ + {(\frac{m c}{ℏ})}^{2}) ψ = 0

However, when first proposed, the Klein-Gordon equation was nearly abandoned due to two significant conceptual problems that arose from interpreting $ψ$ as a single-particle wave function, analogous to its role in the Schrödinger equation.

On one hand, the relativistic energy-momentum relation is quadratic in energy, leading to two solutions: $E = \pm \sqrt{(p c)^{2} + (m c^{2})^{2}}$ . The existence of negative energy states was deeply problematic. It implied that a particle could continuously radiate energy by transitioning to ever-lower negative energy levels, suggesting that no stable ground state existed for matter.

On the other hand, the wave function in quantum mechanics is used to construct a conserved probability current, $j^{μ} = (c ρ, \vec{j})$ , where $ρ$ is the probability density. For the Schrödinger equation, the probability density $ρ = | ψ |^{2}$ is always non-negative. For the Klein-Gordon equation, the derived conserved density is:

ρ = \frac{i ℏ}{2 m c^{2}} (ψ^{*} \frac{\partial ψ}{\partial t} - ψ \frac{\partial ψ^{*}}{\partial t})

Because the equation is second-order in time, the values of $ψ$ and $\frac{\partial ψ}{\partial t}$ can be chosen independently at any given initial time. This allows for conditions where $ρ$ can be negative, which is nonsensical for a probability density. A particle cannot have a negative probability of being found somewhere.
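A concrete case makes the sign problem visible. For a plane wave with time dependence $e^{- i E t / ℏ}$ the density above reduces to $ρ = (E / m c^{2}) | ψ |^{2}$ , so the negative-energy branch gives a negative density. The sketch below checks this with a finite-difference derivative; natural units ($ℏ = m = c = 1$) and the function names are assumptions for illustration.

```python
import cmath

def kg_density(psi, t, hbar=1.0, m=1.0, c=1.0, h=1e-5):
    """Klein-Gordon density (i*hbar / 2 m c^2) (psi* d_t psi - psi d_t psi*),
    with the time derivative approximated by a central finite difference."""
    dpsi = (psi(t + h) - psi(t - h)) / (2.0 * h)
    value = 1j * hbar / (2.0 * m * c ** 2) * (
        psi(t).conjugate() * dpsi - psi(t) * dpsi.conjugate())
    return value.real

def plane_wave(E, hbar=1.0):
    """Time dependence exp(-i E t / hbar) of a plane wave at a fixed point."""
    return lambda t: cmath.exp(-1j * E * t / hbar)

rho_positive = kg_density(plane_wave(+1.5), t=0.3)   # positive-energy branch
rho_negative = kg_density(plane_wave(-1.5), t=0.3)   # negative-energy branch
```

Here `rho_positive` is close to `+1.5` and `rho_negative` is close to `-1.5`, matching $ρ = E / m c^{2}$ for a unit-amplitude wave.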

These problems stem from the second-order nature of the time derivative. Dirac's goal was to find a new equation that was simultaneously first-order in time (like the Schrödinger equation) and relativistically covariant. The natural approach is to write a Hamiltonian linear in momentum with to-be-defined coefficients:

\hat{H} = c (α_{x} {\hat{p}}_{x} + α_{y} {\hat{p}}_{y} + α_{z} {\hat{p}}_{z}) + β m c^{2}

where ${\hat{p}}_{k} = - i ℏ \frac{\partial}{\partial x_{k}}$ are the momentum operators. The corresponding wave equation is:

i ℏ \frac{\partial ψ}{\partial t} = \hat{H} ψ = (- i ℏ c \sum_{k = 1}^{3} α_{k} \frac{\partial}{\partial x^{k}} + β m c^{2}) ψ = (- i ℏ c \vec{α} \cdot \nabla + β m c^{2}) ψ

For this equation to be relativistically consistent, any free-particle solution to it must also satisfy the relativistic energy-momentum relation (i.e., the Klein-Gordon equation). This requires that when the Dirac operator is applied twice ("squaring" the equation), it yields the Klein-Gordon operator:

(c \sum α_{i} p_{i} + β m c^{2})^{2} = c^{2} \sum p_{i}^{2} + m^{2} c^{4}

Expanding the left side (noting that $p_{i}$ commute with each other, but the coefficient matrices $α_{i}, β$ do not necessarily commute):

c^{2} \sum_{i} α_{i}^{2} p_{i}^{2} + c^{2} \sum_{i < j} \{α_{i}, α_{j}\} p_{i} p_{j} + m c^{3} \sum_{i} \{α_{i}, β\} p_{i} + β^{2} m^{2} c^{4}

Comparing the coefficients on both sides, we derive the algebraic requirements for $α_{i}$ and $β$ , known as the Clifford Algebra:

$α_{i}^{2} = I$
$β^{2} = I$
$\{α_{i}, α_{j}\} = 0 (i \neq j)$
$\{α_{i}, β\} = 0$

To make the equation manifestly covariant under Lorentz transformations, we define the Gamma matrices $γ^{μ}$ :

$γ^{0} = β$
$γ^{i} = β α_{i}$ ( $i = 1, 2, 3$ )

Substituting these back, we obtain the concise covariant form:

(i γ^{μ} \partial_{μ} - \frac{m c}{ℏ}) ψ = 0, where \partial_{μ} = (\frac{1}{c} \partial_{t}, \nabla)

Using the properties of $α_{i}$ and $β$ , we can derive the core properties of $γ^{μ}$ :

Time component: $(γ^{0})^{2} = β^{2} = I$
Space components: $(γ^{i})^{2} = (β α_{i}) (β α_{i}) = - β^{2} α_{i}^{2} = - I$
Mixed terms ( $μ \neq ν$ ):
$\{γ^{0}, γ^{i}\} = β (β α_{i}) + (β α_{i}) β = α_{i} - α_{i} = 0$
$\{γ^{i}, γ^{j}\} = (β α_{i}) (β α_{j}) + (β α_{j}) (β α_{i}) = - β^{2} \{α_{i}, α_{j}\} = 0$ ( $i \neq j$ )

Combining these results, the Gamma matrices satisfy the famous Anti-commutation Relation:

\{γ^{μ}, γ^{ν}\} = 2 η^{μ ν} I

where $η^{μ ν} = diag (1, - 1, - 1, - 1)$ is the Minkowski metric.

The smallest irreducible representation satisfying this algebra involves $4 \times 4$ matrices. We can deduce this by elimination:

1D (Scalar): Impossible. Scalars commute, so they cannot satisfy anti-commutation relations like $γ^{0} γ^{1} = - γ^{1} γ^{0}$ unless they are zero, but $γ^{2} = \pm I$ forbids that.
2D: We can find at most 3 anti-commuting matrices (the Pauli matrices $σ_{x}, σ_{y}, σ_{z}$ ). However, we need 4 anti-commuting matrices ( $γ^{0}$ to $γ^{3}$ ).
3D: Since $Tr (γ^{μ}) = 0$ and $(γ^{μ})^{2} = \pm I$ , the eigenvalues of each matrix are $\pm 1$ (or $\pm i$ ) and must occur in equal numbers for the trace to vanish. This implies the dimension $n$ of the matrix must be even, which rules out 3.
4D: This is the first even dimension that can accommodate more than 3 anti-commuting matrices. In 4D space, we can construct up to 5 mutually anti-commuting matrices ( $γ^{0}, γ^{1}, γ^{2}, γ^{3}$ , and $γ^{5}$ ).

In 4D, the most common choice is the Dirac Representation (Standard Representation), which uses the $2 \times 2$ identity matrix $I$ and Pauli matrices $σ_{i}$ :

\begin{aligned} γ^{0} & = β = \begin{pmatrix} I & 0 \\ 0 & - I \end{pmatrix}, \quad α_{i} = \begin{pmatrix} 0 & σ_{i} \\ σ_{i} & 0 \end{pmatrix}, \\ γ^{i} & = β α_{i} = \begin{pmatrix} I & 0 \\ 0 & - I \end{pmatrix} \begin{pmatrix} 0 & σ_{i} \\ σ_{i} & 0 \end{pmatrix} = \begin{pmatrix} 0 & σ_{i} \\ - σ_{i} & 0 \end{pmatrix}, \\ σ_{x} & = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad σ_{y} = \begin{pmatrix} 0 & - i \\ i & 0 \end{pmatrix}, \quad σ_{z} = \begin{pmatrix} 1 & 0 \\ 0 & - 1 \end{pmatrix} \end{aligned}

This is a profound result: the requirement for a linear, relativistic wave equation forces the wave function $ψ$ to have four components, which we now understand as describing the electron (spin up/down) and its antiparticle, the positron (spin up/down). Dirac attempted to solve the negative probability problem but inadvertently unlocked the door to spin. This demonstrates that magnetism is not an added property of matter, but an inevitable consequence of spacetime symmetry.
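The anticommutation relation can be verified directly in the Dirac representation. The sketch below uses plain nested lists rather than a matrix library so that it runs with the standard library alone; the helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def block(a, b, c, d):
    """Assemble the 4x4 matrix [[a, b], [c, d]] from 2x2 blocks."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]

def scale(s, a):
    """Multiply every entry of matrix a by the scalar s."""
    return [[s * x for x in row] for row in a]

I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
SIGMA = [
    [[0, 1], [1, 0]],        # sigma_x
    [[0, -1j], [1j, 0]],     # sigma_y
    [[1, 0], [0, -1]],       # sigma_z
]

# Dirac representation: gamma^0 = diag(I, -I), gamma^i = [[0, s_i], [-s_i, 0]]
GAMMA = [block(I2, Z2, Z2, scale(-1, I2))]
GAMMA += [block(Z2, s, scale(-1, s), Z2) for s in SIGMA]

ETA = [1, -1, -1, -1]   # diagonal of the Minkowski metric, signature (+, -, -, -)

def clifford_holds(tol=1e-12):
    """True if {gamma^mu, gamma^nu} = 2 eta^{mu nu} I for all mu, nu."""
    for mu in range(4):
        for nu in range(4):
            ab = matmul(GAMMA[mu], GAMMA[nu])
            ba = matmul(GAMMA[nu], GAMMA[mu])
            diagonal = 2 * ETA[mu] if mu == nu else 0
            for i in range(4):
                for j in range(4):
                    target = diagonal if i == j else 0
                    if abs(ab[i][j] + ba[i][j] - target) > tol:
                        return False
    return True

ok = clifford_holds()
```

`ok` evaluates to `True`, confirming all sixteen anticommutators at once.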

4. Coupling with Electromagnetic Field

Now we consider the case with an electromagnetic potential $A^{μ} = (Φ / c, \vec{A})$ . Requiring the Dirac equation to be invariant under local gauge transformations fixes the coupling to be the minimal substitution, which has two parts. Momentum substitution: replace the momentum $\vec{p}$ with the kinetic momentum, $\vec{p} \to \vec{p} - q \vec{A}$ , where $\vec{A}$ is the magnetic vector potential. Energy substitution: replace the energy operator $i ℏ \frac{\partial}{\partial t}$ with $i ℏ \frac{\partial}{\partial t} - q Φ$ , where $Φ$ is the electric scalar potential. The equation becomes:

i ℏ \frac{\partial ψ}{\partial t} = [c \vec{α} \cdot (\vec{p} - q \vec{A}) + β m c^{2} + q Φ] ψ

To handle 4D spacetime more intuitively, we usually write this in covariant form. Introducing the Gamma matrix definitions $γ^{0} = β, γ^{i} = β α_{i}$ , and combining the 4-momentum operator $p_{μ} = i ℏ \partial_{μ} = (i ℏ \frac{1}{c} \partial_{t}, i ℏ \nabla)$ (the spatial part of the covariant components is $- \vec{p} = i ℏ \nabla$ ) with the 4-potential $A_{μ} = (Φ / c, - \vec{A})$ , the equation can be rearranged as:

γ^{0} (i ℏ \frac{1}{c} \partial_{t} - \frac{q Φ}{c}) ψ + \vec{γ} \cdot (i ℏ \nabla + q \vec{A}) ψ - m c ψ = 0

Using the covariant derivative $D_{μ} = \partial_{μ} + \frac{i q}{ℏ} A_{μ}$ , it can be written in the minimalist form:

(i ℏ γ^{μ} D_{μ} - m c) ψ = 0 or γ^{μ} (p_{μ} - q A_{μ}) ψ = m c ψ

We now wish to solve this four-component spinor equation. The simple block form of the gamma matrices allows us to group the four components into pairs. Furthermore, in the non-relativistic limit, the rest energy $m c^{2}$ is the dominant term. We write the wavefunction as a combination of a "large component" $ϕ$ and a "small component" $χ$ (both are 2-component spinors), explicitly extracting the time evolution of the rest energy:

ψ (\vec{r}, t) = (\begin{matrix} ϕ (\vec{r}, t) \\ χ (\vec{r}, t) \end{matrix}) e^{- i m c^{2} t / ℏ}

Applying the product rule to the time derivative on the left side:

i ℏ \frac{\partial ψ}{\partial t} = (i ℏ \frac{\partial}{\partial t} (\begin{matrix} ϕ \\ χ \end{matrix}) + m c^{2} (\begin{matrix} ϕ \\ χ \end{matrix})) e^{- i m c^{2} t / ℏ}

Substituting this into the Dirac equation and using the matrix form in the standard Dirac representation, we cancel the exponential term on both sides to obtain:

i ℏ \frac{\partial}{\partial t} (\begin{matrix} ϕ \\ χ \end{matrix}) + m c^{2} (\begin{matrix} ϕ \\ χ \end{matrix}) = c (\begin{matrix} 0 & \vec{σ} \\ \vec{σ} & 0 \end{matrix}) \cdot \vec{π} (\begin{matrix} ϕ \\ χ \end{matrix}) + (\begin{matrix} m c^{2} & 0 \\ 0 & - m c^{2} \end{matrix}) (\begin{matrix} ϕ \\ χ \end{matrix}) + q Φ (\begin{matrix} ϕ \\ χ \end{matrix})

where $\vec{π} = \vec{p} - q \vec{A}$ is the kinetic momentum. We decompose the above equation into two coupled equations:

\begin{aligned} i ℏ \frac{\partial ϕ}{\partial t} + m c^{2} ϕ = c (\vec{σ} \cdot \vec{π}) χ + m c^{2} ϕ + q Φ ϕ \\ i ℏ \frac{\partial χ}{\partial t} + m c^{2} χ = c (\vec{σ} \cdot \vec{π}) ϕ - m c^{2} χ + q Φ χ \end{aligned}

In the non-relativistic limit, the electron's kinetic and potential energies are much smaller than its rest energy, i.e., $| i ℏ \partial_{t} χ | ≪ | 2 m c^{2} χ |$ and $| q Φ χ | ≪ | 2 m c^{2} χ |$ . Thus, the equation for the small component approximates to:

2 m c^{2} χ \approx c (\vec{σ} \cdot \vec{π}) ϕ ⟹ χ \approx \frac{\vec{σ} \cdot \vec{π}}{2 m c} ϕ

This explains why $χ$ is called the "small component": its magnitude is roughly $v / c$ times that of the large component. Substituting the expression for $χ$ back into the equation for the large component $ϕ$ :

i ℏ \frac{\partial ϕ}{\partial t} = [\frac{(\vec{σ} \cdot \vec{π})^{2}}{2 m} + q Φ] ϕ

Here lies the key to the emergence of spin. Using the Pauli matrix identity $(\vec{σ} \cdot \vec{A}) (\vec{σ} \cdot \vec{B}) = \vec{A} \cdot \vec{B} + i \vec{σ} \cdot (\vec{A} \times \vec{B})$ :

(\vec{σ} \cdot \vec{π})^{2} = \vec{π} \cdot \vec{π} + i \vec{σ} \cdot (\vec{π} \times \vec{π})

We calculate the operator cross product $\vec{π} \times \vec{π}$ acting on a test function $f$ :

(\vec{π} \times \vec{π}) f = (\vec{p} - q \vec{A}) \times (\vec{p} - q \vec{A}) f = (\vec{p} \times \vec{p} - q \vec{p} \times \vec{A} - q \vec{A} \times \vec{p} + q^{2} \vec{A} \times \vec{A}) f

Since $\vec{p} \times \vec{p} = 0$ and $\vec{A} \times \vec{A} = 0$ , the remaining terms are:

\begin{aligned} - q (\vec{p} \times \vec{A} + \vec{A} \times \vec{p}) f & = - q [(- i ℏ \nabla) \times (\vec{A} f) + \vec{A} \times (- i ℏ \nabla f)] \\ & = i ℏ q [(\nabla \times \vec{A}) f - \vec{A} \times (\nabla f) + \vec{A} \times (\nabla f)] \\ & = i ℏ q (\nabla \times \vec{A}) f = i ℏ q \vec{B} f \end{aligned}

Therefore:

(\vec{σ} \cdot \vec{π})^{2} = {\vec{π}}^{2} + i \vec{σ} \cdot (i ℏ q \vec{B}) = {\vec{π}}^{2} - ℏ q \vec{σ} \cdot \vec{B}
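The Pauli identity used above can be spot-checked for ordinary numeric vectors. In the sketch below the vectors `a` and `b` are arbitrary illustrative values whose components commute, so the cross-product term of a vector with itself vanishes; the magnetic term in the derivation arises only because the components of $\vec{π}$ do not commute.

```python
SIGMA = [
    [[0, 1], [1, 0]],        # sigma_x
    [[0, -1j], [1j, 0]],     # sigma_y
    [[1, 0], [0, -1]],       # sigma_z
]

def sigma_dot(v):
    """Return the 2x2 matrix v . sigma for a 3-vector v of plain numbers."""
    return [[sum(v[k] * SIGMA[k][r][c] for k in range(3)) for c in range(2)]
            for r in range(2)]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

a = (0.3, -1.2, 2.0)
b = (1.5, 0.4, -0.7)

# Left side: (sigma . a)(sigma . b)
lhs = matmul2(sigma_dot(a), sigma_dot(b))

# Right side: (a . b) I + i sigma . (a x b)
a_dot_b = sum(x * y for x, y in zip(a, b))
spin_part = sigma_dot(cross(a, b))
rhs = [[a_dot_b * (r == c) + 1j * spin_part[r][c] for c in range(2)]
       for r in range(2)]

max_error = max(abs(lhs[r][c] - rhs[r][c]) for r in range(2) for c in range(2))

# With commuting components a x a = 0, so (sigma . a)^2 is |a|^2 times the identity.
square = matmul2(sigma_dot(a), sigma_dot(a))
```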

Finally, we have fully derived the Pauli Equation, which explicitly includes the Zeeman term $U = - \vec{μ} \cdot \vec{B}$ :

i ℏ \frac{\partial ϕ}{\partial t} = [\frac{(\vec{p} - q \vec{A})^{2}}{2 m} + q Φ - \frac{q ℏ}{2 m} \vec{σ} \cdot \vec{B}] ϕ

Clearly, an extra term has appeared in the Hamiltonian:

H = \frac{(\vec{p} - q \vec{A})^{2}}{2 m} + q Φ - \frac{q ℏ}{2 m} \vec{σ} \cdot \vec{B}

This is the origin of spin. If we hadn't introduced the $\vec{σ}$ matrices (i.e., the spin-0 Klein-Gordon case), the kinetic operator would simply be ${\vec{π}}^{2}$ . This term produces no coupling of the form $\vec{σ} \cdot \vec{B}$ and reduces to the standard electromagnetic Hamiltonian for a spinless particle: $H_{\text{spin-0}} = \frac{(\vec{p} - q \vec{A})^{2}}{2 m} + q Φ$ . It is precisely because the relativistic covariance of the electron wavefunction requires a matrix structure that the non-commutativity of $\vec{π}$ is converted into a magnetic energy term.

In simple terms: the $\vec{σ}$ matrices give the particle an internal two-component degree of freedom, an "internal axis of rotation". Without it, the particle is a completely "isotropic" point in space. In a spin-0 world, the three components $π_{x}, π_{y}, π_{z}$ still fail to commute in a magnetic field, $[π_{i}, π_{j}] = i ℏ q ϵ_{i j k} B_{k}$ , but the effect is purely "external" (it changes the particle's trajectory). The $\vec{σ}$ matrices couple this same non-commutativity (the magnetic field $\vec{B}$ ) directly to the internal components of the wavefunction.
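To get a feeling for the size of the spin term, the sketch below evaluates the two eigenvalues of $- \frac{q ℏ}{2 m} \vec{σ} \cdot \vec{B}$ for an electron. The constants are CODATA 2018 values; the 1 T field strength and the function name are illustrative assumptions.

```python
# CODATA 2018 values in SI units.
E_CHARGE = 1.602176634e-19      # elementary charge, C
HBAR = 1.054571817e-34          # reduced Planck constant, J s
M_ELECTRON = 9.1093837015e-31   # electron mass, kg

def zeeman_levels(B_tesla, q=-E_CHARGE, m=M_ELECTRON):
    """Eigenvalues (low, high) of -(q*hbar/2m) sigma.B for field magnitude B_tesla.
    Because (sigma.n)^2 = I, sigma.B has eigenvalues +|B| and -|B|."""
    coupling = -q * HBAR / (2.0 * m)
    return tuple(sorted((coupling * B_tesla, -coupling * B_tesla)))

mu_B = E_CHARGE * HBAR / (2.0 * M_ELECTRON)   # Bohr magneton, J/T
low, high = zeeman_levels(1.0)                # energies in joules for a 1 T field
splitting_eV = (high - low) / E_CHARGE        # level splitting in electronvolts
```

The coupling constant is the Bohr magneton, about $9.274 \times 10^{- 24}$ J/T, so a 1 T field splits the two spin states by roughly $1.16 \times 10^{- 4}$ eV.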

5. Universal Covering between SU(2) and SO(3)

We have already observed the "emergence" of spin from the dynamical level. However, as an intrinsic property, spin should theoretically not require an external electromagnetic field to manifest; fundamentally, its existence must be linked to symmetry and transformation. Why must the wavefunction be operated on by $\vec{σ}$ matrices? Why is it a two-component spinor? This requires us to delve into the kinematic underpinnings—specifically, the group-theoretical basis of spacetime symmetry. Even without considering electromagnetic fields, as a physical object, when we rotate the laboratory coordinate system by an angle $θ$ , the electron's wavefunction $ψ$ must inevitably change. It is precisely this "transformation rule under rotation" that defines spin. Before diving into the physics, let us briefly introduce the most important groups: SU(2) and SO(3). These are Lie groups and hold an extremely important place in physics.

The basic definition of SU(2) is:

\begin{aligned} SU (2) & \equiv \{U \mid U \in GL (2, \mathbb{C}), U^{†} U = 1_{2 \times 2}, \det U = 1\} \\ & \equiv \{\begin{bmatrix} a & b \\ - b^{*} & a^{*} \end{bmatrix} \mid a, b \in \mathbb{C}, | a |^{2} + | b |^{2} = 1\} \\ & \equiv \{U (\vec{n}, ω) = e^{i \frac{ω}{2} \vec{n} \cdot \vec{σ}} \mid ω \in [0, 2 π], \vec{n} \text{ a real 3D unit vector}\} \end{aligned}

If we use real parameters $x_{i} \in R$ and let $a = x_{4} + i x_{3}, b = x_{2} + i x_{1}$ to describe it:

\begin{array}{r} U = [\begin{array}{cc} a & b \\ - b^{*} & a^{*} \end{array}] = [\begin{array}{cc} x_{4} + i x_{3} & x_{2} + i x_{1} \\ - x_{2} + i x_{1} & x_{4} - i x_{3} \end{array}] \end{array}

The constraint becomes $x_{1}^{2} + x_{2}^{2} + x_{3}^{2} + x_{4}^{2} = 1$ , indicating that SU(2) as a manifold is $S^{3}$ , a 3-sphere (hypersphere). Its $T^{2}$ -fibration is described as:

\begin{cases} x_{1} = \sin θ \cos φ \\ x_{2} = \sin θ \sin φ \end{cases}, \quad \begin{cases} x_{3} = \cos θ \cos χ \\ x_{4} = \cos θ \sin χ \end{cases}, \quad \text{where } θ \in [0, π / 2]; \ φ, χ \in [0, 2 π] .

This can be described using two opposing conical surfaces for $χ, φ$ and an axis $θ$ . At $θ = 0$ or $θ = π / 2$ , one parameter becomes degenerate. It can also be described as a "doughnut" (solid torus) that scales with $θ$ , causing a parameter to degenerate at the endpoints. When $θ = 0$ , the doughnut is a circle with zero width; only $χ$ is valid along the circle, while $φ$ is invalid because the circle has no width. When $θ = π / 2$ , the doughnut becomes a sphere with no hole, so $χ$ becomes invalid and only $φ$ is effective.

The spherical coordinate system description is:

\text{For } ω \in [0, 2 π], θ \in [0, π], φ \in [0, 2 π]: \quad \begin{cases} x_{1} = \sin \frac{ω}{2} \sin θ \cos φ \\ x_{2} = \sin \frac{ω}{2} \sin θ \sin φ \\ x_{3} = \sin \frac{ω}{2} \cos θ \\ x_{4} = \cos \frac{ω}{2} \end{cases}

SU(2) can also be expressed by Pauli matrices:

U = [\begin{array}{cc} x_{4} + i x_{3} & x_{2} + i x_{1} \\ - x_{2} + i x_{1} & x_{4} - i x_{3} \end{array}] = x_{4} 1_{2 \times 2} + i x_{1} σ_{1} + i x_{2} σ_{2} + i x_{3} σ_{3} .

Combining this further with spherical coordinates, we have:

\begin{aligned} U (\vec{n}, ω) & = e^{i \frac{ω}{2} \vec{n} \cdot \vec{σ}} = 1_{2 \times 2} \cos \frac{ω}{2} + i n^{a} σ_{a} \sin \frac{ω}{2} \\ \vec{n} & = (\sin θ \cos φ, \sin θ \sin φ, \cos θ); ω \in [0, 2 π], θ \in [0, π], φ \in [0, 2 π] . \end{aligned}
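The axis-angle form can be turned into a small numerical check that $U (\vec{n}, ω)$ is unitary with unit determinant, and that $ω = 2 π$ gives $- 1_{2 \times 2}$ while $ω = 4 π$ gives $1_{2 \times 2}$ . This is a minimal sketch; the angles and helper names are arbitrary assumptions.

```python
import math

def su2(n, omega):
    """U(n, omega) = cos(omega/2) I + i sin(omega/2) n.sigma for a unit vector n."""
    c, s = math.cos(omega / 2.0), math.sin(omega / 2.0)
    nx, ny, nz = n
    return [[c + 1j * s * nz, s * (1j * nx + ny)],
            [s * (1j * nx - ny), c - 1j * s * nz]]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def dagger(u):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[u[c][r].conjugate() for c in range(2)] for r in range(2)]

def det2(u):
    """Determinant of a 2x2 matrix."""
    return u[0][0] * u[1][1] - u[0][1] * u[1][0]

theta, phi = 0.8, 2.1   # arbitrary direction of the rotation axis
n = (math.sin(theta) * math.cos(phi),
     math.sin(theta) * math.sin(phi),
     math.cos(theta))

U = su2(n, 1.3)
UdU = matmul2(dagger(U), U)
unitarity_error = max(abs(UdU[r][c] - (r == c)) for r in range(2) for c in range(2))
det_error = abs(det2(U) - 1)

U_2pi = su2(n, 2.0 * math.pi)   # equals -I up to rounding
U_4pi = su2(n, 4.0 * math.pi)   # equals +I up to rounding
```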

The basic definition of SO(3) is:

\begin{aligned} SO (3) & \equiv \{R \mid R \in GL (3, \mathbb{R}), R^{T} R = 1_{3 \times 3}, \det R = 1\} \\ & \equiv \{R (\vec{ω}) \mid \vec{ω} = ω \vec{n}, \ \vec{n} = (\cos φ \sin θ, \sin φ \sin θ, \cos θ), \ ω \in [0, π], θ \in [0, π], φ \in [0, 2 π]\} . \end{aligned}

SO(3) is more intuitive; it represents the rotation operations we can see and touch. As a manifold, SO(3) can be viewed as a solid ball of radius $π$ formed by the endpoints of $\vec{ω}$ with antipodal identification. Its abstraction lies in this antipodal identification. Although its origin is the simple fact that rotating $180^{\circ}$ counter-clockwise around a fixed axis is the same as rotating $180^{\circ}$ clockwise, this antipodal identification results in the solid ball being a connected manifold but not a simply connected one (i.e., not every closed curve or loop in the space can be continuously shrunk to a point). The solid ball with antipodal identification has a name: Real Projective Space of 3 dimensions, denoted as $R P^{3}$ .

Mathematically, a representation of a group $G$ on a vector space $V$ is a homomorphism from the group $G$ to the general linear group $G L (V)$ (the group of all invertible transformations on $V$ ): $\forall g_{1}, g_{2} \in G, D (g_{1} g_{2}) = D (g_{1}) D (g_{2})$ . For Lie groups, the map is also required to be continuous. Projective representations arise because, in quantum mechanics, physical states are described by rays in Hilbert space (i.e., $| ψ ⟩$ and $e^{i α} | ψ ⟩$ represent the same physical state). Therefore, the group multiplication law only needs to hold up to a phase factor:

D (g_{1}) D (g_{2}) = ω (g_{1}, g_{2}) D (g_{1} g_{2})

where $ω (g_{1}, g_{2})$ is a complex number with modulus 1, called the multiplier (or phase factor) of the projective representation. Bargmann's Theorem (1954) provides a rigorous mathematical framework for projective representations: for a connected Lie group $G$ whose Lie algebra satisfies $H^{2} (\mathfrak{g}, \mathbb{R}) = 0$ (this includes SO(3) and the Lorentz group), every continuous projective unitary representation can be "lifted" to an ordinary unitary representation of the universal covering group $\tilde{G}$ . For SO(3), the phase ambiguity cannot be removed within the group itself (a residual sign $\pm 1$ remains), so we pass to its Universal Covering Group, SU(2), and convert the projective representation of SO(3) into an ordinary representation of SU(2).

The concept of the universal covering group introduced here highlights the profound connection between SO(3) and SU(2). In topology, the universal covering space $\tilde{X}$ of a space $X$ is like an "upgraded version" of it, possessing two core features: simple connectedness (all loops in $\tilde{X}$ can be shrunk to a point; no topological holes) and local isomorphism (in local regions, $\tilde{X}$ looks exactly like $X$ , but globally $\tilde{X}$ is often "larger" and covers $X$ in an $n : 1$ manner).

Why do SU(2) matrices generate SO(3) rotations? Here is a classic construction method. We map a vector $x = (x, y, z)$ in 3D space to a second-order traceless Hermitian matrix $X$ :

X = x σ_{1} + y σ_{2} + z σ_{3} = (\begin{matrix} z & x - i y \\ x + i y & - z \end{matrix})

(where $σ_{i}$ are the Pauli matrices). Note that $det (X) = - (x^{2} + y^{2} + z^{2}) = - ∥ x ∥^{2}$ . Let a matrix $U$ in SU(2) act on $X$ via the transformation $X^{'} = U X U^{†}$ . Since $U$ is unitary and has a determinant of 1, this transformation preserves the tracelessness and Hermiticity of $X$ , and preserves the determinant: $det (X^{'}) = det (U X U^{†}) = det (X)$ . This means $∥ x^{'} ∥^{2} = ∥ x ∥^{2}$ , i.e., the linear map $x \to x^{'}$ preserves the length of every vector and is therefore orthogonal; because SU(2) is connected and $U = I$ gives the identity map, it is a proper rotation (determinant $+ 1$ ) rather than a reflection. Observing the transformation formula $X^{'} = U X U^{†}$ , if we replace $U$ with $- U$ : $(- U) X (- U)^{†} = (- 1)^{2} U X U^{†} = U X U^{†}$ , we find that $U$ and $- U$ produce exactly the same rotation. This is the algebraic root of the 2:1 covering: every rotation in SO(3) corresponds to two points in SU(2). This explains why rotating by $2 π$ does not return to the identity of SU(2) (the path ends at $- I$ , having traveled only half a circle), while a rotation by $4 π$ is required to return to the identity (completing a full circle in SU(2)). That is to say, SO(3) itself has a "hole" (its fundamental group is $Z_{2}$ ), while SU(2) (i.e., $S^{3}$ ) is simply connected (its fundamental group is trivial) and has no topological holes.
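The construction above can be made concrete by reading off the $3 \times 3$ matrix from $X^{'} = U X U^{†}$ via $R_{i j} = \frac{1}{2} Tr (σ_{i} U σ_{j} U^{†})$ , which follows from $Tr (σ_{i} σ_{j}) = 2 δ_{i j}$ . The sketch below assumes the parametrization $U = \cos \frac{ω}{2} + i \sin \frac{ω}{2} \vec{n} \cdot \vec{σ}$ used earlier; the axis, angle, and function names are illustrative choices.

```python
import math

SIGMA = [
    [[0, 1], [1, 0]],        # sigma_x
    [[0, -1j], [1j, 0]],     # sigma_y
    [[1, 0], [0, -1]],       # sigma_z
]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def dagger(u):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[u[c][r].conjugate() for c in range(2)] for r in range(2)]

def su2(n, omega):
    """U(n, omega) = cos(omega/2) I + i sin(omega/2) n.sigma for a unit vector n."""
    c, s = math.cos(omega / 2.0), math.sin(omega / 2.0)
    nx, ny, nz = n
    return [[c + 1j * s * nz, s * (1j * nx + ny)],
            [s * (1j * nx - ny), c - 1j * s * nz]]

def rotation_from_su2(U):
    """R_ij = (1/2) Tr(sigma_i U sigma_j U^dagger), read off from X' = U X U^dagger."""
    Ud = dagger(U)
    R = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            M = matmul2(SIGMA[i], matmul2(U, matmul2(SIGMA[j], Ud)))
            R[i][j] = 0.5 * (M[0][0] + M[1][1]).real
    return R

axis = (1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0)   # unit vector
omega = 1.1
U = su2(axis, omega)
minus_U = [[-x for x in row] for row in U]

R_plus = rotation_from_su2(U)
R_minus = rotation_from_su2(minus_U)

# U and -U must give the same rotation matrix.
same_rotation = all(abs(R_plus[i][j] - R_minus[i][j]) < 1e-12
                    for i in range(3) for j in range(3))

# R R^T should be the identity, and the trace should be 1 + 2 cos(omega).
RRt = [[sum(R_plus[i][k] * R_plus[j][k] for k in range(3)) for j in range(3)]
       for i in range(3)]
orthogonality_error = max(abs(RRt[i][j] - (i == j))
                          for i in range(3) for j in range(3))
trace = sum(R_plus[i][i] for i in range(3))
```

The sign of the rotation angle depends on the sign convention chosen in the exponent of $U$ , which is why the check uses only the trace and orthogonality.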

The second condition of the universal cover, local consistency, leads to another major feature of SU(2) and SO(3): they are locally isomorphic near the identity element. This means that if you look only at "infinitesimal" rotations, or rotate only a tiny bit, the two groups are exactly the same. It is only when you rotate a large amount (e.g., $2 π$ ) to explore the "full picture" of the group that you discover they are different (one returns to the start, the other goes to $- I$ ). Mathematically speaking, this is because they possess the exact same Lie Algebra, i.e., an isomorphism of the tangent spaces at the identity: $su (2) ≅ so (3)$ . From solving the Schrödinger equation, we know that the 3 generators of SO(3) are $J_{x}, J_{y}, J_{z}$ (corresponding to infinitesimal rotations about the x, y, z axes), and their commutation relations are: $[J_{i}, J_{j}] = i ϵ_{i j k} J_{k}$ . This formula defines the essence of 3D rotation; the so-called generators form a basis for the Lie algebra.

For the SU(2) group, let us now solve for its generators. Assume an infinitesimal transformation:

U (ϵ) = I - i ϵ S

For this to belong to SU(2), there is a unitarity constraint. To first order in $ϵ$ : $(I + i ϵ S^{†}) (I - i ϵ S) = I ⟹ I - i ϵ (S - S^{†}) = I ⟹ S = S^{†}$ . Thus, $S$ must be a Hermitian matrix, the kind of operator that represents a physical observable. There is also the "special" (unit-determinant) constraint: using the formula $det (e^{A}) = e^{Tr (A)}$ , we have $det (U) = det (e^{- i ϵ S}) = e^{- i ϵ Tr (S)} = 1 ⟹ Tr (S) = 0$ . So $S$ must be traceless. The $2 \times 2$ traceless Hermitian matrices are exactly the real linear combinations of the Pauli matrices $σ_{x}, σ_{y}, σ_{z}$ , which therefore form a complete basis for the Lie algebra $su (2)$ , and the generator $S$ must be proportional to $σ$ .

Since $[σ_{i}, σ_{j}] = 2 i ϵ_{i j k} σ_{k}$ , this differs from the orbital relation $[L_{i}, L_{j}] = i ϵ_{i j k} L_{k}$ only by a factor of 2, which a rescaling removes. This already demonstrates that the Lie algebras of SU(2) and SO(3) are isomorphic. If we take $S = \frac{1}{2} σ$ , we get the standard $[S_{i}, S_{j}] = i ϵ_{i j k} S_{k}$ , exactly the commutation relations of the SO(3) generators.

This gives us a self-consistent theory of angular momentum, in which the total angular momentum $J = L + S$ exists. If spin $S$ is to qualify as "angular momentum" and add to $L$ to form a conserved quantity, $S$ must follow the exact same algebraic rules as $L$ . Since orbital angular momentum $L = r \times p$ is defined by spatial coordinates and its commutation relations are fixed (derived from the commutation of $x$ and $p$ ), we have no choice but to let $S = \frac{1}{2} σ$ (in units of $ℏ$ ).

Beyond theoretical consistency, experiment confirms this: in the Stern-Gerlach experiment, which measures the deflection in an inhomogeneous magnetic field of particles carrying a single unpaired electron spin (silver atoms in the original experiment), the measured spin components are $\pm \frac{1}{2} ℏ$ . The operator $S_{z}$ representing this observable must therefore have eigenvalues $\pm 1 / 2$ (in units of $ℏ$ ), which is exactly what the normalization $\frac{1}{2} σ_{z}$ provides (since the eigenvalues of $σ_{z}$ are $\pm 1$ ).
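Both commutation relations quoted above can be confirmed by brute force over all index pairs. The following sketch uses hypothetical helper names and the standard library only.

```python
SIGMA = [
    [[0, 1], [1, 0]],        # sigma_x
    [[0, -1j], [1j, 0]],     # sigma_y
    [[1, 0], [0, -1]],       # sigma_z
]
S = [[[0.5 * x for x in row] for row in s] for s in SIGMA]   # S = sigma / 2

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def commutator(a, b):
    """[a, b] = ab - ba for 2x2 matrices."""
    ab, ba = matmul2(a, b), matmul2(b, a)
    return [[ab[r][c] - ba[r][c] for c in range(2)] for r in range(2)]

def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

def algebra_error(gens, factor):
    """Largest entry of [G_i, G_j] - factor * i * eps_ijk G_k over all i, j."""
    worst = 0.0
    for i in range(3):
        for j in range(3):
            comm = commutator(gens[i], gens[j])
            for r in range(2):
                for c in range(2):
                    expected = sum(factor * 1j * levi_civita(i, j, k) * gens[k][r][c]
                                   for k in range(3))
                    worst = max(worst, abs(comm[r][c] - expected))
    return worst

pauli_error = algebra_error(SIGMA, 2)   # [sigma_i, sigma_j] = 2 i eps_ijk sigma_k
spin_error = algebra_error(S, 1)        # [S_i, S_j] = i eps_ijk S_k
```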

Returning to representations: for Lie groups, a useful property of generators is that any finite transformation $D (θ)$ can be obtained from the generator $J$ via the exponential map. If $J$ is an element of the Lie algebra $g$ , then elements of the group can be written as:

D (θ) = \exp (- i θ n \cdot J)

Here $J$ is the angular momentum operator (matrix) we discussed. Note that this expression involves a common "abuse of notation" or shorthand in physics. Strictly speaking, the exponential map $\exp$ maps abstract Lie algebra elements to abstract Lie group elements. However, the physical formula $D (θ) = \exp (- i θ n \cdot J)$ is actually an operation within the representation space (matrix space). Since $J$ here is the matrix representation of the generator, the result $D (θ)$ is naturally the matrix representation of the group element.

Whether we are dealing with SO(3) or SU(2), their Lie algebras are isomorphic. This means they share the same set of generator commutation relations: $[J_{i}, J_{j}] = i ϵ_{i j k} J_{k}$ (taking $ℏ = 1$ ). We want to find all finite-dimensional irreducible representations allowed by this set of algebraic rules. Define the ladder operators:

J_{\pm} = J_{x} \pm i J_{y}

Introduce the eigenstates $| j, m ⟩$ of $J_{z}$ such that:

J_{z} | j, m ⟩ = m | j, m ⟩, J^{2} | j, m ⟩ = λ | j, m ⟩

Calculating the commutator:

[J_{z}, J_{\pm}] = [J_{z}, J_{x}] \pm i [J_{z}, J_{y}] = i J_{y} \pm i (- i J_{x}) = \pm (J_{x} \pm i J_{y}) = \pm J_{\pm}

This implies $J_{\pm}$ act as "ladders" for the eigenvalues:

J_{z} (J_{\pm} | j, m ⟩) = (J_{\pm} J_{z} + [J_{z}, J_{\pm}]) | j, m ⟩ = (m \pm 1) (J_{\pm} | j, m ⟩)

If $m$ is an eigenvalue, then $m \pm 1$ are also eigenvalues. However, since we are looking for finite-dimensional representations, the spectrum of eigenvalues must have an upper bound $m_{m a x}$ and a lower bound $m_{m i n}$ .

J_{+} | j, m_{m a x} ⟩ = 0, J_{-} | j, m_{m i n} ⟩ = 0

Using the operator identity $J_{-} J_{+} = J^{2} - J_{z}^{2} - J_{z}$ acting on the highest weight state $| j, m_{m a x} ⟩$ :

0 = (λ - m_{m a x}^{2} - m_{m a x}) | m_{m a x} ⟩ ⟹ λ = m_{m a x} (m_{m a x} + 1)

For convenience, we label the maximum weight as $j$ , i.e., $m_{m a x} \equiv j$ . So the eigenvalue of the Casimir operator is $j (j + 1)$ . Similarly, using $J_{+} J_{-} = J^{2} - J_{z}^{2} + J_{z}$ acting on the lowest weight state $| j, m_{m i n} ⟩$ :

0 = (j (j + 1) - m_{m i n}^{2} + m_{m i n}) | m_{m i n} ⟩

Solving the equation $m_{m i n}^{2} - m_{m i n} - j (j + 1) = 0$ , we get two solutions:

m_{m i n} = - j or m_{m i n} = j + 1

Since $m_{m i n} \leq m_{m a x} = j$ , we must have $m_{m i n} = - j$ . Climbing from $m_{m i n} = - j$ to $m_{m a x} = j$ by adding 1 each step, we must reach the top in an integer number of steps $k$ :

m_{m a x} - m_{m i n} = j - (- j) = 2 j = k (k \in \{0, 1, 2, \dots\}) ⟹ j = \frac{k}{2}
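The ladder construction can be turned into explicit matrices. The sketch below builds $J_{x}, J_{y}, J_{z}$ for a given $j$ in the basis $m = j, j - 1, \dots, - j$ (with $ℏ = 1$ ), using the matrix element $J_{+} | j, m ⟩ = \sqrt{j (j + 1) - m (m + 1)} | j, m + 1 ⟩$ , which follows from the identity $J_{-} J_{+} = J^{2} - J_{z}^{2} - J_{z}$ up to a phase convention. Function names are hypothetical.

```python
import math

def spin_matrices(two_j):
    """Return (Jx, Jy, Jz) for spin j = two_j / 2 in the basis m = j, ..., -j."""
    j = two_j / 2.0
    dim = two_j + 1
    m_values = [j - k for k in range(dim)]
    Jz = [[m_values[r] if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    Jp = [[0.0] * dim for _ in range(dim)]
    for c in range(1, dim):      # column c is |j, m>, row c - 1 is |j, m + 1>
        m = m_values[c]
        Jp[c - 1][c] = math.sqrt(j * (j + 1) - m * (m + 1))
    Jm = [[Jp[c][r] for c in range(dim)] for r in range(dim)]   # transpose (real entries)
    Jx = [[0.5 * (Jp[r][c] + Jm[r][c]) for c in range(dim)] for r in range(dim)]
    Jy = [[-0.5j * (Jp[r][c] - Jm[r][c]) for c in range(dim)] for r in range(dim)]
    return Jx, Jy, Jz

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def check(two_j):
    """Return (dimension, error in [Jx, Jy] = i Jz, error in J^2 = j(j+1) I)."""
    Jx, Jy, Jz = spin_matrices(two_j)
    dim, j = two_j + 1, two_j / 2.0
    xy, yx = matmul(Jx, Jy), matmul(Jy, Jx)
    xx, yy, zz = matmul(Jx, Jx), matmul(Jy, Jy), matmul(Jz, Jz)
    cells = [(r, c) for r in range(dim) for c in range(dim)]
    comm_err = max(abs(xy[r][c] - yx[r][c] - 1j * Jz[r][c]) for r, c in cells)
    casimir_err = max(abs(xx[r][c] + yy[r][c] + zz[r][c] - j * (j + 1) * (r == c))
                      for r, c in cells)
    return dim, comm_err, casimir_err

report = {two_j: check(two_j) for two_j in range(5)}   # j = 0, 1/2, 1, 3/2, 2
```

For every $j$ in the list the dimension is $2 j + 1$ and both errors are at rounding level, for integer and half-integer $j$ alike.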

We arrive at the conclusion that, based solely on the Lie algebra structure, the allowed values for $j$ are $0, 1 / 2, 1, 3 / 2, 2, \dots$ . However, the Lie algebra structure is only a local property. We must now test these results against the global group structure. The core criterion for this test is single-valuedness: if we follow a closed path in the group back to the starting point (the identity), the representation matrix must also return to the identity matrix (for an ordinary representation).

SO(3) is the rotation group in 3D space. Rotating by $2 π$ ( $360^{\circ}$ ) around any axis (say, the $z$ -axis) restores physical space completely: $R_{z} (2 π) = R_{z} (0) = 1$ , which is the group identity. An ordinary representation $D$ of SO(3) must therefore satisfy $D (R_{z} (2 π)) = D (1) = I$ (the identity matrix). Substituting the formula derived from the Lie algebra, where $J_{z}$ is diagonal in the $z$ -basis with diagonal elements $m$ : $D (2 π) = \exp (- i 2 π J_{z}) = diag (e^{- i 2 π m}, \dots)$ . To make this matrix equal to the identity $I$ , every diagonal element must be 1: $e^{- i 2 π m} = 1 ⟹ m \in Z$ .

If $j$ is an integer ( $0, 1, \dots$ ), then $m$ is an integer, and the condition is met. However, if $j$ is a half-integer ( $1 / 2, 3 / 2, \dots$ ), then $m$ is a half-integer, and $e^{- i 2 π m} = - 1 \neq 1$ . Therefore, half-integer spins are strictly forbidden in ordinary representations of SO(3).
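The single-valuedness test is easy to tabulate. The sketch below evaluates the diagonal entries $e^{- i 2 π m}$ of $D (2 π)$ for several values of $j$ ; the function names and the tolerance are illustrative assumptions.

```python
import cmath
import math

def diagonal_after_2pi(two_j):
    """Diagonal entries exp(-i 2 pi m) of D(2 pi) = exp(-i 2 pi J_z), m = j, ..., -j."""
    j = two_j / 2.0
    return [cmath.exp(-1j * 2.0 * math.pi * (j - k)) for k in range(two_j + 1)]

def returns_to_identity(two_j, tol=1e-9):
    """True if a 2 pi rotation is represented by the identity matrix."""
    return all(abs(entry - 1) < tol for entry in diagonal_after_2pi(two_j))

def returns_to_minus_identity(two_j, tol=1e-9):
    """True if a 2 pi rotation is represented by minus the identity matrix."""
    return all(abs(entry + 1) < tol for entry in diagonal_after_2pi(two_j))

# Keys are j = 0, 1/2, 1, 3/2, 2, 5/2.
single_valued = {two_j / 2: returns_to_identity(two_j) for two_j in range(6)}
sign_flipped = {two_j / 2: returns_to_minus_identity(two_j) for two_j in range(6)}
```

Integer $j$ gives `True` in `single_valued`, while half-integer $j$ gives `True` in `sign_flipped`, i.e. the matrix $- I$ .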

However, the geometric structure of SU(2) is different from SO(3). It is the Universal Covering Group of SO(3) (a 2:1 cover). In SU(2), the parameter $θ = 2 π$ corresponds not to the identity element, but to $U (2 π) = - I \neq I$ ; only a rotation of $4 π$ corresponds to the identity element. We find that the behavior of $D (2 π)$ perfectly matches the behavior of the SU(2) group itself. We have effectively obtained an instance of Bargmann's Theorem on SO(3) and SU(2): the projective representations of the non-simply connected Lie group SO(3) are equivalent to the ordinary representations of its universal covering group SU(2). The final mapping relationship is:

Spin j	In Lie Algebra	In SU(2)	In SO(3)	Physical Particle
Integer ( $0, 1, \dots$ )	Exists	Ordinary Rep. (but not faithful, cannot distinguish $\pm I$ )	Ordinary Rep.	Bosons (Photons, etc.)
Half-Integer ( $1 / 2, \dots$ )	Exists	Ordinary Rep. (Faithful Rep.)	Projective Rep. (Multi-valued)	Fermions (Electrons, etc.)

"Spin" is able to "emerge" from this abstract mathematical structure because quantum mechanics defines "physical state" more leniently than classical mechanics, thereby releasing topological degrees of freedom that were masked by classical physics.

Symmetry (Root): The universe possesses rotational symmetry, leading to the existence of the Lie algebra $so (3) ≅ su (2)$ .
Quantization (Opportunity): The nature of probability waves allows for "projective representations," enabling the "half-integer parts" forbidden by classical physics in the Lie algebra to survive.
Intrinsic Nature (Formation): These surviving half-integer representations cannot correspond to any spatial motion, and thus can only be interpreted as the particle's innate intrinsic angular momentum.

Now let us specifically calculate what the representations are for different $j$ . We use the exponential map:

D^{(j)} (\hat{n}, θ) = \sum_{k = 0}^{\infty} \frac{(- i θ)^{k}}{k!} (\hat{n} \cdot J^{(j)})^{k}

where $\hat{n}$ is the rotation axis unit vector and $θ$ is the rotation angle.

Simplest case: $j = 0$ scalar representation. Dimension $d = 2 (0) + 1 = 1$ . The basis has only one state $| 0, 0 ⟩$ . Since $m$ can only be 0, the generator $J_{z} = [0]$ . The ladder operators acting on the highest/lowest weight states are 0, so $J_{+} = [0], J_{-} = [0]$ , and thus $J_{x} = 0, J_{y} = 0, J_{z} = 0$ . The exponential map gives the representation $D (θ) = e^{- i θ n \cdot 0} = 1$ , also known as the trivial representation. This is a scalar; no matter how you rotate, the value is always multiplied by 1, remaining unchanged.

When $j = \frac{1}{2}$ , the dimension is $2 j + 1 = 2$ . We already know the generator is $J = \frac{1}{2} σ ⟹ \hat{n} \cdot J = \frac{1}{2} (\hat{n} \cdot σ)$ . To calculate higher powers of $(\hat{n} \cdot J)$ , recall the property of Pauli matrices: $(\hat{n} \cdot \vec{σ})^{2} = I$ (Identity matrix). Thus, the power law for generators is:

(\hat{n} \cdot J)^{2} = {(\frac{1}{2} \hat{n} \cdot \vec{σ})}^{2} = \frac{1}{4} (\hat{n} \cdot \vec{σ})^{2} = \frac{1}{4} I, (\hat{n} \cdot J)^{3} = (\hat{n} \cdot J)^{2} (\hat{n} \cdot J) = \frac{1}{4} (\hat{n} \cdot J)

The general term formula is:

\begin{aligned} (\hat{n} \cdot J)^{2 k} & = (\frac{1}{4})^{k} I = (\frac{1}{2})^{2 k} I \\ (\hat{n} \cdot J)^{2 k + 1} & = (\frac{1}{2})^{2 k} (\hat{n} \cdot J) = (\frac{1}{2})^{2 k + 1} (\hat{n} \cdot \vec{σ}) \end{aligned}

Summing the series by splitting the exponential series into even and odd parts:

\begin{aligned} D^{(1 / 2)} & = \sum_{k = 0}^{\infty} \frac{(- i θ)^{k}}{k!} (\hat{n} \cdot J)^{k} \\ & = \underbrace{\sum_{m = 0}^{\infty} \frac{(- i θ)^{2 m}}{(2 m)!} {(\frac{1}{2})}^{2 m} I}_{\text{Even terms}} + \underbrace{\sum_{m = 0}^{\infty} \frac{(- i θ)^{2 m + 1}}{(2 m + 1)!} {(\frac{1}{2})}^{2 m + 1} (\hat{n} \cdot \vec{σ})}_{\text{Odd terms}} \end{aligned}

Even term coefficient: $\sum \frac{(- 1)^{m}}{(2 m)!} (\frac{θ}{2})^{2 m} = \cos (\frac{θ}{2})$ . Odd term coefficient: $- i \sum \frac{(- 1)^{m}}{(2 m + 1)!} (\frac{θ}{2})^{2 m + 1} = - i \sin (\frac{θ}{2})$ . Finally, we get:

D^{(1 / 2)} (\hat{n}, θ) = \cos (\frac{θ}{2}) I - i \sin (\frac{θ}{2}) (\hat{n} \cdot \vec{σ})

This is the representation for $j = 1 / 2$ . It maps rotation operations to $2 \times 2$ complex matrices. Checking $2 π$ : Substitute $θ = 2 π$ , $\cos (π) = - 1, \sin (π) = 0$ . The result is $- I$ .
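As a cross-check, the closed form can be compared against a direct truncation of the exponential series. This is only a sketch in plain Python; the function names and the choice of 80 series terms are assumptions made for illustration.

```python
import math

# Pauli matrices sigma_x, sigma_y, sigma_z as nested lists.
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def n_dot_sigma(n):
    """The matrix n . sigma for a unit vector n = (nx, ny, nz)."""
    return [[sum(n[k] * SIGMA[k][i][j] for k in range(3)) for j in range(2)]
            for i in range(2)]

def d_half_closed(n, theta):
    """Closed form: cos(theta/2) I - i sin(theta/2) (n . sigma)."""
    ns = n_dot_sigma(n)
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c * (i == j) - 1j * s * ns[i][j] for j in range(2)]
            for i in range(2)]

def d_half_series(n, theta, terms=80):
    """Truncated series sum_k (-i theta)^k / k! (n . J)^k with J = sigma / 2."""
    ns = n_dot_sigma(n)
    nj = [[0.5 * ns[i][j] for j in range(2)] for i in range(2)]
    total = [[1 + 0j, 0j], [0j, 1 + 0j]]  # k = 0 term
    power = [[1 + 0j, 0j], [0j, 1 + 0j]]  # running power (n . J)^k
    coeff = 1 + 0j                         # running factor (-i theta)^k / k!
    for k in range(1, terms):
        power = matmul(power, nj)
        coeff *= -1j * theta / k
        total = [[total[i][j] + coeff * power[i][j] for j in range(2)]
                 for i in range(2)]
    return total

def max_diff(a, b):
    """Largest entrywise deviation between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

axis = (1 / math.sqrt(3),) * 3
print(max_diff(d_half_closed(axis, 1.234), d_half_series(axis, 1.234)))
print(d_half_closed(axis, 2 * math.pi))  # close to -I
print(d_half_closed(axis, 4 * math.pi))  # close to +I
```

The same comparison works for any unit axis, because only $(\hat{n} \cdot \vec{σ})^{2} = I$ is used.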

When $j = 1$ , the dimension is $2 j + 1 = 3$ . We need $3 \times 3$ matrices. In the angular momentum basis defined in physics (Cartesian basis), the generators satisfy $(J_{k})_{a b} = - i ϵ_{k a b}$ . For example, the generator for rotation around the $z$ -axis, $J_{z}$ :

J_{z} = (\begin{matrix} 0 & - i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{matrix})

For an arbitrary axis $\hat{n}$ , let matrix $K = \hat{n} \cdot J$ . Calculating powers of $J_{z}$ directly (other directions are similar):

J_{z}^{2} = (\begin{matrix} 0 & - i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{matrix}) (\begin{matrix} 0 & - i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{matrix}) = (\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{matrix}) (Note: This is not I)

J_{z}^{3} = J_{z}^{2} \cdot J_{z} = (\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{matrix}) (\begin{matrix} 0 & - i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{matrix}) = (\begin{matrix} 0 & - i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{matrix}) = J_{z}

We find a pattern: for $j = 1$ generators, they satisfy the characteristic equation $(\hat{n} \cdot J)^{3} = (\hat{n} \cdot J)$ . This means: Odd terms ( $k = 1, 3, 5 \dots$ ): $(\hat{n} \cdot J)^{k} = (\hat{n} \cdot J)$ ; Even terms ( $k = 2, 4, 6 \dots$ ): $(\hat{n} \cdot J)^{k} = (\hat{n} \cdot J)^{2}$ ; $k = 0$ term: $I$ (Identity matrix). Expanding the Taylor series again, but this time we must separate $I$ because $J^{2} \neq I$ .

D^{(1)} = I + \sum_{odd k} \frac{(- i θ)^{k}}{k!} (\hat{n} \cdot J) + \sum_{even k \geq 2} \frac{(- i θ)^{k}}{k!} (\hat{n} \cdot J)^{2}

Odd term coefficient: $- i (θ - \frac{θ^{3}}{3!} + \dots) = - i \sin θ$ . Even term coefficient: $(\frac{- θ^{2}}{2!} + \frac{θ^{4}}{4!} - \dots) = \cos θ - 1$ . Result:

D^{(1)} (\hat{n}, θ) = I - i \sin θ (\hat{n} \cdot J) + (\cos θ - 1) (\hat{n} \cdot J)^{2}

This is the representation for $j = 1$ (Rodrigues' rotation formula in physics form). It maps rotation operations to $3 \times 3$ real matrices (although $J$ contains $i$ , $i \cdot J$ is a real matrix). Checking $2 π$ : Substitute $θ = 2 π$ , $\sin (2 π) = 0, \cos (2 π) = 1$ , $D^{(1)} = I - 0 + (1 - 1) (\dots) = I$ .
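The $j = 1$ formula can be sanity-checked in the same spirit. The sketch below builds the Cartesian generators from $(J_{k})_{a b} = - i ϵ_{k a b}$ and evaluates the closed form; the helper names are hypothetical, and the comparison with the familiar $z$ -axis rotation matrix is a consistency check rather than a proof.

```python
import math

def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def generator(k):
    """Spin-1 generator in the Cartesian basis: (J_k)_ab = -i eps_kab."""
    return [[-1j * eps(k, a, b) for b in range(3)] for a in range(3)]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def d_one(n, theta):
    """I - i sin(theta) (n.J) + (cos(theta) - 1) (n.J)^2 for a unit vector n."""
    gens = [generator(k) for k in range(3)]
    nj = [[sum(n[k] * gens[k][a][b] for k in range(3)) for b in range(3)]
          for a in range(3)]
    nj2 = matmul(nj, nj)
    s, c = math.sin(theta), math.cos(theta)
    return [[(a == b) - 1j * s * nj[a][b] + (c - 1) * nj2[a][b]
             for b in range(3)] for a in range(3)]

# Rotation about z by 0.7 rad: entries are real up to rounding.
for row in d_one((0, 0, 1), 0.7):
    print([round(x.real, 6) for x in row])
```

For the $z$ axis this reproduces the ordinary rotation matrix with $\cos θ$ and $\sin θ$ entries, and at $θ = 2 π$ it returns the identity for any unit axis.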

Generally, we can establish generators and representations for all $j$ . The construction of generators for all $j$ relies on three core matrix element formulas for angular momentum operators in quantum mechanics. As long as we have these three formulas, we can write out matrices for $j = 0, 3 / 2, 2$ or even $j = 100$ . First, $J_{z}$ is a diagonal matrix:

⟨ j, m^{'} | J_{z} | j, m ⟩ = m δ_{m^{'} m}

Then $J_{+}$ (raising operator) is a superdiagonal matrix:

⟨ j, m + 1 | J_{+} | j, m ⟩ = \sqrt{j (j + 1) - m (m + 1)}

$J_{-}$ (lowering operator) is a subdiagonal matrix: It is the transpose of $J_{+}$ (in the real case). And $J_{x}$ and $J_{y}$ are composed of $J_{\pm}$ :

J_{x} = \frac{1}{2} (J_{+} + J_{-}), J_{y} = \frac{1}{2 i} (J_{+} - J_{-})

For example $j = 3 / 2$ : Spin-3/2 representation. This belongs to fermions, similar to electrons, but has 4 components. Commonly seen in $Δ$ baryons or gravitinos in supergravity. Dimension: $d = 2 (3 / 2) + 1 = 4$ . Construct generator $J_{z}$ (diagonal):

J_{z} = (\begin{matrix} 3 / 2 & 0 & 0 & 0 \\ 0 & 1 / 2 & 0 & 0 \\ 0 & 0 & - 1 / 2 & 0 \\ 0 & 0 & 0 & - 3 / 2 \end{matrix})

$J_{+}$ (raising operator coefficients) requires calculating $\sqrt{j (j + 1) - m (m + 1)}$ , where $j = 3 / 2$ , which is $\sqrt{3.75 - m (m + 1)}$ . $m = 1 / 2 \to 3 / 2$ : $\sqrt{3.75 - 0.75} = \sqrt{3}$ ; $m = - 1 / 2 \to 1 / 2$ : $\sqrt{3.75 - (- 0.25)} = \sqrt{4} = 2$ ; $m = - 3 / 2 \to - 1 / 2$ : $\sqrt{3.75 - 0.75} = \sqrt{3}$ . So:

J_{+} = (\begin{matrix} 0 & \sqrt{3} & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & \sqrt{3} \\ 0 & 0 & 0 & 0 \end{matrix})

Using $J_{x} = \frac{1}{2} (J_{+} + J_{+}^{†})$ :

J_{x} = \frac{1}{2} (\begin{matrix} 0 & \sqrt{3} & 0 & 0 \\ \sqrt{3} & 0 & 2 & 0 \\ 0 & 2 & 0 & \sqrt{3} \\ 0 & 0 & \sqrt{3} & 0 \end{matrix})

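Representation: The resulting rotation operator $D^{(3 / 2)} = \exp (- i θ \hat{n} \cdot J)$ is a $4 \times 4$ unitary matrix. When rotating by $2 π$ , since the diagonal elements of $J_{z}$ are half-integers, it becomes $- I_{4 \times 4}$ . So $j = 3 / 2$ is also a faithful representation of SU(2) and a projective representation of SO(3).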
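The three matrix-element formulas translate directly into code. The following sketch (with $ℏ = 1$ and the basis ordered $m = j, j - 1, \dots, - j$ , both assumptions of this example) builds the generators for any $j$ and checks $[J_{x}, J_{y}] = i J_{z}$ ; the function names are illustrative.

```python
import math

def spin_matrices(two_j):
    """Return (Jx, Jy, Jz, Jplus) for spin = two_j / 2, basis m = j, ..., -j."""
    spin = two_j / 2
    dim = two_j + 1
    ms = [spin - k for k in range(dim)]
    jz = [[ms[r] if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    jp = [[0.0] * dim for _ in range(dim)]
    for c in range(1, dim):
        m = ms[c]  # J_+ raises the column state m to the row state m + 1
        jp[c - 1][c] = math.sqrt(spin * (spin + 1) - m * (m + 1))
    jm = [[jp[c][r] for c in range(dim)] for r in range(dim)]  # transpose of J_+
    jx = [[(jp[r][c] + jm[r][c]) / 2 for c in range(dim)] for r in range(dim)]
    jy = [[(jp[r][c] - jm[r][c]) / (2 * 1j) for c in range(dim)]
          for r in range(dim)]
    return jx, jy, jz, jp

def commutator(a, b):
    """Matrix commutator [a, b] for square nested lists."""
    n = len(a)
    mul = lambda x, y: [[sum(x[i][k] * y[k][j] for k in range(n))
                         for j in range(n)] for i in range(n)]
    ab, ba = mul(a, b), mul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def satisfies_su2(two_j, tol=1e-12):
    """Check [Jx, Jy] = i Jz for the constructed matrices."""
    jx, jy, jz, _ = spin_matrices(two_j)
    c = commutator(jx, jy)
    n = two_j + 1
    return all(abs(c[r][k] - 1j * jz[r][k]) < tol
               for r in range(n) for k in range(n))

_, _, _, j_plus = spin_matrices(3)  # j = 3/2
print([j_plus[k][k + 1] for k in range(3)])  # superdiagonal of J_+
print(all(satisfies_su2(two_j) for two_j in range(7)))
```

For $j = 3 / 2$ the superdiagonal of $J_{+}$ comes out as $\sqrt{3}, 2, \sqrt{3}$ , matching the hand calculation.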

Thus far, we have completely narrated the essence of spin from the perspective of symmetry and group theory. However, it seems we did not need relativistic corrections as in the previous chapter. Unlike the previous chapter where we started from the Dirac equation and directly derived the 4-component wavefunction, here we started from spatial rotation (SU(2)/SO(3)) and only derived the 2-component spinor (Pauli spinor) for j=1/2. Where did the other two components go? The reason lies again in relativistic effects; the current symmetry analysis only considered spatial rotations and did not consider Lorentz boosts. Only by introducing the Lorentz Group can we explain why the electron must be the direct sum of "left-handed" and "right-handed" SU(2) representations (2+2=4), thereby forming a perfect closed loop back to the structure of the Dirac equation.

6. Lorentz Group

To achieve complete spin, we must begin considering the true symmetry group: the Lorentz Group $S O (1, 3)$ , which includes both rotations and boosts. When we attempt to find the "fundamental representation" of the Lorentz group, something extremely curious happens: the algebraic structure splits apart.

The definition of the Lorentz Group is:

\underbrace{O (1, 3) \equiv \{ Λ \mid Λ \in GL (4, R), g_{μ ν} Λ^{μ}_{ρ} Λ^{ν}_{σ} = g_{ρ σ} \}}_{\dim O (1, 3) = 6}, g = diag (1, - 1, - 1, - 1)

Fundamentally, it is the group of linear transformations that preserve the metric of Minkowski spacetime. From the metric-preserving condition $Λ^{T} g Λ = g \to g_{μ ν} Λ^{μ}_{ρ} Λ^{ν}_{σ} = g_{ρ σ}$ , we can derive a constraint on the component $Λ_{0}^{0}$ :

1 = g_{μ ν} Λ_{0}^{μ} Λ_{0}^{ν} = {(Λ_{0}^{0})}^{2} - \sum_{i} {(Λ_{0}^{i})}^{2} \Rightarrow {(Λ_{0}^{0})}^{2} = 1 + \sum_{i} {(Λ_{0}^{i})}^{2} \geq 1

This means that Lorentz transformations must have either $Λ_{0}^{0} \geq 1$ or $Λ_{0}^{0} \leq - 1$ , so the group is already disconnected. This allows us to divide the Lorentz group into two manifolds: $O^{+} (1, 3)$ and $O^{-} (1, 3)$ . The former is called the orthochronous Lorentz group. The latter does not contain the identity element and thus does not form a group; it is called the antichronous branch of the Lorentz group.

From the metric-preserving condition, we can also determine the value of the determinant:

| Λ^{T} g Λ | = | g | \Rightarrow | Λ |^{2} | g | = | g | \Rightarrow | Λ |^{2} = 1, i.e., | Λ | = \pm 1 .

Transformations with $| Λ | = 1$ are denoted as $S O (1, 3)$ and called the proper Lorentz group, while those with $| Λ | = - 1$ are called the improper branch. Combining these two considerations, we can divide the Lorentz group $O (1, 3)$ into four connected manifolds. However, in practice, we only need to study the proper orthochronous branch $S O^{+} (1, 3)$ . This is because the other three branches can be obtained by acting on $S O^{+} (1, 3)$ with two specific Lorentz transformations: time reversal $T = T^{- 1} = diag (- 1, 1, 1, 1)$ and parity inversion $P = P^{- 1} = diag (1, - 1, - 1, - 1)$ . Moreover, real-world reference frame transformations are strictly orthochronous and proper.
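The four branches can be illustrated with explicit matrices. This sketch takes a boost along $x$ as a representative of $S O^{+} (1, 3)$ , multiplies it by $P$ and $T$ , and classifies each result by the sign of its determinant and of $Λ_{0}^{0}$ . The rapidity value and the helper names are arbitrary choices for illustration.

```python
import math

G = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]  # metric
T = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]    # time reversal
P = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]  # parity

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def det(a):
    """Determinant by Laplace expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** c * a[0][c]
               * det([row[:c] + row[c + 1:] for row in a[1:]])
               for c in range(len(a)))

def boost_x(rapidity):
    """Boost along x with the given rapidity."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return [[ch, sh, 0, 0], [sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def preserves_metric(lam, tol=1e-12):
    """Check the defining condition Lambda^T g Lambda = g."""
    lhs = matmul(transpose(lam), matmul(G, lam))
    return all(abs(lhs[i][j] - G[i][j]) < tol
               for i in range(4) for j in range(4))

def classify(lam):
    """Return (proper, orthochronous) flags for a Lorentz matrix."""
    return det(lam) > 0, lam[0][0] >= 1

b = boost_x(0.5)
for name, lam in [("boost", b), ("P boost", matmul(P, b)),
                  ("T boost", matmul(T, b)),
                  ("PT boost", matmul(P, matmul(T, b)))]:
    print(name, preserves_metric(lam), classify(lam))
```

Each of the four products preserves the metric, and each lands in a different combination of the two flags.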

We focus on the proper orthochronous Lorentz group $S O^{+} (1, 3)$ . In this connected component, any transformation can be written as an exponential map from the identity. Just as $S O (3)$ has 3 rotation generators, $S O^{+} (1, 3)$ has a total of 6 degrees of freedom (3 rotations + 3 boosts), corresponding to 6 generators. Considering an infinitesimal transformation $Λ \approx I - i ϵ X$ , similar to the process above, we can write out two sets of generators: Rotation generators $\vec{J} = (J_{1}, J_{2}, J_{3})$ , corresponding to spatial rotations (these are the familiar angular momentum operators); and Boost generators $\vec{K} = (K_{1}, K_{2}, K_{3})$ , corresponding to velocity transformations along the $x, y, z$ axes. These 6 generators satisfy the Lorentz Lie algebra $so (1, 3)$ as follows:

Pure rotations are closed (the $so (3)$ subalgebra):

[J_{i}, J_{j}] = i ϵ_{i j k} J_{k}

Relation between rotations and boosts (the boost operators themselves rotate like vectors):

[J_{i}, K_{j}] = i ϵ_{i j k} K_{k}

Boosts are not closed among themselves (the composition of two boosts in different directions is not just a boost but also produces a rotation, which is the origin of Thomas precession). The negative sign below is a characteristic manifestation of the spacetime metric $g = diag (1, - 1, - 1, - 1)$ , distinguishing it from the algebra of $S O (4)$ :

[K_{i}, K_{j}] = - i ϵ_{i j k} J_{k}

At this point, the algebraic structure still looks coupled ( $J$ and $K$ are intertwined). To find irreducible representations, we introduce a non-unitary basis change (Complexification). Define two new sets of operators ${\vec{N}}^{+}$ and ${\vec{N}}^{-}$ :

{\vec{N}}^{+} = \frac{1}{2} (\vec{J} + i \vec{K}), {\vec{N}}^{-} = \frac{1}{2} (\vec{J} - i \vec{K})

Let's calculate the commutation relations for these new operators. First, look within ${\vec{N}}^{+}$ :

\begin{aligned} [N_{i}^{+}, N_{j}^{+}] & = \frac{1}{4} [J_{i} + i K_{i}, J_{j} + i K_{j}] \\ & = \frac{1}{4} ([J_{i}, J_{j}] + i [J_{i}, K_{j}] + i [K_{i}, J_{j}] - [K_{i}, K_{j}]) \\ & = \frac{1}{4} (i ϵ_{i j k} J_{k} + i (i ϵ_{i j k} K_{k}) + i (i ϵ_{i j k} K_{k}) - (- i ϵ_{i j k} J_{k})) \\ & = \frac{1}{4} (2 i ϵ_{i j k} J_{k} - 2 ϵ_{i j k} K_{k}) \\ & = i ϵ_{i j k} \frac{1}{2} (J_{k} + i K_{k}) = i ϵ_{i j k} N_{k}^{+} \end{aligned}

Similarly, we can verify that $[N_{i}^{-}, N_{j}^{-}] = i ϵ_{i j k} N_{k}^{-}$ . The most shocking result lies between ${\vec{N}}^{+}$ and ${\vec{N}}^{-}$ :

[N_{i}^{+}, N_{j}^{-}] = \frac{1}{4} [J_{i} + i K_{i}, J_{j} - i K_{j}] = \dots = 0

This means: upon complexification, the Lie algebra of the Lorentz group splits into the direct sum of two mutually independent $su (2)$ algebras.

so (1, 3)_{C} ≅ su (2)_{L} \oplus su (2)_{R}

This is a massive simplification in group theory. Since we already know the representations of $su (2)$ inside out (labeled by spin $j$ ), irreducible representations of the Lorentz group can be uniquely labeled by a pair of half-integers or integers $(j_{L}, j_{R})$ . According to this decomposition, the most fundamental spinor representation is no longer unique; instead, there are two most basic choices (fundamental representations), corresponding to taking $j = 1 / 2$ for one $su (2)$ and $j = 0$ for the other. This introduces the concept of chirality.
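The commutation relations and the splitting can also be verified with explicit matrices. The sketch below uses the $4 \times 4$ vector representation as an assumed concrete realization: $J_{k}$ is $- i ϵ_{k a b}$ on the spatial block, and $K_{k}$ has entries $i$ linking the time index to the spatial index $k$ . This choice and all helper names are assumptions made for illustration.

```python
def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def rotation_generator(k):
    """J_k: (J_k)_ab = -i eps_kab on the spatial block of a 4x4 matrix."""
    m = [[0j] * 4 for _ in range(4)]
    for a in range(3):
        for b in range(3):
            m[a + 1][b + 1] = -1j * eps(k, a, b)
    return m

def boost_generator(k):
    """K_k: mixes the time index 0 with the spatial index k + 1."""
    m = [[0j] * 4 for _ in range(4)]
    m[0][k + 1] = 1j
    m[k + 1][0] = 1j
    return m

def combine(a, b, ca=1, cb=1):
    """Return ca * a + cb * b."""
    return [[ca * a[i][j] + cb * b[i][j] for j in range(4)] for i in range(4)]

def commutator(a, b):
    mul = lambda x, y: [[sum(x[i][k] * y[k][j] for k in range(4))
                         for j in range(4)] for i in range(4)]
    return combine(mul(a, b), mul(b, a), 1, -1)

def eps_sum(i, j, gens, factor):
    """factor * sum_k eps_ijk gens[k]."""
    out = [[0j] * 4 for _ in range(4)]
    for k in range(3):
        out = combine(out, gens[k], 1, factor * eps(i, j, k))
    return out

def equal(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(4) for j in range(4))

J = [rotation_generator(k) for k in range(3)]
K = [boost_generator(k) for k in range(3)]
N_plus = [combine(J[k], K[k], 0.5, 0.5j) for k in range(3)]
N_minus = [combine(J[k], K[k], 0.5, -0.5j) for k in range(3)]
ZERO = [[0j] * 4 for _ in range(4)]

def check_algebra():
    """Verify so(1,3) and the two commuting su(2) copies for all index pairs."""
    ok = True
    for i in range(3):
        for j in range(3):
            ok &= equal(commutator(J[i], J[j]), eps_sum(i, j, J, 1j))
            ok &= equal(commutator(J[i], K[j]), eps_sum(i, j, K, 1j))
            ok &= equal(commutator(K[i], K[j]), eps_sum(i, j, J, -1j))
            ok &= equal(commutator(N_plus[i], N_plus[j]),
                        eps_sum(i, j, N_plus, 1j))
            ok &= equal(commutator(N_minus[i], N_minus[j]),
                        eps_sum(i, j, N_minus, 1j))
            ok &= equal(commutator(N_plus[i], N_minus[j]), ZERO)
    return ok

print(check_algebra())
```

Because the check runs over every index pair, a sign error in any of the three defining commutators would make it fail.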

Left-handed Weyl Spinor corresponds to the label $(1 / 2, 0)$ . It behaves as spin $1 / 2$ under ${\vec{N}}^{-}$ and as a scalar under ${\vec{N}}^{+}$ . This is a 2-component complex vector, denoted $ψ_{L}$ . In this representation, ${\vec{N}}^{-} = \frac{1}{2} \vec{σ}$ and ${\vec{N}}^{+} = 0$ . Solving for the physical generators gives:

\vec{J} = {\vec{N}}^{+} + {\vec{N}}^{-} = \frac{1}{2} \vec{σ}, \vec{K} = - i ({\vec{N}}^{+} - {\vec{N}}^{-}) = i \frac{1}{2} \vec{σ}

Right-handed Weyl Spinor corresponds to the label $(0, 1 / 2)$ . It behaves as a scalar under ${\vec{N}}^{-}$ and as spin $1 / 2$ under ${\vec{N}}^{+}$ . This is also a 2-component complex vector, denoted $ψ_{R}$ . In this representation, ${\vec{N}}^{-} = 0$ and ${\vec{N}}^{+} = \frac{1}{2} \vec{σ}$ . The physical generators are:

\vec{J} = \frac{1}{2} \vec{σ}, \vec{K} = - i \frac{1}{2} \vec{σ}

Note the difference in sign for $\vec{K}$ ! This demonstrates that while $ψ_{L}$ and $ψ_{R}$ behave identically under spatial rotation ( $\vec{J}$ ) (both are spin $1 / 2$ ), their transformation properties under Lorentz Boosts ( $\vec{K}$ ) are diametrically opposite.
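To make the opposite boost behavior concrete, here is a small sketch for transformations along the $z$ axis, using the convention $\exp (- i θ J_{z} - i η K_{z})$ with the generators quoted above. Because $σ_{z}$ is diagonal, the exponential reduces to two numbers. The function name, the rapidity symbol $η$ and the labels "L" and "R" are assumptions of this example.

```python
import cmath

def weyl_transform(theta, eta, handedness):
    """Diagonal entries of exp(-i theta J_z - i eta K_z) on a Weyl spinor.

    Uses J_z = sigma_z / 2 and K_z = +i sigma_z / 2 (left-handed) or
    K_z = -i sigma_z / 2 (right-handed). Since sigma_z is diagonal, the
    matrix exponential acts entry by entry.
    """
    sign = 1 if handedness == "L" else -1
    a = (-1j * theta + sign * eta) / 2  # the exponent is a * sigma_z
    return [cmath.exp(a), cmath.exp(-a)]

# Pure rotation: both chiralities pick up the same unit-modulus phases.
print(weyl_transform(0.8, 0.0, "L"), weyl_transform(0.8, 0.0, "R"))

# Pure boost: the components are rescaled in opposite directions.
print(weyl_transform(0.0, 1.0, "L"), weyl_transform(0.0, 1.0, "R"))
```

Under a pure boost the two matrices are inverses of each other and neither is unitary, which reflects the non-unitary nature of finite-dimensional Lorentz representations.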

Since $ψ_{L}$ and $ψ_{R}$ are both 2-component objects, why do we need 4 components? The reason lies in Parity ( $P$ ). The parity transformation $P$ inverts spatial coordinates $\vec{x} \to - \vec{x}$ . $\vec{J}$ is an axial vector ( $\vec{r} \times \vec{p}$ ), so it remains invariant under $P$ : $\vec{J} \to \vec{J}$ . $\vec{K}$ is a polar vector ( $\sim \vec{v}$ ), so it changes sign under $P$ : $\vec{K} \to - \vec{K}$ . Substituting this into the definition of ${\vec{N}}^{\pm}$ , we find that the parity transformation swaps these two algebras:

P : {\vec{N}}^{+} ⟷ {\vec{N}}^{-}

This means parity transforms the left-handed representation $(1 / 2, 0)$ into the right-handed representation $(0, 1 / 2)$ . If we want to describe a particle like an electron that has both spin and mass, and obeys parity conservation (under electromagnetic forces), we cannot simply pick one. We must take their "direct sum" together. Thus, the Dirac spinor $Ψ$ , as a representation of $S O^{+} (1, 3)$ extended by the parity operator, is precisely the direct sum of these two fundamental representations:

Ψ = (\begin{matrix} ψ_{L} \\ ψ_{R} \end{matrix}) \in (\frac{1}{2}, 0) \oplus (0, \frac{1}{2})

This explains why there are 4 components: two components come from the left-handed sector, and two components come from the right-handed sector; they are tightly coupled together by the mass term and parity transformation. The Pauli spinor we saw earlier with $S U (2)$ is merely a silhouette of this relativistic object in the rest frame (or non-relativistic limit).

Incidentally, the Weyl spinors here are deeply connected to the $γ^{5}$ matrix mentioned earlier in the context of gamma matrices. In fact, the Hermitian operator $γ^{5} \equiv i γ^{0} γ^{1} γ^{2} γ^{3}$ is precisely the operator used to "identify" and "define" Weyl spinors. Without $γ^{5}$ , we could not mathematically distinguish what is "Left-handed" and what is "Right-handed". In Dirac theory, $γ^{5}$ is called the Chirality Operator. It has a crucial algebraic property: it anti-commutes with all $γ^{μ}$ ( $\{ γ^{5}, γ^{μ} \} = 0$ ). As a result, it commutes with the generators of Lorentz transformations $S^{μ ν} = \frac{i}{4} [γ^{μ}, γ^{ν}]$ , which are quadratic in the $γ^{μ}$ ( $[γ^{5}, S^{μ ν}] = 0$ ). This means chirality is a Lorentz-invariant label of the representation (and a conserved quantity for massless particles), so we can use the eigenvalues of $γ^{5}$ to classify spinors. A Right-handed Weyl spinor is a state with a $γ^{5}$ eigenvalue of +1 ( $γ^{5} ψ_{R} = + ψ_{R}$ ), and a Left-handed Weyl spinor is a state with a $γ^{5}$ eigenvalue of -1 ( $γ^{5} ψ_{L} = - ψ_{L}$ ). Thus, what is physically referred to as "left-handedness" and "right-handedness" mathematically refers to whether the eigenvalue of $γ^{5}$ is -1 or +1. Since the Dirac spinor $Ψ$ is a mixture (direct sum) of left and right, how do we "sift" the left-handed and right-handed parts out of a mixed $Ψ$ individually? This requires using projection operators based on $γ^{5}$ :

P_{L} = \frac{1 - γ^{5}}{2}, P_{R} = \frac{1 + γ^{5}}{2}

These two operators have the standard properties of projection operators ( $P^{2} = P, P_{L} P_{R} = 0, P_{L} + P_{R} = 1$ ). Their function is to kill off one handedness component and keep only the other:

P_{L} Ψ = \frac{1 - γ^{5}}{2} (ψ_{L} + ψ_{R}) = \frac{1 - (- 1)}{2} ψ_{L} + \frac{1 - 1}{2} ψ_{R} = ψ_{L}, P_{R} Ψ = ψ_{R}

In particle physics calculations (especially weak interactions), you will often see terms like $\frac{1 - γ^{5}}{2}$ ; this is telling you: "This interaction only plays with left-handed spinors; right-handed spinors, please step aside." (This is precisely the mathematical expression of parity non-conservation). To make this relationship immediately clear, we can choose a special set of Gamma matrix forms called the Weyl Representation (or Chiral Representation). Unlike the Dirac representation mentioned earlier, in the Weyl representation $γ^{5}$ and the Lorentz generators $S^{μ ν}$ are block diagonal (the $γ^{μ}$ themselves are block off-diagonal), which makes the Dirac spinor explicitly split into upper and lower Weyl spinors $Ψ = (ψ_{L}, ψ_{R})^{T}$ . This is the most commonly used perspective in high-energy physics. In contrast, in low-energy condensed matter physics, we commonly use the Dirac (Standard) representation, where $ψ_{L}$ and $ψ_{R}$ are deeply mixed, better reflecting the non-relativistic approximation of "large components" and "small components".
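The projector algebra can be checked with explicit matrices. The sketch below assumes one common convention for the chiral basis, $γ^{0}$ with off-diagonal identity blocks and $γ^{i}$ with blocks $σ^{i}$ and $- σ^{i}$ ; other sign conventions exist, and the helper names are illustrative.

```python
# Pauli matrices and 2x2 building blocks.
SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]

def block(a, b, c, d):
    """Assemble the 4x4 matrix [[a, b], [c, d]] from 2x2 blocks."""
    return [a[r] + b[r] for r in range(2)] + [c[r] + d[r] for r in range(2)]

def neg(a):
    return [[-x for x in row] for row in a]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def combine(a, b, ca, cb):
    """Return ca * a + cb * b."""
    return [[ca * a[i][j] + cb * b[i][j] for j in range(4)] for i in range(4)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(4) for j in range(4))

def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

I4 = block(I2, Z2, Z2, I2)
ZERO4 = block(Z2, Z2, Z2, Z2)

# Chiral-basis gamma matrices (assumed convention): gamma^0, gamma^1..3.
GAMMA = [block(Z2, I2, I2, Z2)] + [block(Z2, s, neg(s), Z2) for s in SIGMA]

# gamma^5 = i gamma^0 gamma^1 gamma^2 gamma^3
g01 = matmul(GAMMA[0], GAMMA[1])
g23 = matmul(GAMMA[2], GAMMA[3])
GAMMA5 = [[1j * x for x in row] for row in matmul(g01, g23)]

P_L = combine(I4, GAMMA5, 0.5, -0.5)  # (1 - gamma^5) / 2
P_R = combine(I4, GAMMA5, 0.5, 0.5)   # (1 + gamma^5) / 2

print([GAMMA5[k][k] for k in range(4)])  # diagonal of gamma^5
print(apply(P_L, [1, 2, 3, 4]))          # keeps the upper two components
print(apply(P_R, [1, 2, 3, 4]))          # keeps the lower two components
```

In this basis $γ^{5}$ is diagonal with entries $- 1, - 1, + 1, + 1$ , so $P_{L}$ and $P_{R}$ simply pick out the upper and lower halves of $Ψ$ .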

7. Magnetic Gradient Force

We have completed a long mathematical journey. From the derivation of the Dirac equation to the representation of the Lorentz group, we have established that the electron must possess spin and is a 4-component relativistic object. Now, let us return to the original paradox: if the Lorentz force does no work, then who is doing the work? To answer this question, we need to link microscopic spin to macroscopic force.

By taking the non-relativistic approximation of the Dirac equation, we naturally obtained the Pauli equation. Let us revisit the "Zeeman term" that appeared seemingly out of nowhere:

H_{Z e e m a n} = - \frac{e ℏ}{2 m} (\vec{σ} \cdot \vec{B})

In classical physics, we define the relationship between magnetic moment $\vec{μ}$ and angular momentum $\vec{L}$ as the gyromagnetic ratio. For orbital angular momentum, this relationship is:

{\vec{μ}}_{L} = \frac{e}{2 m} \vec{L}

If we attempt to use the same logic to define the "spin magnetic moment," we need to relate the spin operator $\vec{S}$ to the magnetic moment. Recalling the definition of the spin operator in Section 5:

\vec{S} = \frac{ℏ}{2} \vec{σ} ⟹ \vec{σ} = \frac{2}{ℏ} \vec{S}

Substituting $\vec{σ}$ back into the Zeeman term $H_{Z e e m a n}$ , we get:

\begin{aligned} H_{Z e e m a n} & = - \frac{e ℏ}{2 m} (\frac{2}{ℏ} \vec{S}) \cdot \vec{B} = - 2 \cdot \frac{e}{2 m} \vec{S} \cdot \vec{B} \end{aligned}

We write the general form of magnetic potential energy as $U = - {\vec{μ}}_{S} \cdot \vec{B}$ . Comparing this with the equation above, we can read out the electron's spin magnetic moment ${\vec{μ}}_{S}$ :

{\vec{μ}}_{S} = 2 \cdot \frac{e}{2 m} \vec{S}

If we write this in the general g-factor form $\vec{μ} = g \frac{e}{2 m} \vec{S}$ , we immediately arrive at the conclusion:

g = 2

This $g = 2$ is not a parameter fudged to fit experiments; it is a direct mathematical consequence of the spacetime symmetry of the Dirac equation. It implies: Electron spin generates a magnetic moment twice as efficiently as classical orbital motion.

With the magnetic moment $\vec{μ}$ , we can finally explain "who is doing the work." The classical Lorentz force ${\vec{F}}_{L o r e n t z} = q (\vec{v} \times \vec{B})$ indeed does no work. However, for an object with an intrinsic magnetic moment, its dynamics are governed by the potential energy $U$ . According to Hamiltonian mechanics, force is the negative gradient of potential energy:

\vec{F} = - \nabla U = - \nabla (- \vec{μ} \cdot \vec{B}) = \nabla (\vec{μ} \cdot \vec{B})

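Since the spin magnetic moment $\vec{μ}$ is an intrinsic property that does not depend on position, and since $\nabla \times \vec{B} = 0$ in the current-free region outside the magnet, the identity $\nabla (\vec{μ} \cdot \vec{B}) = (\vec{μ} \cdot \nabla) \vec{B} + \vec{μ} \times (\nabla \times \vec{B})$ reduces to: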

{\vec{F}}_{G r a d i e n t} = (\vec{μ} \cdot \nabla) \vec{B}

This is the force that does the work, known as the Gradient Force. If the magnetic field is uniform ( $\nabla \vec{B} = 0$ ), the force is zero, and there is only torque. However, the magnetic field produced by a real magnet is non-uniform (the magnetic field lines diverge), so $\nabla \vec{B} \neq 0$ . This is a conservative force that converts the potential energy of the magnetic field-moment coupling into the object's kinetic energy. Therefore, a magnet attracting iron is essentially the quantized spin magnetic moment being acted upon by the gradient force in a non-uniform magnetic field. The Lorentz force is responsible for deflection, while the gradient force is responsible for doing work.
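A one-dimensional numerical illustration: take the on-axis field of a point dipole as a stand-in for the magnet, and compare $- d U / d z$ (by finite differences) with $μ \, d B_{z} / d z$ . The source moment of 1 A·m², the 1 cm distance and the rounded constants are hypothetical values chosen only for this sketch.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability in T*m/A (approximate)
MU_B = 9.274e-24       # Bohr magneton in J/T (approximate)

def b_axis(z, source_moment=1.0):
    """On-axis field B_z (tesla) of a point dipole at the origin, aligned with z."""
    return MU_0 * source_moment / (2 * math.pi * z ** 3)

def uniform(z):
    """A uniform 0.5 T field with no gradient."""
    return 0.5

def potential(z, field, moment=MU_B):
    """U = -mu . B for a moment aligned with the field."""
    return -moment * field(z)

def force_numeric(z, field, h=1e-6):
    """F_z = -dU/dz by a central finite difference."""
    return -(potential(z + h, field) - potential(z - h, field)) / (2 * h)

def force_analytic(z, source_moment=1.0, moment=MU_B):
    """mu * dB_z/dz for the on-axis dipole field."""
    return -3 * MU_0 * source_moment * moment / (2 * math.pi * z ** 4)

z = 0.01  # 1 cm from the source dipole
print(force_numeric(z, b_axis), force_analytic(z))
print(force_numeric(z, uniform))
```

The force on the aligned moment is negative, that is, directed toward the source (attraction), and it vanishes in the uniform field.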

So, there really are two types of magnetic forces. From the perspective of symmetry, they correspond to two completely different symmetry mechanisms, touching upon the edge of Quantum Field Theory. In QFT, interactions are prescribed by symmetry:

The Lorentz Force stems from Local Gauge Symmetry. The electromagnetic interaction exists to maintain the $U (1)$ local phase invariance of the wavefunction: $ψ \to e^{i α (x)} ψ$ . To compensate for the variation of phase with position $\partial_{μ} α (x)$ , we must introduce the gauge field $A_{μ}$ and the covariant derivative $D_{μ} = \partial_{μ} - i e A_{μ}$ . The resulting equation of motion (the Lorentz force) essentially describes how the gauge field couples to the Current. This geometric constraint dictates that the force must be perpendicular to the four-velocity; projected into 3D space, this is $\vec{v} \times \vec{B}$ . Its "no work" property is a direct manifestation of the geometric structure of gauge symmetry.
The Gradient Force stems from Broken Spatial Translational Symmetry. According to Noether's Theorem, force and work are closely related to spacetime symmetry: Momentum Conservation $⟷$ Spatial Translational Symmetry. When an electron is in a uniform magnetic field, the Hamiltonian $H = - \vec{μ} \cdot \vec{B}$ does not explicitly contain position $\vec{r}$ . The system possesses spatial translational symmetry, momentum is conserved ( $\dot{\vec{p}} = - \partial H / \partial \vec{r} = 0$ ), and there is no net force. However, when an electron is in a non-uniform magnetic field (near a magnet), the field $\vec{B} (\vec{r})$ depends on position: $\frac{\partial H}{\partial \vec{r}} = - \vec{μ} \cdot \frac{\partial \vec{B}}{\partial \vec{r}} \neq 0$ . The existence of the magnet breaks the spatial translational symmetry (Homogeneity is broken). It is precisely this breaking of spacetime background symmetry that forces the electron to change its momentum in response to the energy gradient. Thus, the Lorentz force is the price paid to maintain internal gauge symmetry, while the work-doing gradient force is the product of external spacetime symmetry breaking.

Since we are now adopting the perspective of Quantum Field Theory, we must point out that while Dirac's $g = 2$ is glorious, it is not the ultimate truth. In the Dirac equation, we treat the electron as a classical field coupling to the electromagnetic field. But in full Quantum Electrodynamics (QED), the vacuum is not empty. As an electron propagates, it constantly emits and absorbs Virtual Photons, and even creates electron-positron pairs. This means the interaction Vertex between the electron and the magnetic field is no longer just a simple point (Tree level), but contains infinite Loop corrections. Julian Schwinger calculated the first-order correction (the one-loop diagram shown above) in 1948, giving the famous formula:

g = 2 (1 + \frac{α}{2 π} + O (α^{2}))

where $α \approx 1 / 137$ is the fine-structure constant. The Schwinger term alone gives $g \approx 2.00232$ ; once higher-order loop corrections are included, the theoretical value becomes $g \approx 2.002319 \dots$ , which is astonishingly consistent with experimental measurements (agreeing to roughly 12 decimal places). This tiny deviation (the Anomalous magnetic moment) not only confirms the relativistic origin of $g = 2$ but also reveals the deep physics behind magnetism: when we feel the pull of a magnet, we are not only witnessing the geometric attributes of spacetime (spin) but also touching the seething ocean of virtual particles in the vacuum.
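The size of the one-loop correction is easy to evaluate. In this sketch the values of $α$ and of the measured $g$ are rounded reference numbers quoted from memory and should be treated as approximate; the function name is illustrative.

```python
import math

ALPHA = 1 / 137.035999      # fine-structure constant (approximate)
G_MEASURED = 2.00231930436  # measured electron g-factor (approximate, rounded)

def g_one_loop(alpha=ALPHA):
    """Dirac value 2 with the Schwinger correction alpha / (2 pi)."""
    return 2 * (1 + alpha / (2 * math.pi))

print(f"Dirac:    {2.0:.8f}")
print(f"one loop: {g_one_loop():.8f}")
print(f"measured: {G_MEASURED:.8f}")
```

The Dirac value misses the measurement by about $2 \times 10^{- 3}$ , while the single Schwinger term already closes the gap to a few parts in $10^{6}$ ; the remaining difference is accounted for by higher-order terms.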

8. Heisenberg Model

We now know that every electron is a tiny magnetic needle ( $g = 2$ ). However, if you put a bunch of these tiny magnets together, thermal agitation at room temperature is sufficient to completely randomize their orientations, resulting in zero macroscopic magnetic moment (paramagnetism). To form ferromagnetism, there must be an extremely strong "coupling force" between spins that forces them to align. The classical magnetic dipole-dipole interaction is far too weak—only about one ten-thousandth of the thermal energy. The real power comes from the combination of the identical particle statistics (Spin-Statistics) we mentioned earlier and the Coulomb interaction. This is known as the Exchange Interaction.

To demonstrate the essence of this, let us consider the simplest model: a two-electron system (such as a Helium atom or electrons on two adjacent Iron atoms). Assume there are two electrons, 1 and 2, and two spatial orbitals $ψ_{a} (\vec{r})$ and $ψ_{b} (\vec{r})$ , where $ψ_{a} (\vec{r})$ is localized near atom A and $ψ_{b} (\vec{r})$ is localized near atom B. We assume these two orbitals are orthonormal: $⟨ ψ_{a} | ψ_{b} ⟩ = 0$ . Note that this orthogonality is a prerequisite assumption for the Heisenberg model we are about to derive. Even if they are not orthonormal, we can create two new orthonormal orbitals through a basis transformation. Assuming their overlap integral is small greatly simplifies the calculation of the two-electron system, but this does not mean there is no interaction between the two electrons, as the interaction involves exchange integrals which generally are not zero.

The total Hamiltonian of this system is $H = H_{0} + H_{i n t}$ , where $H_{0}$ is the single-electron part (kinetic energy + nuclear potential energy), and $H_{i n t}$ is the Coulomb interaction between the two electrons: $H_{i n t} = \frac{e^{2}}{| {\vec{r}}_{1} - {\vec{r}}_{2} |}$ . Note: there is absolutely no magnetic interaction term here, only pure electrostatic repulsion.

According to the Fermi statistics hypothesis for identical particles in quantum mechanics, the total wavefunction of the electrons $Ψ (1, 2)$ must change sign (be antisymmetric) under the action of the particle exchange operator $P_{12}$ :

P_{12} Ψ (1, 2) = - Ψ (1, 2)

The total wavefunction consists of a spatial part $ϕ ({\vec{r}}_{1}, {\vec{r}}_{2})$ and a spin part $χ (s_{1}, s_{2})$ : $Ψ = ϕ \otimes χ$ . We first derive the spin part, and then obtain the spatial part from the overall antisymmetry. We have two spin-1/2 particles (e.g., two electrons), so there are a total of $2 \times 2 = 4$ possible product states (the uncoupled basis). We want to add their spins together to see what the total spin ${\vec{S}}_{t o t} = {\vec{S}}_{1} + {\vec{S}}_{2}$ looks like, that is, to find the common eigenstates $| S, M ⟩$ of the total spin operator ${\hat{S}}^{2}$ and its $z$ -component ${\hat{S}}_{z}$ (whose eigenvalue is the total magnetic quantum number $M$ ). According to angular momentum addition rules, the total spin $S$ formed by two spins of $1 / 2$ can be: $S = 1 / 2 + 1 / 2 = 1$ (Triplet, with 3 $M$ values: $+ 1, 0, - 1$ ) and $S = 1 / 2 - 1 / 2 = 0$ (Singlet, with 1 $M$ value: $0$ ).

The Triplet corresponds to three components. The total magnetic quantum number $M$ is the sum of the magnetic quantum numbers of the two particles: $M = m_{1} + m_{2}$ . To get $M = 1$ , the only possibility is both electrons are spin-up: $1 / 2 + 1 / 2 = 1$ . So, the first member of the Triplet is determined: $| 1, 1 ⟩ = | ↑↑ ⟩$ . Next, we apply the lowering operator to $| 1, 1 ⟩$ to obtain the intermediate state $| 1, 0 ⟩$ ( $M = 0$ ). The total lowering operator is ${\hat{S}}_{-} = {\hat{S}}_{1 -} + {\hat{S}}_{2 -}$ , and on a general state $| j, m ⟩$ it acts as $J_{-} | j, m ⟩ = \sqrt{j (j + 1) - m (m - 1)} | j, m - 1 ⟩$ (setting $ℏ = 1$ , as in the matrix elements above). Acting on the coupled state on the left gives:

{\hat{S}}_{-} | 1, 1 ⟩ = \sqrt{1 (1 + 1) - 1 (1 - 1)} | 1, 0 ⟩ = \sqrt{2} | 1, 0 ⟩

Acting on the product state on the right gives:

\begin{aligned} ({\hat{S}}_{1 -} + {\hat{S}}_{2 -}) | ↑↑ ⟩ & = ({\hat{S}}_{1 -} | ↑ ⟩_{1}) | ↑ ⟩_{2} + | ↑ ⟩_{1} ({\hat{S}}_{2 -} | ↑ ⟩_{2}) = | ↓ ⟩_{1} | ↑ ⟩_{2} + | ↑ ⟩_{1} | ↓ ⟩_{2} = | ↓↑ ⟩ + | ↑↓ ⟩ \end{aligned}

Thus we obtain:

| 1, 0 ⟩ = \frac{1}{\sqrt{2}} (| ↑↓ ⟩ + | ↓↑ ⟩)

The lowest weight state ( $M = - 1$ ) is also simple; only both down can give $- 1$ : $| 1, - 1 ⟩ = | ↓↓ ⟩$ . So the spin part of the Triplet is symmetric (does not change sign upon exchange), which means the spatial part must be antisymmetric.

The quantum numbers for the Singlet are $S = 0, M = 0$ . It must be some linear combination of the product states $| ↑↓ ⟩$ and $| ↓↑ ⟩$ with $M = 0$ : $| 0, 0 ⟩ = a | ↑↓ ⟩ + b | ↓↑ ⟩$ . Since eigenstates with different quantum numbers must be orthogonal, the Singlet $| 0, 0 ⟩$ must be orthogonal to $| 1, 0 ⟩$ in the Triplet. Solving this gives:

| 0, 0 ⟩ = \frac{1}{\sqrt{2}} (| ↑↓ ⟩ - | ↓↑ ⟩)

The spin part of the Singlet is antisymmetric, so the corresponding spatial part must be symmetric. Summarizing:

Spatial Symmetric $\otimes$ Spin Antisymmetric (Singlet) Energy: $E_{S}$
- Spin Part (Antisymmetric): $S = 0$ Singlet $χ_{S} = \frac{1}{\sqrt{2}} (↑↓ - ↓↑)$ .
- Spatial Part (Symmetric): $ϕ_{S} ({\vec{r}}_{1}, {\vec{r}}_{2}) = \frac{1}{\sqrt{2}} [ψ_{a} ({\vec{r}}_{1}) ψ_{b} ({\vec{r}}_{2}) + ψ_{a} ({\vec{r}}_{2}) ψ_{b} ({\vec{r}}_{1})]$ .
Spatial Antisymmetric $\otimes$ Spin Symmetric (Triplet) Energy: $E_{T}$
- Spin Part (Symmetric): $S = 1$ Triplet $χ_{T} = \{ ↑↑, \frac{1}{\sqrt{2}} (↑↓ + ↓↑), ↓↓ \}$ .
- Spatial Part (Antisymmetric): $ϕ_{A} ({\vec{r}}_{1}, {\vec{r}}_{2}) = \frac{1}{\sqrt{2}} [ψ_{a} ({\vec{r}}_{1}) ψ_{b} ({\vec{r}}_{2}) - ψ_{a} ({\vec{r}}_{2}) ψ_{b} ({\vec{r}}_{1})]$ .
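The construction of the spin states summarized above can be checked numerically. The following is a minimal sketch, assuming $ℏ = 1$ and the basis ordering $| ↑↑ ⟩, | ↑↓ ⟩, | ↓↑ ⟩, | ↓↓ ⟩$ ; the variable names and the explicit `SWAP` matrix are illustrative choices, not part of the derivation itself.

```python
import numpy as np

# Single-spin basis: up = (1, 0), down = (0, 1); hbar = 1 throughout.
up = np.array([1.0, 0.0])
down = np.array([0.0, 1.0])
s_minus = np.array([[0.0, 0.0], [1.0, 0.0]])  # S_- |up> = |down>, S_- |down> = 0
identity = np.eye(2)

# Total lowering operator S_- = S_1- + S_2- on the two-spin product space.
S_minus_total = np.kron(s_minus, identity) + np.kron(identity, s_minus)

# SWAP exchanges the two particles: |a>|b> -> |b>|a>.
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)

triplet_plus = np.kron(up, up)                      # |1, 1>
lowered = S_minus_total @ triplet_plus              # equals sqrt(2) |1, 0>
triplet_zero = lowered / np.linalg.norm(lowered)    # |1, 0>
triplet_minus = np.kron(down, down)                 # |1, -1>
singlet = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)  # |0, 0>

print("norm of lowered state:", np.linalg.norm(lowered))
print("<0,0|1,0> =", singlet @ triplet_zero)
for name, state in [("|1,1>", triplet_plus), ("|1,0>", triplet_zero),
                    ("|1,-1>", triplet_minus), ("|0,0>", singlet)]:
    # +1 means symmetric under exchange, -1 means antisymmetric.
    print(name, "exchange sign:", state @ SWAP @ state)
```

The norm of the lowered state reproduces the factor $\sqrt{2}$ , the three Triplet states come out with exchange sign $+ 1$ , and the Singlet comes out with $- 1$ .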

Our goal is to find a mathematical expression ${\hat{H}}_{e f f}$ containing only spin operators ${\vec{S}}_{i}$ and ${\vec{S}}_{j}$ such that when it acts on the Singlet and Triplet states, it automatically yields the corresponding energies $E_{S}$ and $E_{T}$ . To construct this Hamiltonian, the most natural building block is the dot product of the two spins ${\vec{S}}_{i} \cdot {\vec{S}}_{j}$ . We need to calculate the eigenvalues of this operator for the Singlet and Triplet states. Define the total spin operator for the two-electron system: ${\vec{S}}_{t o t} = {\vec{S}}_{i} + {\vec{S}}_{j}$ . Squaring the total spin operator allows us to solve for the dot product term: ${\vec{S}}_{i} \cdot {\vec{S}}_{j} = \frac{1}{2} ({\vec{S}}_{t o t}^{2} - {\vec{S}}_{i}^{2} - {\vec{S}}_{j}^{2})$ . Using the eigenvalue formula for the square of the angular momentum operator in quantum mechanics ${\hat{S}}^{2} | s ⟩ = s (s + 1) | s ⟩$ (omitting $ℏ^{2}$ for brevity, or treating spin as dimensionless):

For a single electron ( $s = 1 / 2$ ): ${\vec{S}}_{i}^{2} = {\vec{S}}_{j}^{2} = \frac{1}{2} (\frac{1}{2} + 1) = \frac{3}{4}$
For the Singlet ( $S_{t o t} = 0$ ): ${\vec{S}}_{t o t}^{2} | S ⟩ = 0 (0 + 1) | S ⟩ = 0$
For the Triplet ( $S_{t o t} = 1$ ): ${\vec{S}}_{t o t}^{2} | T ⟩ = 1 (1 + 1) | T ⟩ = 2 | T ⟩$ . Now, substituting these values back into the dot product formula to calculate the eigenvalues:
Dot product value for Singlet: $({\vec{S}}_{i} \cdot {\vec{S}}_{j}) | S ⟩ = \frac{1}{2} (0 - \frac{3}{4} - \frac{3}{4}) | S ⟩ = - \frac{3}{4} | S ⟩$
Dot product value for Triplet: $({\vec{S}}_{i} \cdot {\vec{S}}_{j}) | T ⟩ = \frac{1}{2} (2 - \frac{3}{4} - \frac{3}{4}) | T ⟩ = \frac{1}{2} (\frac{1}{2}) | T ⟩ = \frac{1}{4} | T ⟩$
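These two eigenvalues can be confirmed by direct diagonalization. The sketch below assumes dimensionless spin-1/2 operators ( $\vec{S} = \vec{σ} / 2$ with $ℏ = 1$ ) and builds ${\vec{S}}_{i} \cdot {\vec{S}}_{j}$ as a $4 \times 4$ matrix with Kronecker products; the variable names are illustrative.

```python
import numpy as np

# Spin-1/2 operators S = sigma / 2 (hbar = 1).
sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2

# S_i . S_j on the 4-dimensional two-spin space.
S_dot_S = sum(np.kron(s, s) for s in (sx, sy, sz))

# The matrix is Hermitian, so eigvalsh returns real eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(S_dot_S)
print(np.round(eigenvalues, 6))
```

The spectrum consists of one eigenvalue $- 3 / 4$ (the Singlet) and a threefold degenerate eigenvalue $+ 1 / 4$ (the Triplet), matching the values obtained from the total-spin identity.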

We assume the effective Hamiltonian ${\hat{H}}_{i j}$ has the following linear form (for two spin-1/2 particles this is the most general rotationally symmetric form): ${\hat{H}}_{i j} = C_{0} + C_{1} ({\vec{S}}_{i} \cdot {\vec{S}}_{j})$ . Substituting the eigenvalues for the Singlet and Triplet, we have $E_{S} = C_{0} - \frac{3}{4} C_{1}$ and $E_{T} = C_{0} + \frac{1}{4} C_{1}$ . Subtracting gives $C_{1} = - (E_{S} - E_{T})$ , while $C_{0} = \frac{1}{4} (E_{S} + 3 E_{T})$ is just a constant energy shift independent of the spin configuration, which can be discarded (or the energy zero point redefined) when studying phase transitions and spin dynamics. Defining the constant $J \equiv E_{S} - E_{T}$ , we obtain the two-particle Hamiltonian:

{\hat{H}}_{i j} = - J ({\vec{S}}_{i} \cdot {\vec{S}}_{j})

If $J > 0$ ( $E_{S} > E_{T}$ ): The coefficient is negative. The larger the dot product (parallel, +1/4), the lower the energy. This is Ferromagnetism.
If $J < 0$ ( $E_{S} < E_{T}$ ): The coefficient is positive. The smaller the dot product (anti-parallel, -3/4), the lower the energy. This is Antiferromagnetism.

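Now we generalize this two-particle interaction to the entire lattice, assuming each electron $i$ interacts only with its nearest neighbors. Summing over all sites $i$ and over all neighbors $j$ of each site counts every bond twice (once as $(i, j)$ and once as $(j, i)$ ), so we include a factor of $\frac{1}{2}$ to correct for the double counting: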

H_{e x c h a n g e} = - \frac{J}{2} \sum_{i, j neighbor} {\vec{S}}_{i} \cdot {\vec{S}}_{j}

Finally, we must consider the interaction of each electron spin with an external uniform magnetic field $\vec{B}$ . This is a single-body interaction and does not involve neighbors. Recalling our conclusion from the Dirac equation, the electron has a spin magnetic moment:

{\vec{μ}}_{S} = - g \frac{e}{2 m} \vec{S}

To make the formula more concise and universal, physicists define a natural combination of constants called the Bohr Magneton. The Bohr Magneton is the natural unit of magnetic moment in atomic physics. It is defined as:

μ_{B} \equiv \frac{e ℏ}{2 m_{e}}

This physical quantity contains three fundamental constants: elementary charge $e$ , Planck constant $ℏ$ , and electron mass $m_{e}$ . It represents the magnitude of the orbital magnetic moment generated by a classical electron moving in the ground state orbit of a hydrogen atom. This is in SI units; in Gaussian units, it also includes the speed of light: $μ_{B} \equiv \frac{e ℏ}{2 m_{e} c}$ . If we treat the spin operator $\vec{S}$ as a dimensionless operator (i.e., eigenvalues are $1 / 2$ instead of $ℏ / 2$ ), then the true physical angular momentum is $ℏ \vec{S}$ . Extracting this $ℏ$ and combining it with the constants above:

\begin{aligned} {\vec{μ}}_{S} & = - g \frac{e}{2 m_{e}} (ℏ {\vec{S}}_{dimensionless}) = - g (\frac{e ℏ}{2 m_{e}}) \vec{S} = - g μ_{B} \vec{S} \end{aligned}

When an electron is in an external magnetic field $\vec{B}$ , its potential energy (Zeeman Energy) is given by the classical electromagnetism formula $U = - \vec{μ} \cdot \vec{B}$ . Substituting the magnetic moment expression:

\begin{aligned} U_{Z e e m a n} & = - {\vec{μ}}_{S} \cdot \vec{B} \\ & = - (- g μ_{B} \vec{S}) \cdot \vec{B} \\ & = g μ_{B} \vec{S} \cdot \vec{B} \end{aligned}
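To get a feeling for the scales involved, the Bohr Magneton and the Zeeman energies of a single electron spin can be evaluated numerically. This is a minimal sketch: the constants are CODATA 2018 SI values, while the field strength of 1 T, the choice $g = 2$ , and the helper name `zeeman_energy` are illustrative assumptions.

```python
# CODATA 2018 values in SI units.
E_CHARGE = 1.602176634e-19      # elementary charge, C
HBAR = 1.054571817e-34          # reduced Planck constant, J s
M_ELECTRON = 9.1093837015e-31   # electron mass, kg

mu_B = E_CHARGE * HBAR / (2 * M_ELECTRON)   # Bohr magneton, J/T


def zeeman_energy(m_s, B, g=2.0):
    """U = g * mu_B * m_s * B in joules, for spin projection m_s along the field."""
    return g * mu_B * m_s * B


B = 1.0  # tesla, an illustrative laboratory-scale field
energy_up = zeeman_energy(+0.5, B)
energy_down = zeeman_energy(-0.5, B)
splitting = energy_up - energy_down

print(f"mu_B      = {mu_B:.4e} J/T")
print(f"U(up)     = {energy_up:.4e} J")
print(f"U(down)   = {energy_down:.4e} J")
print(f"splitting = {splitting:.4e} J = {splitting / E_CHARGE:.3e} eV")
```

With the sign kept as derived, the spin-down state has the lower energy, because the electron's magnetic moment points opposite to its spin.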

Physical Convention on Signs: In condensed matter physics, we usually want the Hamiltonian to reflect energy minima. Electrons are negatively charged, so the magnetic moment $\vec{μ}$ is anti-parallel to the spin $\vec{S}$ . The lowest energy state is when the magnetic moment $\vec{μ}$ is parallel to the magnetic field $\vec{B}$ . This means the spin $\vec{S}$ is anti-parallel to the magnetic field $\vec{B}$ . To avoid dealing with cumbersome negative signs, or to make spin look like it aligns "with" the field (defining $\vec{S}$ to point in the direction of the magnetic moment rather than angular momentum), literature sometimes adjusts the definition. However, the standard derivation (keeping the electron's negative charge) gives the Zeeman term as $+ g μ_{B} \vec{S} \cdot \vec{B}$ or $- \vec{μ} \cdot \vec{B}$ . Nevertheless, in the customary notation of the Heisenberg model, for mathematical symmetry and ease of discussion (e.g., assuming $g$ is negative or redefining the spin direction), the Zeeman term is typically written with a negative sign, indicating that spins tend to align along the field (this is a phenomenological treatment):

H_{Z e e m a n} = - g μ_{B} \sum_{i} {\vec{S}}_{i} \cdot \vec{B}

(Energy is then minimized when ${\vec{S}}_{i}$ points in the same direction as $\vec{B}$ , which amounts to having redefined the spin direction or taken $g$ to be negative. In phenomenological models, we only care about which direction the magnetic field tends to pull the spins.)

Now, we combine the two pieces of the puzzle: Internal Interaction (Exchange Energy generated by the Pauli principle and Coulomb force) and External Interaction (Coupling of magnetic moment with external field generated by relativistic quantum effects). Adding them together, we finally obtain the core Hamiltonian describing solid-state magnetism—the Heisenberg Model:

H = - \frac{J}{2} \sum_{⟨ i, j ⟩} {\vec{S}}_{i} \cdot {\vec{S}}_{j} - g μ_{B} \sum_{i} {\vec{S}}_{i} \cdot \vec{B}

This formula is the cornerstone of modern magnetism. The first term ( $J$ ) explains why magnets have magnetism (spontaneous magnetization, ordered alignment of spins). The second term ( $B$ ) explains how magnets are controlled by the outside world (magnetization process, hysteresis loop). $μ_{B}$ and $g$ link microscopic quantum constants ( $ℏ, e, m_{e}$ ) with macroscopic observable magnetic fields.

9. Ising Model

We have completed the construction of the microscopic mechanism (Dirac $\to$ Spin $\to$ Exchange Interaction $\to$ Heisenberg Hamiltonian). Now, we must move from the microscopic to the macroscopic. To do this, we need to handle the Heisenberg Model. However, solving the Heisenberg Model exactly in two or three dimensions is extremely difficult (because it contains non-commuting operators). Therefore, we need to introduce the Ising Model as an approximation and use Mean-Field Theory to demonstrate how symmetry is broken.

We first analyze the structure of the Heisenberg model. Assume the external magnetic field is along the $z$ -direction: $\vec{B} = (0, 0, B)$ . We expand the dot product of the spin operators $\vec{S}$ into longitudinal ( $z$ ) and transverse ( $x, y$ ) components:

{\vec{S}}_{i} \cdot {\vec{S}}_{j} = S_{i}^{z} S_{j}^{z} + (S_{i}^{x} S_{j}^{x} + S_{i}^{y} S_{j}^{y})

To see the physical meaning of the transverse part more clearly, we introduce Ladder Operators:

S_{i}^{+} = S_{i}^{x} + i S_{i}^{y}, S_{i}^{-} = S_{i}^{x} - i S_{i}^{y}

Thus, the Heisenberg model can be rewritten in two parts:

H = \underset{Ising Part}{\underset{⏟}{[- \frac{J}{2} \sum_{⟨ i, j ⟩} S_{i}^{z} S_{j}^{z} - h \sum_{i} S_{i}^{z}]}} + \underset{Flip Part}{\underset{⏟}{[- \frac{J}{4} \sum_{⟨ i, j ⟩} (S_{i}^{+} S_{j}^{-} + S_{i}^{-} S_{j}^{+})]}}

(where $h = g μ_{B} B$ ). These two parts have distinct physical meanings:

Ising Part (Longitudinal Term): This term involves only $S^{z}$ . Since $S_{i}^{z}$ on different lattice sites commute ( $[S_{i}^{z}, S_{j}^{z}] = 0$ ), they behave like classical scalar variables. This describes the static alignment of spins along the $z$ -axis.
Flip Part (Transverse/Flip Term): This term involves $S_{i}^{+} S_{j}^{-}$ . Its action is to flip the spin at site $j$ down ( $S^{-}$ ) while simultaneously flipping the spin at site $i$ up ( $S^{+}$ ): $S_{i}^{+} S_{j}^{-} | ↓_{i} ↑_{j} ⟩ = | ↑_{i} ↓_{j} ⟩$ . Physically, this represents the movement of spin excitations. Just like a spin-flipped state hopping through the lattice, this corresponds to Spin Waves or Magnons. This is a form of quantum fluctuation that imparts "kinetic energy" to the system, tending to destroy ordered alignment.
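The split into an Ising Part and a Flip Part can be verified for a single pair of spin-1/2 sites. The sketch below assumes $ℏ = 1$ and uses illustrative variable names; it checks the identity $S_{i}^{x} S_{j}^{x} + S_{i}^{y} S_{j}^{y} = \frac{1}{2} (S_{i}^{+} S_{j}^{-} + S_{i}^{-} S_{j}^{+})$ , the flip action on $| ↓_{i} ↑_{j} ⟩$ , and the commutation of $S^{z}$ on different sites.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
identity = np.eye(2, dtype=complex)
s_plus = sx + 1j * sy    # raises: |down> -> |up>
s_minus = sx - 1j * sy   # lowers: |up> -> |down>

# Full dot product versus longitudinal plus transverse pieces.
heisenberg = sum(np.kron(s, s) for s in (sx, sy, sz))
ising_part = np.kron(sz, sz)
flip_part = 0.5 * (np.kron(s_plus, s_minus) + np.kron(s_minus, s_plus))

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)
flipped = np.kron(s_plus, s_minus) @ np.kron(down, up)   # S_i^+ S_j^- |down_i up_j>

# S^z operators on different sites commute.
sz_i = np.kron(sz, identity)
sz_j = np.kron(identity, sz)
commutator = sz_i @ sz_j - sz_j @ sz_i

print("decomposition holds:", np.allclose(heisenberg, ising_part + flip_part))
print("flip gives |up_i down_j>:", np.allclose(flipped, np.kron(up, down)))
print("[S_i^z, S_j^z] = 0:", np.allclose(commutator, 0))
```

The factor $\frac{1}{2}$ in the identity is what turns the prefactor $- \frac{J}{2}$ of the full model into $- \frac{J}{4}$ in the Flip Part.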

In many real magnetic materials, due to the symmetry of the crystal structure, there exists Magnetic Anisotropy. This means the energy of spins along certain directions (e.g., the $z$ -axis, the easy axis) is lower than in the $x, y$ plane. If the anisotropy is strong enough, or if we are only concerned with phase transition behavior in the classical limit, we can ignore the Flip Part (quantum fluctuations) and retain only the longitudinal term. This is the famous Ising Model.

At this point, we replace the operator $S_{i}^{z}$ with a classical variable $σ_{i} = \pm 1$ (absorbing the coefficients into $J$ ):

H_{I s i n g} = - \frac{J}{2} \sum_{⟨ i, j ⟩} σ_{i} σ_{j} - h \sum_{i} σ_{i}

This is a massive simplification: we turn a non-commuting quantum matrix problem into a classical statistical combinatorial problem. The 2D Heisenberg model has not been solved exactly to this day, though it can be studied with computational methods. The Ising model is considerably simpler. The 1D Ising model is easy to solve, and it has no phase transition at any finite temperature (Ising, 1924). The 2D case is far harder. In 1944 Lars Onsager published an exact solution of the 2D Ising model on the square lattice in zero field, which made it the first exactly solved model exhibiting a ferromagnetic phase transition: ferromagnetic at low temperatures and paramagnetic at high temperatures. Onsager later announced the formula for the spontaneous magnetization without publishing its derivation; C.N. Yang supplied a derivation in 1952, and it was still famously difficult. To understand the physical picture intuitively, we will adopt the Mean-Field Approximation here.
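The "classical combinatorial problem" can be made concrete for a very small system by brute force. The sketch below assumes a ring of 6 sites with $J = 1$ and $h = 0$ (all hypothetical choices) and uses the same convention as the Hamiltonian above, with the neighbor sum running over ordered pairs. The function name `ising_energy` is illustrative.

```python
from itertools import product


def ising_energy(spins, J=1.0, h=0.0):
    """Energy of a ring of +1/-1 spins: H = -(J/2) sum_{i,j neighbors} s_i s_j - h sum_i s_i.

    The neighbor sum runs over ordered pairs, so every bond is counted twice
    and the factor 1/2 removes the double counting.
    """
    n = len(spins)
    exchange = 0.0
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):   # the two neighbors of site i
            exchange += spins[i] * spins[j]
    return -0.5 * J * exchange - h * sum(spins)


n_sites = 6
energies = {cfg: ising_energy(cfg) for cfg in product((+1, -1), repeat=n_sites)}
ground_energy = min(energies.values())
ground_states = [cfg for cfg, e in energies.items() if e == ground_energy]

print("number of configurations:", len(energies))
print("ground-state energy:", ground_energy)
print("ground states:", ground_states)
```

For $J > 0$ the two lowest-energy configurations are "all up" and "all down". The number of configurations grows as $2^{N}$ , which is why direct enumeration is hopeless for a macroscopic lattice and an approximation is needed.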

The problem we face is many-body coupling: the state of $σ_{i}$ depends on $σ_{j}$ , and $σ_{j}$ depends on $σ_{k}$ ... This chain reaction makes the partition function difficult to calculate. The Mean-Field idea is: when we focus on atom $i$ , we don't care whether neighbor $j$ is flipping between $+ 1$ or $- 1$ ; we only care about the average influence of the neighbors. We write $σ_{j}$ as an average value plus a fluctuation: $σ_{j} = ⟨ σ ⟩ + δ σ_{j}$ . Substituting into the product gives $σ_{i} σ_{j} = ⟨ σ ⟩ σ_{i} + ⟨ σ ⟩ σ_{j} - ⟨ σ ⟩^{2} + δ σ_{i} δ σ_{j}$ . Ignoring the second-order fluctuation term ( $δ σ_{i} δ σ_{j} \approx 0$ ) and dropping the spin-independent constant, the Hamiltonian can be linearized into a single-body form:

H_{M F A} = - \sum_{i} σ_{i} \underset{h_{e f f}}{\underset{⏟}{(J \sum_{j \in neigh} ⟨ σ ⟩ + h)}}

This defines the Effective Molecular Field $h_{e f f}$ :

h_{e f f} = J z ⟨ σ ⟩ + h

where $z$ is the coordination number (number of neighbors for each atom). Now, the problem becomes the statistical distribution of a single spin in an "external field" $h_{e f f}$ . According to the Boltzmann distribution, the probability of this spin being up is proportional to $e^{β h_{e f f}}$ , and the probability of being down is proportional to $e^{- β h_{e f f}}$ (where $β = 1 / k_{B} T$ ). Thus, the thermodynamic average of this spin $⟨ σ_{i} ⟩$ is:

⟨ σ_{i} ⟩ = \frac{(+ 1) e^{β h_{e f f}} + (- 1) e^{- β h_{e f f}}}{e^{β h_{e f f}} + e^{- β h_{e f f}}} = \tanh (β h_{e f f})

Since the lattice is uniform, $⟨ σ_{i} ⟩$ must equal the average value of the field source itself, $m = ⟨ σ ⟩$ . Substituting $h_{e f f}$ , we obtain the famous Self-consistent Equation:

m = \tanh (\frac{J z m + h}{k_{B} T})

Let us consider the most critical case: no external magnetic field ( $h = 0$ ). The equation simplifies to:

m = \tanh (\frac{T_{c}}{T} m)

where we package the constants into the definition of the Curie Temperature $T_{c} = J z / k_{B}$ . This is a transcendental equation, and we can analyze the behavior of the solution graphically (by finding the intersection of $y = m$ and $y = \tanh (\frac{T_{c}}{T} m)$ ):

High-Temperature Phase ( $T > T_{c}$ ): The slope of the tanh curve at the origin is $T_{c} / T < 1$ . The line and the curve intersect at only one point, $m = 0$ . Physical meaning: Thermal agitation is intense; without an external field, there is no magnetism. This is the Paramagnetic Phase.
Low-Temperature Phase ( $T < T_{c}$ ): The slope of the tanh curve at the origin is $T_{c} / T > 1$ . The origin $m = 0$ becomes an unstable solution, and two new stable non-zero solutions $m = \pm m_{0}$ appear. Physical meaning: Even if $h = 0$ , the system will spontaneously generate a non-zero magnetization $m_{0}$ . This is the Ferromagnetic Phase.

This is Spontaneous Symmetry Breaking: The Hamiltonian has a flip symmetry $σ \to - σ$ when $h = 0$ . However, when the temperature drops below $T_{c}$ , nature is forced to choose between "all up" and "all down." This choice is not imposed by an external force but is a collective decision made spontaneously by the system to lower its energy (driven by the exchange interaction $J$ ). This is precisely the statistical mechanical essence behind the macroscopic phenomenon of a magnet attracting iron.
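The graphical argument for the two phases can also be reproduced numerically by iterating the self-consistent equation until it stops changing. This is a minimal sketch using plain fixed-point iteration; the function name `mean_field_magnetization`, the starting guesses, and the sample temperatures are illustrative assumptions.

```python
import math


def mean_field_magnetization(t_ratio, h_over_kT=0.0, m_start=1.0,
                             tol=1e-12, max_iter=100000):
    """Iterate m -> tanh(m / t_ratio + h / (k_B T)) to a fixed point.

    t_ratio is T / T_c, so m / t_ratio equals (T_c / T) * m.
    """
    m = m_start
    for _ in range(max_iter):
        m_next = math.tanh(m / t_ratio + h_over_kT)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    return m


for t_ratio in (1.5, 0.9, 0.5):
    m_pos = mean_field_magnetization(t_ratio, m_start=+1.0)
    m_neg = mean_field_magnetization(t_ratio, m_start=-1.0)
    print(f"T/Tc = {t_ratio}: m = {m_pos:+.6f} or {m_neg:+.6f}")
```

Above $T_{c}$ both starting guesses collapse to $m = 0$ . Below $T_{c}$ the iteration settles on $+ m_{0}$ or $- m_{0}$ depending only on the sign of the initial guess, which is the numerical counterpart of the system "choosing" a direction. The iteration converges slowly very close to $T_{c}$ , so the sample temperatures stay away from it.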

10. Magnetic Domains

Based on the Ising model and Mean-Field Theory, we concluded that when the temperature is below the Curie temperature $T_{c}$ , electron spins spontaneously align, producing a massive macroscopic magnetization $M$ . However, this immediately leads to a new paradox: if you go to a hardware store and buy an iron nail (room temperature is obviously far below Iron's Curie temperature of $1043 K$ ), it is not magnetic. It does not pick up other objects. This is because we ignored one final energy competition.

Our previous Hamiltonian only considered Exchange Energy and Zeeman Energy. However, at the macroscopic scale, there is also the classical Magnetostatic Energy. If all $10^{23}$ atoms in a piece of iron were aligned upwards, this magnet would establish a huge magnetic field in the surrounding space. The magnetic field contains energy density $B^{2} / 2 μ_{0}$ . Spreading out all those magnetic field lines costs a tremendous amount of energy. To reduce this Magnetostatic Energy, the material spontaneously splits into many tiny regions called Magnetic Domains. Although inside a domain, the exchange interaction keeps spins aligned (satisfying microscopic ferromagnetism), overall, the vector sum of the magnetic moments of the various domains is zero ( $\sum {\vec{M}}_{i} = 0$ ). There are no external magnetic field lines, thereby drastically reducing the magnetostatic energy.

The boundary between magnetic domains is called a Domain Wall. Inside the domain wall, spins do not flip abruptly but rotate gradually. This is another game of energy trade-offs: Exchange Energy wants spins to be parallel and resists their turning (preferring the wall to be as wide as possible); Magnetic Anisotropy wants spins to align along the easy axis and resists them pointing in intermediate directions (preferring the wall to be as narrow as possible). The balance between the two determines the thickness of the domain wall (typically hundreds of atomic layers). The formation of magnetic domains is not derived from "first principles" like spin, but belongs to the realm of Micromagnetics, involving energy minimization in continuum field theory, which we will not expand upon here.

Now, we can finally fully describe the macroscopic process of "a magnet attracting iron":

Initial State: The interior of the iron nail is filled with chaotic magnetic domains; the macroscopic magnetic moment is zero.
External Field Intervention: When you bring a magnet close to the nail, you provide an external magnetic field ${\vec{B}}_{e x t}$ .
Domain Wall Motion: The equilibrium is broken. Domains whose direction aligns with ${\vec{B}}_{e x t}$ have lower Zeeman energy ( $E = - \vec{M} \cdot \vec{B}$ ). Consequently, these "compliant" domains begin to swallow the surrounding "non-compliant" domains. The domain walls move.
Macroscopic Magnetization: The nail rapidly gains a huge net magnetic moment.
Gradient Force Work: This induced macroscopic magnetic moment ${\vec{m}}_{t o t a l}$ is pulled by the gradient force $\vec{F} = \nabla ({\vec{m}}_{t o t a l} \cdot \vec{B})$ generated by the magnet's non-uniform field.
Click: The nail flies towards the magnet.
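The last two steps can be illustrated with a toy calculation of the gradient force. The sketch below makes strong simplifying assumptions: the magnet is modeled as a point dipole at the origin pointing along $+ z$ , the nail sits on the axis and carries a fixed induced moment parallel to the field, and all numbers (source moment, induced moment, distance) are hypothetical. The function names are illustrative.

```python
import math

MU_0 = 4e-7 * math.pi   # vacuum permeability, T m / A (accurate enough for a sketch)


def axial_field(z, source_moment):
    """On-axis field (T) of a point dipole at the origin pointing along +z, for z > 0."""
    return MU_0 * source_moment / (2 * math.pi * z**3)


def gradient_force(z, source_moment, induced_moment, dz=1e-6):
    """F_z = d(m_ind * B)/dz by central difference, with m_ind fixed and parallel to B."""
    b_plus = axial_field(z + dz, source_moment)
    b_minus = axial_field(z - dz, source_moment)
    return induced_moment * (b_plus - b_minus) / (2 * dz)


z = 0.02                 # nail position on the axis, m (hypothetical)
source_moment = 1.0      # magnet dipole moment, A m^2 (hypothetical)
induced_moment = 0.01    # induced moment of the nail, A m^2 (hypothetical)

force = gradient_force(z, source_moment, induced_moment)
analytic = -3 * MU_0 * source_moment * induced_moment / (2 * math.pi * z**4)

print(f"numerical F_z = {force:.6e} N")
print(f"analytic  F_z = {analytic:.6e} N")
```

The force is negative, that is, it points toward the magnet at the origin: a moment aligned with the field is pulled toward the region where the field is stronger. In a uniform field the same moment would feel a torque but no net force.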

Conclusion: The Deep Symmetry of the Universe

When you play with two magnets in your hands, feeling the repulsion and attraction between them, you are feeling more than just a force. You are touching the essence of quantum mechanics and the secrets of cosmic evolution with your own hands. Let us review this journey and see how we rebuilt our physical intuition:

The Classical Collapse: We found that the Lorentz force does no work, and classical statistical physics forbids magnetism (Bohr-van Leeuwen Theorem).
The Relativistic Correction: The Dirac equation revealed that the electron must be a 4-component spinor carrying an intrinsic magnetic moment with $g = 2$ . Magnetism is the residue of relativistic effects in the low-speed world.
The Power of Quantum Statistics: The Pauli Exclusion Principle combined with Coulomb repulsion creates an equivalent "Exchange Interaction," forcing spins to align parallel.
Symmetry Breaking: The Ising model taught us that when the temperature drops, the system, in order to survive (lower its energy), is forced to break the spin-flip symmetry ( $σ \to - σ$ ) and choose a direction.

Finally, it is worth mentioning that the Spontaneous Symmetry Breaking (SSB) we saw in the Ising model has significance far beyond solid-state physics. It is a core paradigm for understanding the universe in modern physics. At the beginning of the Big Bang (extremely high temperature), physical laws possessed extremely high symmetry. All fundamental particles were massless, just like the iron block at high temperatures has no magnetism (paramagnetic phase). As the universe cooled, when the temperature dropped below a certain critical value, the Higgs Field filling the universe underwent a phase transition. Just like electron spins suddenly choosing to point in one direction, the Higgs field acquired a non-zero Vacuum Expectation Value in empty space.

In a ferromagnet, symmetry breaking endows the material with magnetism.
In the Standard Model, symmetry breaking endows fundamental particles with mass.

So, the next time you see a magnet pick up a paperclip, realize this: you are witnessing a miniature moment of cosmic creation. The mechanism that gives the nail its magnetism is the very same mechanism that gives the quarks and electrons in your body their mass, allowing this universe to exist.

Magnetic force does no work; it is the geometry of spacetime doing the work. The attraction of a magnet is the dance of a quantum ghost in the macroscopic world.