A Mathematical Prelude: Some constants and a bit of linear algebra

And now for the first contentful post about quantum mechanics. It’s wild, wonderful, exciting… okay, not really. It’s the mathematical prelude, an overview of some key physical constants and some key bits of linear algebra which we’ll use rather extensively going forwards. This isn’t meant to be deep, or to be an introduction to linear algebra; it’s a way to establish the notations we’ll be using, and to point out the facts which will happen to be important to us later on.

Some useful numbers

The fundamental physical constant associated with Quantum Mechanics is Planck’s constant, h\approx 6.626\cdot 10^{-34} J\cdot sec. A “reduced” constant, \hbar = h/2\pi,1 was introduced soon afterwards, and it quickly became clear that this was the constant which actually showed up in equations; so nowadays when anyone refers to Planck’s constant they’re referring to this. (On the rare occasions when the original number is needed, one simply uses 2\pi\hbar instead)

Our unit of length will be the Angstrom, 1\AA = 10^{-10}m, and our unit of energy will be the electron-volt, the energy that an electron acquires when moving through a potential difference of one volt, 1{\rm eV} \approx 1.602\cdot 10^{-19} J. These are convenient for what we will need — a typical atom is 1-3Å across, binding energies in atoms are on the order of a few eV, and so on. A few combinations which are particularly useful to remember are:

\hbar c \approx 1973 eV \cdot \AA

m_{electron} c^2 \approx 511 keV

m_{proton,\ neutron} c^2 \approx 0.9 GeV

Atomic binding energies are generally on the order of a few eV; the weak hydrogen-hydrogen bonds used in biological systems are a few meV; nuclear binding energies are a few MeV. The largest particle accelerators have beam energies on the order of several TeV per particle; the most energetic cosmic ray ever observed was a proton with an energy of about 3 \cdot 10^{20} eV, about the same as a baseball going at 60 mph.
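These back-of-the-envelope numbers are easy to check. Here is a quick pure-Python sketch (the constant values are standard rounded SI figures; the variable names are my own):

```python
# Checking the handy combinations above from SI values of hbar, c, and the eV.
HBAR = 1.054_571_8e-34   # J * s (reduced Planck constant)
C = 2.997_924_58e8       # m / s
EV = 1.602_176_6e-19     # J per eV
ANGSTROM = 1e-10         # m

# hbar * c, converted from J*m to eV*Angstrom:
hbar_c = HBAR * C / EV / ANGSTROM
print(f"hbar*c = {hbar_c:.0f} eV*Angstrom")   # ~1973

# The cosmic-ray comparison: 3e20 eV vs. a 145 g baseball at 60 mph.
cosmic_ray_J = 3e20 * EV
baseball_J = 0.5 * 0.145 * (60 * 0.44704) ** 2   # KE = m v^2 / 2, mph -> m/s
print(f"cosmic ray: {cosmic_ray_J:.0f} J, baseball: {baseball_J:.0f} J")
```

Both energies come out around fifty joules, which is the point of the comparison.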

A bit of linear algebra

Bibliographic note: My favorite more detailed text on this subject is chapter 1 of Sakurai. If you’re of a somewhat mathematical bent, that book is a great place to start.

And now for a lightning review of linear algebra. This is not meant to teach you linear algebra; if you tried to learn it from this, your head would probably explode. It’s just a way for me to review the key points, describe the rather oddball notation that we use in quantum mechanics, and point out a few key facts which we’ll use over and over later on. The hope is that you’ll read over this and say “yeah, yeah, obvious… incomprehensible, but obvious… blah blah… ok.” If you don’t say that, then stop me and ask, because this part needs to be clear. (Also, if you need a linear algebra refresher, the  Wikipedia articles are an OK place to start)

Vector spaces, kets: A vector space is any collection of objects which can be added and multiplied by numbers. (a.k.a. scalars) In quantum mechanics, we will generally be concerned with complex vector spaces, i.e. ones where these scalars are complex numbers. Some common examples of vector spaces are the space of vectors in the plane, and the space of functions on the real line. We will denote the vector named “x” by \left|x\right>. (This is the notation standard in quantum mechanics; it’s different from the one commonly used in mathematics, but it turns out to be surprisingly useful) Such vectors are commonly called “kets” (Dirac’s name), and correspond to the column vectors familiar from high school algebra. We will label our kets with an amazing bestiary of things, including Roman letters, Greek letters, and numbers. We’ll use lower-case letters from all alphabets for our scalars.

Inner products, bras, adjoints: All of the vector spaces we will be interested in are inner product spaces — i.e., there is an inner (dot) product which takes two vectors and gives a scalar. But we’re going to be a bit more careful than this standard statement. Instead, we’ll say this: For every vector space, there exists a dual vector space; this is the pairing of column vectors to row vectors. These row vectors are denoted by \left<x\right|, and are called “bras.” The standard pairing between rows and columns is called the Hermitian adjoint, the adjoint or (in speech) simply the “dagger:” \left<x\right| \equiv \left|x\right>^\dagger. This dagger is “anti-linear:”

(c\left|x\right> + d\left|y\right>)^\dagger = c^\star \left<x\right| + d^\star\left<y\right|.

In simpler terms, the dagger is the complex conjugate of the transpose. For complex vector spaces, this dagger will take the place of the ordinary transpose everywhere. We define the inner (dot) product to be a product of a bra and a ket: \left<x\right|\left.y\right>, which is a number. Note that you can immediately tell if something is a bra, a ket, or a scalar, simply by counting the pointy bits; this is often useful.2
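To make the notation concrete, here is a minimal pure-Python sketch of kets, bras, and the dagger for two-component complex vectors. (The helper names dagger and braket are my own, not standard functions.)

```python
def dagger(ket):
    """The bra corresponding to a ket: the entrywise complex conjugate
    (of the transpose, if you think of kets as columns)."""
    return [z.conjugate() for z in ket]

def braket(x, y):
    """<x|y> for two kets: conjugate the first, then take the dot product."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

x = [1 + 2j, 3 - 1j]
y = [0 + 1j, 2 + 0j]
c, d = 2 - 1j, 0.5 + 3j

# Anti-linearity: dagger(c|x> + d|y>) = c* <x| + d* <y|
lhs = dagger([c * xi + d * yi for xi, yi in zip(x, y)])
rhs = [c.conjugate() * xi + d.conjugate() * yi
       for xi, yi in zip(dagger(x), dagger(y))]
assert all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))

# Conjugate symmetry of the inner product: <x|y> = <y|x>*
assert abs(braket(x, y) - braket(y, x).conjugate()) < 1e-12
```

Note that the inner product here is exactly the a_1^\star b_1 + a_2^\star b_2 formula from the plane example below.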

For vectors on a plane, if \left|x\right> = a_1\left|i\right> + a_2\left|j\right> and \left|y\right> = b_1\left|i\right> + b_2\left|j\right> (where \left|i\right> and \left|j\right> are the unit vectors along the axes), then \left<x\right|\left.y\right> = a_1^\star b_1 + a_2^\star b_2. For functions on the real line, the standard inner product is

\left<f\right|\left.g\right> = \int_{-\infty}^{\infty} f^\star(x) g(x)\, {\rm d} x.

Note that this doesn’t actually work as an inner product over arbitrary functions, since the integral could diverge; so the interesting inner product space is the space of square-integrable functions, i.e. the set of functions for which this integral is finite. We’ll use this space all the time.
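As a sanity check on this inner product, here is a small pure-Python sketch that computes \left<f\right|\left.f\right> for a Gaussian by a midpoint Riemann sum. (The cutoff interval and the helper name l2_inner are assumptions of the sketch; it relies on the integrand dying off quickly, i.e. on square-integrability.)

```python
import math

def l2_inner(f, g, a=-10.0, b=10.0, n=20_000):
    """<f|g> = integral of f*(x) g(x) dx, by a midpoint Riemann sum.
    Assumes f and g are negligible outside [a, b]."""
    dx = (b - a) / n
    total = 0j
    for k in range(n):
        x = a + (k + 0.5) * dx
        total += f(x).conjugate() * g(x)
    return total * dx

def gauss(x):
    return complex(math.exp(-x * x / 2))

# The squared norm of this Gaussian: integral of e^{-x^2} dx = sqrt(pi)
norm_sq = l2_inner(gauss, gauss)
assert abs(norm_sq.real - math.sqrt(math.pi)) < 1e-9
```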

The norm of a vector is the square root of its dot product with itself. The formal definition of an inner product also states that every vector must have nonnegative norm, and only the zero vector has norm zero. See the Wikipedia page for more about inner products if you want details.

In QM you’ll often see the term “Hilbert space” bandied about. Technically, a Hilbert space is an inner product space which is Cauchy-complete; in practice, fewer than one physicist in fifty could probably tell you that definition from memory, because it’s of very little practical use in QM. However, it does turn out that all of the vector spaces we’ll be interested in really are Hilbert spaces, and so they are commonly referred to as such and are often denoted by {\cal H}. Physicists will also sometimes abuse this notation even more and use the term “Hilbert space” only for infinite-dimensional Hilbert spaces, like spaces of functions, and “vector space” for finite-dimensional ones; there is no particularly good reason for this.

Operators: A linear operator (or just “operator” for short) on a vector space is something which maps vectors onto other vectors, which is linear. (i.e., a matrix, but for general vector spaces they’re typically called operators) We’ll denote operators by capital Roman letters, so A\left|x\right> is a vector. Operator multiplication isn’t commutative; order matters. The commutator of two operators is defined to be

[A, B] = AB - BA.

Note that if you multiply a ket and a bra in the “wrong order,” you get an operator; you can tell that from the fact that when you multiply it by a ket on the right, you get a ket back.

(\left|x\right>\left<y\right|)\left|z\right> = \left|x\right>\left(\left<y\right|\left.z\right>\right)

Operators can be acted on with daggers, too: that dagger is defined by \left(A\left|x\right>\right)^\dagger = \left<x\right| A^\dagger for every vector \left|x\right>. In practice, the adjoint of an operator is the complex conjugate of its transpose, just like with vectors. Like transposes, daggers reverse the order of multiplication: \left(ABC\right)^\dagger = C^\dagger B^\dagger A^\dagger. (As follows from the definition of the dagger and the associative law) The “identity matrix” is the operator which maps every vector onto itself, and will commonly simply be denoted as “1.”

Operators on vectors in a plane are simply 2×2 matrices. Operators on the space of functions are linear operators; e.g., derivatives are linear operators. The product of two operators is an operator, so \partial, \partial^2, etc., are all linear operators.
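Here is a small pure-Python sketch of these facts for 2×2 matrices: the commutator of two Pauli matrices, the dagger reversing a product, and a ket-bra outer product acting as an operator. (All helper names are my own.)

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

sigma_x = [[0, 1], [1, 0]]
sigma_z = [[1, 0], [0, -1]]
assert commutator(sigma_x, sigma_z) == [[0, -2], [2, 0]]  # order matters!

# Daggers reverse products: (AB)^dagger = B^dagger A^dagger
AB = matmul(sigma_x, sigma_z)
assert dagger(AB) == matmul(dagger(sigma_z), dagger(sigma_x))

# A ket times a bra, in that "wrong" order, is an operator:
def outer(x, y):    # |x><y|
    return [[xi * yj.conjugate() for yj in y] for xi in x]

x, y, z = [1, 0], [0, 1], [3, 4]
op = outer(x, y)
ket = [sum(op[i][j] * z[j] for j in range(2)) for i in range(2)]
assert ket == [4, 0]    # = |x> times the number <y|z> = 4
```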

An operator is called “Hermitian” if A^\dagger = A, “normal” if [A, A^\dagger] = 0, and “unitary” if A^\dagger A = A A^\dagger = 1.

Hermitian 2×2 matrices have the form \left(\begin{array}{cc}a_{00} & a_{10} + i b_{10} \\ a_{10} - i b_{10} & a_{11} \end{array}\right), where all of the coefficients in the matrix are real numbers. Unitary 2×2 matrices have, up to multiplying rows and columns by overall phases, the form \left(\begin{array}{cc} \cos \theta & e^{i\phi}\sin\theta \\ -e^{-i\phi}\sin\theta & \cos\theta\end{array}\right); i.e., they’re essentially rotation matrices.

Since we can form polynomials out of operators, we can also form functions of them (or at least, analytic functions of them) by means of Taylor series. One thing that we will do particularly often is exponentiate. A useful fact is that, if [A, B] = 0, then e^A e^B = e^{A + B}. (Exercise: Prove it) Also, \left(e^A\right)^\dagger = e^{A^\dagger}. Together, these two facts imply that if A is Hermitian, then e^{i A t} is unitary, for any real number t. In fact, the converse is also true: every unitary operator can be written in the form e^{i A} for some Hermitian A.
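These exponential facts are easy to verify numerically. Below is a pure-Python sketch that computes e^M for 2×2 complex matrices by truncating the Taylor series (the 30-term cutoff is plenty for these small matrices; all helper names are mine):

```python
def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(M, terms=30):
    """e^M for a 2x2 complex matrix, by a truncated Taylor series."""
    out = [[complex(i == j) for j in range(2)] for i in range(2)]
    term = [[complex(i == j) for j in range(2)] for i in range(2)]
    for n in range(1, terms):
        # term <- term * M / n, so after n steps term = M^n / n!
        term = [[sum(term[i][k] * M[k][j] for k in range(2)) / n
                 for j in range(2)] for i in range(2)]
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out

def close(M, N, tol=1e-10):
    return all(abs(M[i][j] - N[i][j]) < tol
               for i in range(2) for j in range(2))

# A and B are both multiples of sigma_x, so [A, B] = 0 and e^A e^B = e^{A+B}:
A = [[0, 0.3], [0.3, 0]]
B = [[0, 0.5], [0.5, 0]]
assert close(matmul(expm(A), expm(B)), expm([[0, 0.8], [0.8, 0]]))

# And for Hermitian A and real t, e^{iAt} is unitary:
t = 0.7
U = expm([[1j * t * A[i][j] for j in range(2)] for i in range(2)])
U_dag = [[U[j][i].conjugate() for j in range(2)] for i in range(2)]
assert close(matmul(U_dag, U), [[1, 0], [0, 1]])
```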

Bases: Probably the most important theorem of linear algebra is the Gram-Schmidt theorem, which says that every inner product space has an orthonormal basis. A basis is a set of vectors, which are linearly independent (i.e., no vector in that set can be written as a linear combination of the others) and such that every vector in the entire vector space can be written as a linear combination of those basis vectors; orthonormal means that these basis vectors are orthogonal (the inner product of any two different basis vectors is zero) and normal. (The norm of each basis vector is 1) Each vector space has in fact an enormous variety of bases; but the number of elements in the basis is a constant for any particular vector space, and that number is the dimension of the vector space.

A basis for the set of vectors in the plane is (1, 0) and (0, 1), the unit vectors in the x and y directions; this vector space is two-dimensional. By Fourier’s Theorem, the set of functions e^{ikx}, for every real number k, is a basis for the set of functions on the real line; this vector space is infinite-dimensional.

Given a basis, any vector can be written as its “basis expansion,” i.e. the set of coefficients by which you need to multiply each basis vector in order to get that vector — that’s really what we refer to when we write out a column vector as a column of numbers. Probably the most useful equation you can write about a basis is that, if the \left|x_i\right> form a basis, then

\Sigma_i \left|x_i\right>\left<x_i\right| = 1.

(This is easy to see: Write an arbitrary vector \left|\psi\right> as a linear combination of the x’s, multiply this matrix by that vector, and use the fact that \left<x_i\right|\left.x_j\right> = \delta_{ij})
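A quick numerical check of the completeness relation, using an orthonormal basis of the plane rotated by an arbitrary angle (real vectors here, so the daggers are just transposes):

```python
import math

theta = 0.83   # any angle works; the basis below is orthonormal for all theta
basis = [[math.cos(theta), math.sin(theta)],
         [-math.sin(theta), math.cos(theta)]]

# Sum of the outer products |x_i><x_i| over the basis vectors:
resolution = [[sum(v[i] * v[j] for v in basis) for j in range(2)]
              for i in range(2)]
assert all(abs(resolution[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(2) for j in range(2))   # = the identity
```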

Bases are by no means unique. In fact, if the set of \left|x_i\right> form an orthonormal basis for a vector space, and U is any unitary operator, then the set of U\left|x_i\right> are also an orthonormal basis. The example 2×2 unitary matrix above gives you the physical intuition: unitary matrices simply rotate the bases of a space. It is also easy to show that any two orthonormal bases of a vector space are related by such a unitary; simply let

U = \Sigma_i \left|y_i\right>\left<x_i\right|

Then U \left|x_j\right> = \Sigma_i \left|y_i\right>\left<x_i\right|\left.x_j\right> = \Sigma_i \left|y_i\right> \delta_{ij} = \left|y_j\right>, so this matrix really does relate the two bases; and U^\dagger U = \Sigma_{ij} (\left|x_i\right>\left<y_i\right|)(\left|y_j\right>\left<x_j\right|) = \Sigma_{ij} \left|x_i\right>\delta_{ij}\left<x_j\right| = \Sigma_i \left|x_i\right>\left<x_i\right| = 1.

Thus every unitary operator can be thought of as a change of basis.

If we expand vectors in terms of their basis coefficients, we can also expand operators in the same way. (Which is what we mean when we write out matrices as arrays of numbers) The matrix elements of an operator A are simply A_{ij} = \left<x_i\right|A\left|x_j\right>. Under a change of basis, A \rightarrow U A U^\dagger; that way, for every vector, A \left|x\right> \rightarrow (U A U^\dagger) (U \left|x\right>) = U (A \left|x\right>), and so A\left|x\right> transforms as a vector as well.

Eigen(values, vectors): If A is an operator, then \left|x\right> is an eigenvector of A, with eigenvalue \lambda, if

A \left|x\right> = \lambda\left|x\right>.

These will turn out to be of tremendous importance in QM. The Spectral Theorem says that, if A is Hermitian, then not only does it have eigenvectors, but the set of its eigenvectors forms a basis of the vector space; i.e., there are as many linearly independent, orthonormal eigenvectors as there are dimensions in the space. This is typically referred to as the “eigenbasis” of A. If the eigenvectors of A are \left|x_i\right>, with respective eigenvalues \lambda_i, then in the eigenbasis A takes on the particularly simple form

A = \left(\begin{array}{ccc}\lambda_1& & \\ & \ddots & \\ & & \lambda_n \\ \end{array}\right)

If on the other hand we have the eigenvectors of A written in some other basis, then we can form a unitary matrix P by simply stuffing the (normalized) eigenvectors into its columns; the adjoint U = P^\dagger is then the transformation matrix into the eigenbasis. (So we can write A = P D P^\dagger, where D is the diagonal matrix of eigenvalues and P is unitary) Because finding the eigenvectors is equivalent to finding the basis in which A is diagonal, this operation is also referred to as “diagonalization.” The set of eigenvalues is called the spectrum of A. The trace of A is the sum of the eigenvalues; the determinant is their product. (To see this, note that tr\ A B = tr\ B A, that det\ A B = det\ B A, and that P is unitary) A is positive (semi)definite if all of the eigenvalues are greater than (or equal to) zero; it has an inverse iff none of the eigenvalues are zero. Also, note that since A is Hermitian, the eigenvalues must also be real. (Easy exercise: Show this)3

The standard way to find the eigenvalues of a (finite-dimensional) matrix is to note that the eigenvalue equation can be rewritten as \left(A - \lambda \cdot 1\right)\left|x\right> = 0; this equation can only hold for a nonzero vector \left|x\right> if the matrix in parentheses is singular, and thus has determinant zero. We therefore take the determinant of the quantity in parentheses and demand that it vanish; for an NxN matrix, this is an N-th order polynomial equation, and its N roots (counted with multiplicity) are the eigenvalues. Each individual eigenvalue can then be substituted into the eigenequation to solve for the corresponding eigenvector.

Example: Let A = \left(\begin{array}{cc}0 & 1 \\ 1 & 0 \end{array}\right). Then the eigencondition is det \left(\begin{array}{cc} -\lambda & 1 \\ 1 & -\lambda \end{array}\right) = \lambda^2 - 1 = 0. This has solutions \lambda = \pm 1. The corresponding eigenvector equations say

\left(\begin{array}{cc} \mp 1 & 1 \\ 1 & \mp 1 \end{array}\right)\left(\begin{array}{c}v_1 \\ v_2\end{array}\right) = \left(\begin{array}{c} \mp v_1 + v_2 \\ v_1 \mp v_2 \end{array}\right) = 0

And thus v_1 = \pm v_2. In order to make these vectors normal, we multiply them each by 1/\sqrt{2}. (N.B., any multiple of an eigenvector is an eigenvector with the same eigenvalue!) Thus the eigenvectors are

\frac{1}{\sqrt 2}\left(\begin{array}{c}1\\1\end{array}\right),\ \lambda = 1, and

\frac{1}{\sqrt 2}\left(\begin{array}{c}1\\-1\end{array}\right),\ \lambda = -1.

The unitary matrix that moves from the basis in which we originally wrote A into the eigenbasis is simply

U = \frac{1}{\sqrt 2}\left(\begin{array}{cc} 1 & 1 \\ 1 & -1 \end{array}\right); if we apply this change of basis to A,

U A U^\dagger = \left(\begin{array}{cc}1 & 0 \\ 0 & -1 \end{array}\right).
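The whole worked example can be verified in a few lines of pure Python: the eigenvalues come from the characteristic polynomial \lambda^2 - (tr\ A)\lambda + det\ A = 0, and then the change of basis U A U^\dagger diagonalizes A. (The helper name matmul is mine; this is a sketch, not a general eigensolver.)

```python
import math

A = [[0.0, 1.0], [1.0, 0.0]]

# Characteristic polynomial of a 2x2 matrix: lambda^2 - (tr A) lambda + det A
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(tr * tr - 4 * det)
eigs = sorted([(tr - disc) / 2, (tr + disc) / 2])
assert eigs == [-1.0, 1.0]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Rows of U are the daggered eigenvectors; then U A U^dagger is diagonal.
s = 1 / math.sqrt(2)
U = [[s, s], [s, -s]]
U_dag = [[U[j][i] for j in range(2)] for i in range(2)]   # real, so transpose
D = matmul(matmul(U, A), U_dag)
assert abs(D[0][0] - 1) < 1e-12 and abs(D[1][1] + 1) < 1e-12
assert abs(D[0][1]) < 1e-12 and abs(D[1][0]) < 1e-12
```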

Note that, if two eigenvectors have the same eigenvalues, then any linear combination of those two eigenvectors is also an eigenvector with that eigenvalue. (i.e., the eigenbasis isn’t unique; we can rotate any eigenvector into any other eigenvector and it’s still an eigenbasis)

When we work with an eigenbasis, we will customarily label the basis vectors by their eigenvalues — i.e., A\left|\lambda_i\right> = \lambda_i\left|\lambda_i\right>.

One of the most important facts about eigenvectors in quantum mechanics is that, if any two Hermitian operators A and B commute, then they share a common set of eigenvectors; i.e., we can choose a single basis whose elements are eigenvectors of both. (Although they may — will! — have different eigenvalues for each of those vectors. A and B need not even have the same units.)

Proof: Let \left|\lambda_i\right> be the eigenkets of A. First, note that \left<\lambda_i\right|A = (A^\dagger \left|\lambda_i\right>)^\dagger = (\lambda_i \left|\lambda_i\right>)^\dagger = \lambda_i\left<\lambda_i\right|. (Since A is Hermitian, and its eigenvalues are therefore real) Now, consider the case where A is nondegenerate, i.e. all of its eigenvalues are different. Then \left<\lambda_i\right|AB-BA\left|\lambda_j\right> = (\lambda_i - \lambda_j)\left<\lambda_i\right|B\left|\lambda_j\right> = 0, since AB = BA. Since the eigenvalues are unequal, this must mean that \left<\lambda_i\right|B\left|\lambda_j\right> = 0 whenever i \ne j, and so B is diagonal in this basis. Now imagine that two eigenvectors of A are degenerate; in that case, there is a corresponding 2×2 submatrix B_{ij} which is not constrained to be diagonal. But this submatrix is Hermitian, because B is, and so it too can be diagonalized. This diagonalization is just a replacement of those two \left|\lambda_i\right>‘s with linear combinations of one another; and as we already saw, whenever two eigenvalues are degenerate we can rotate the corresponding eigenvectors arbitrarily and get perfectly valid new eigenvectors. Therefore we can always construct an eigenbasis for A in which B is diagonal, and thus those basis vectors are also eigenvectors of B. ♦
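A tiny numerical illustration of the theorem (the specific operators are my own choice): A = \sigma_x and B = 3\cdot 1 + 2\sigma_x commute, and the eigenvectors of A are eigenvectors of B with different eigenvalues.

```python
A = [[0, 1], [1, 0]]    # sigma_x
B = [[3, 2], [2, 3]]    # 3*identity + 2*sigma_x

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

assert matmul(A, B) == matmul(B, A)    # [A, B] = 0

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

v_plus, v_minus = [1, 1], [1, -1]      # (unnormalized) eigenvectors of A
assert apply(A, v_plus) == [1 * c for c in v_plus]      # eigenvalue +1
assert apply(B, v_plus) == [5 * c for c in v_plus]      # eigenvalue 5
assert apply(A, v_minus) == [-1 * c for c in v_minus]   # eigenvalue -1
assert apply(B, v_minus) == [1 * c for c in v_minus]    # eigenvalue 1
```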

The useful consequence of this theorem is: We will typically find ourselves with some collection of Hermitian operators “of interest” in a particular problem. We can always pick some maximal commuting subset of that set. (i.e., adding any more operators would mean that some pair of them don’t commute) The simultaneous eigenvectors of that set — which we will call a “complete set of commuting observables,” or CSCO for short, for reasons to be seen next time — will be our basis of choice in most matters, and we will label the vectors by their various eigenvalues \left|a,\ b,\ \ldots\right>. It will turn out later that these eigenvalues supply all of the information which can even theoretically be measured in a quantum system — a fact which will have profound physical implications.

This concludes the extremely rapid review of linear algebra. Wasn’t that fun?

Next time: Something completely different: Actual quantum physics!

1 Pronounced “aitch-bar.”

2 So you can combine a “bra” and a “ket” to form a “bracket.” And now you have seen an example of Paul Dirac’s sense of humor. Yes, that really is how he came up with those names.

3 The Spectral Theorem actually only requires that A be normal; if A isn’t Hermitian, the eigenvalues may be complex. But we won’t need that fact.

Published on August 2, 2010 at 10:02


  1. Not using natural units?

    • Nope. I find it gets fairly confusing to do that in intro courses; better to make the various \hbar‘s and c’s explicit.

