Linear Algebra 2: Echelon Matrix Forms | by tenzin migmar (t9nz)

Row echelon form and reduced row echelon form

Preface

Welcome back to the second essay of my ongoing series on the basics of Linear Algebra, the foundational math behind machine learning. In my previous article, I introduced linear equations and systems, matrix notation, and row reduction operations. This article will walk through the echelon matrix forms: row echelon form and row reduced echelon form and how both can be used to solve linear systems. This article would best serve readers if read in accompaniment with Linear Algebra and Its Applications by David C. Lay, Steven R. Lay, and Judi J. McDonald. Consider this series as an external companion resource.

Feel free to share thoughts, questions, and critique.

Row Echelon Form

The Gauss Elimination method is a procedure to transform a matrix using row operations into a form in which solutions can become retrievable after some back-substitution.

As review, the row reduction operations are:

Replacement: “replace a row by the sum of itself and another row.”*
Interchange: “swap two rows.”*
Scaling: “multiply all entries in a row by a non-zero constant.”*

The above operations can be applied to a matrix to transform that matrix into its row echelon form. A given m x n matrix, where m is the number of rows and n is the number of columns is said to be in row echelon form when:

Any rows where all entries are zero are below rows where at least one entry is non-zero.
All leading entries of a row (first entry from the left that is non-zero) are in a column to the right of the row above it.
All entries in a column below a leading entry are zero.

The following are examples of matrices in row echelon form (REF).

Take a moment to appreciate how while there are variations in the size and entries of the matrices, all are considered to be in row echelon form as per the criteria outlined above. Notice the staircase-like pattern underneath the highlighted leading entries? That is what naturally emerges from performing Gaussian elimination to transform a matrix to row echelon form. The form is aptly named: the word echelon was derived from the French eschelon meaning the rung of a ladder and later on came to mean “step”.*

The basic idea behind Gaussian elimination to transform a matrix into a row reduced form is to pick a pivot (the word pivot is used to refer to the leading entry: an entry that will be the first non-zero entry in its row) and then eliminate all entries below, zeroing out everything in the column underneath the pivot. To understand why this step makes progress in transforming a matrix into reduced echelon form, revisit the definition of the reduced echelon form: In row echelon form, all entries in a column below a leading entry are zero. This step is then iterated through again for each row, but with caution! We must make sure that with each pivot selection, we are not violating one of the core characteristics of the row echelon form; all leading entries of a row are in a column to the right of the row above it. With this rule in mind, it is generally a good idea to begin pivoting around the first entry in the first row and then work your way down the rows from right to left.

Let’s once again contemplate the purpose of the row echelon form that was mentioned earlier: to transform a given matrix representing a linear system into a form in which solutions can become easily legible. To better understand the underlying utility of row echelon form, consider example 1.

When we perform the Gaussian elimination, we’re manipulating the matrix to take on a symmetric yet more decipherable form. With the row echelon form obtained from example 1, we can now use back-substitution to work our way up in obtaining each of the solutions.

As you can see from above, this isn’t ideal. It takes additional untidy work. Reduced row echelon also takes additional work, but the notation is cleaner leaving less room for error. Once we’ve reduced a matrix to reduced row echelon form, we can easily read off our solutions and we’ll have resolved the linear system.

Reduced Row Echelon Form

When reducing a matrix to the reduced row echelon form, the Gauss-Jordan Elimination is used. This algorithm transforms a given matrix representing a linear system into the reduced echelon form in which solutions can become easily legible through applying a series of row reduction operations. No additional back-substitution required.

A given m x n matrix is said to be in reduced row echelon form if it satisfies the prerequisites of row echelon form, and in addition, also meets the following criteria:

The leading entries in each row are one.
All entries in a column below and above a leading entry are zero. (The leading entry is the only non-zero entry in the column)

Let’s work through an example of row reducing a matrix to the reduced row echelon form.

Reading our reduced row echelon form matrix, it is now immediately obvious that our solutions are x₁ = -3, x₂ = -12, x₃ = -3.

Uniqueness of Echelon Forms

Up until now, we’ve computed one example each for the row echelon form and the reduced row echelon form. It is possible that you wanted to attempt the row reduction of the row echelon form independently as an exercise, only to end up with a different row echelon form matrix. Not to worry, this is very much possible and both versions are equally correct as long as the computations were performed correctly and all three rules were covered. This is a fantastic situation! It ushers us forward towards to an important theorem:

Theorem (1)

Matrices may have more than one row echelon form; the row echelon form is not unique. Different but equally valid echelon forms may be arrived at through variations in the sequences of how row operations are applied.

This is not the case for the reduced row echelon form, it is the converse for the reduced row echelon form.

Theorem (2)

Matrices must only have one reduced row echelon form; the reduced row echelon form is unique.

The root of why we see this difference in uniqueness between the two forms is due to the additional restrictions we enforce on the reduced row echelon form. Namely, the requirement around the leading entries being equal to one. As soon as we’ve reduced a matrix to the row echelon form, we could multiply each row by any non-zero constant and it would still be in row echelon form because scaling the matrix didn’t break any rules for the row echelon form. The same is not possible with the reduced echelon form as the leading entries must be one. I illustrate this point further below with a concrete example.

Number of Solutions

A fundamental question that naturally arises from solving a linear system is how many solutions exist? For any linear system, the resolution will always be one of three cases. The linear system will either have one unique solution, infinite solutions, or no solutions. If you’re interested in mulling over why it must be one of these three, (re)visit my previous article.

Let’s take a closer look at each case in greater detail to see how we can recognize the solution case of a given matrix, and gently poke around at and explore the intuition behind exactly why and how each case scenario manifests itself.

Unique Solution: a linear system has a unique solution when the reduced row echelon form of its matrix has a pivot for every column.

It becomes more obvious why this is the case when we rewrite the matrix form as a series of linear equations. We see that because each column has a pivot (with no entries above or below), so there is a clear-cut solution for each variable you can read off equation by equation.

No Solutions: a linear system has no solutions when the reduced row echelon form of its matrix has an algebraic inconsistency.

As seen above, there are no values of x₁, x₂, x₃, and x₄ that will allow equation four to be true. The left hand side will always be 0 which does not equal nine, therefore no solutions exist for this system. In general, any augmented matrix with a row [0, 0, … 0 | b] where b is non-zero will have no solutions because 0 ≠ b.

Infinite Solutions: a linear system has infinite solutions when it has at least one free variable. A free variable occurs when the corresponding column does not have a pivot. On the other hand, a basic variable is a variable where the corresponding column has a pivot. Let’s examine why the presence of free variables hints at infinite solutions.

True to its name, free variables mean that you are free to assign any value to them. All basic variables are defined relative to free variables, so the values of basic variables will be dependent on what value the free variables were assigned. This is the essence of the existence of infinite solutions; infinitely many solutions are valid as long as the basic variables are consistent with the values selected for the free variables.

After transforming a matrix to the reduced row echelon form, it will become immediately apparent whether a system has a one, none or infinitely many solutions.

Summary

In this chapter, we learned:

The Gauss Elimination method for reducing a matrix to the row echelon form to solve a linear system.
The Gauss-Jordan Elimination method for reducing a matrix to the reduced row echelon form to solve a linear system.
Uniqueness of the echelon forms: the row echelon form is not unique while the reduced row echelon form is.
The number of solutions a linear system may have: unique, infinite or none and when and why they occur.