School of Computer Science

Computer Science 308-767: Parallel Programming

Gaussian Elimination on Shared Memory :

Gaussian elimination is a classical method for solving a matrix of linear equations of the form Ax = b. By performing elementary row operations, Gaussian elimination transforms the square matrix A into an equivalent upper-triangular matrix. Following transformation of A into upper triangular form, a back substitution phase solves for x. A high level overview of Gaussian elimination and back substitution can be found on Mathworld.

In this assignment, you will develop two parallel linear solvers that use Gaussian elimination with partial pivoting (partial pivoting is used to enhance numerical stability) to transform a dense n x n matrix into an upper-triangular one and then solve the resulting linear system by back substitution. Below is pseudocode written using Fortran 90 array notation for a sequential implementation of Gaussian elimination with partial pivoting.

```    inputs: a(n,n), b(n)
outputs: a(n,n), b(n) in echelon form

do j=1,n-1
ksave = maxloc(abs(a(j:n,j)))
k = ksave(1) + j-1
swap a(j,:) and a(k,:)
swap b(j) and b(k)
do k=j+1, n
m = a(k,j)/a(j,j)
a(k,:) = a(k,:) - m*a(j,:)
b(k) = b(k) - m*b(j)
enddo
enddo
```
You will use two different shared-memory programming models to write a parallel linear solver based on Gaussian elimination with partial pivoting. Each parallel solver implementation will accept an arguments n - the size of a matrix. The number of threads may be specified as an argument, through an environment variable, or as an argument to a launching script. Your programs will allocate an n x n matrix, fill it with random floating point numbers ranging from 1 to 1000, apply Gaussian elimination with partial pivoting to transform the matrix into an upper triangular one, and then apply back substitution to compute x. To check your answer, compute the square root of the sum of the squares of the residual vector (this sum is known as the L2-norm) computed as Ax-b. Print the value of the L2-norm of the residual. (It should be very small.)

The back substitution and verification steps need not be parallelized. Have your program separately time the Gaussian elimination phase by reading the real-time clock before and afterward and printing the difference.

The formal components of the assignment are listed below:

• Write a shared-memory parallel program that uses Pthreads to solve a linear system using Gaussian elimination with partial pivoting.

• Write a shared-memory parallel program that uses OpenMP to to solve a linear system using Gaussian elimination with partial pivoting.

• Write a document that describes how your programs work. This document should not include your programs, though it may include figures containing pseudo-code that sketch the key elements of your parallelization strategy for each implementation. Explain how your program partitions the data, work and exploits parallelism. Justify your implementation choices. Explain how the parallel work is synchronized.

I will provide a problem size n for you to evaluate your implementation. Prepare a table that includes your timing measurements for the elimination phase of your implementations on 1-14 processors of the SGI Origin mapy. Graph of the parallel efficiency of your solver executions. Plot a point for each of the solver executions. The x axis should show the number of processors. The y axis should show your measured parallel efficiency for the solver execution.

Your assignment should be submitted in two parts.
• A tar file containing a separate directory for each of your programs. Each directory should contain your source code and a Makefile.