School of Computer Science Computer Science 308767: Parallel ProgrammingGaussian Elimination on Shared Memory
:

In this assignment, you will develop two parallel linear solvers
that use Gaussian elimination
with partial pivoting (partial pivoting is used to enhance numerical
stability) to transform
a dense n x n matrix into an uppertriangular one
and then solve the resulting linear system by back substitution.
Below is pseudocode written using Fortran 90
array notation for a sequential implementation of Gaussian elimination
with partial pivoting.
inputs: a(n,n), b(n) outputs: a(n,n), b(n) in echelon form do j=1,n1 ksave = maxloc(abs(a(j:n,j))) k = ksave(1) + j1 swap a(j,:) and a(k,:) swap b(j) and b(k) do k=j+1, n m = a(k,j)/a(j,j) a(k,:) = a(k,:)  m*a(j,:) b(k) = b(k)  m*b(j) enddo enddoYou will use two different sharedmemory programming models to write a parallel linear solver based on Gaussian elimination with partial pivoting. Each parallel solver implementation will accept an arguments n  the size of a matrix. The number of threads may be specified as an argument, through an environment variable, or as an argument to a launching script. Your programs will allocate an n x n matrix, fill it with random floating point numbers ranging from 1 to 1000, apply Gaussian elimination with partial pivoting to transform the matrix into an upper triangular one, and then apply back substitution to compute x. To check your answer, compute the square root of the sum of the squares of the residual vector (this sum is known as the L2norm) computed as Axb. Print the value of the L2norm of the residual. (It should be very small.)
The back substitution and verification steps need not be parallelized. Have your program separately time the Gaussian elimination phase by reading the realtime clock before and afterward and printing the difference.
The formal components of the assignment are listed below:
I will provide a problem size n for you to evaluate your implementation. Prepare a table that includes your timing measurements for the elimination phase of your implementations on 114 processors of the SGI Origin mapy. Graph of the parallel efficiency of your solver executions. Plot a point for each of the solver executions. The x axis should show the number of processors. The y axis should show your measured parallel efficiency for the solver execution.