The goal is to recover 3D depth from a 2D image by applying the "SHAPE_FROM_SHADING" algorithm. The implementation consists of three parts.
Design Choice: Filter out the background before all of the computations below, since the background contributes no information to the image.
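A minimal sketch of this masking step. The threshold value and function name are assumptions for illustration; the report does not state how the background is detected:

```python
import numpy as np

def mask_background(image, threshold=10):
    """Boolean mask of foreground pixels.

    Pixels at or below `threshold` are treated as background and are
    excluded from the albedo/tilt/slant statistics.  The threshold
    value is an assumption; the report does not state one.
    """
    return image > threshold

# Toy example: a dark background with a brighter object in the middle.
img = np.zeros((5, 5), dtype=np.uint8)
img[1:4, 1:4] = 120
fg = mask_background(img)
print(fg.sum())  # -> 9 foreground pixels
```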
Testing Results:
The results are fairly consistent for images like Lemon and Boat. Although the tilt varies by about 20 degrees, this is still a good approximation. The Lemon results are shown below as an example. The numbers are ordered as: [albedo, tilt (deg), slant (deg)].
 
![]() | ![]() | ![]() |
However, for cases like Saw and Flash, the tilt varies widely. The Saw results are shown below to demonstrate this.
 
![]() | ![]() | ![]() | ![]() |
For a case like Brick, the albedo varies widely, and so does the tilt.
 
![]() | ![]() | ![]() |
Conclusion:
The algorithm works well when the surface of the object is Lambertian, the surface normals are distributed uniformly in 3D, and the shape of the object relative to the light direction does not cause self-shading; these are essentially the assumptions of the algorithm. This is also why the Lemon images give relatively consistent results: their shape and surface are close to the assumptions.
For the Saw case, the cone-shaped teeth are very sensitive to the light direction, which causes large variation in the intensity gradient; therefore the tilt estimate changes substantially.
For a case like Brick, although each of its surfaces is flat, which should permit an accurate estimate, the global shape causes the intensity to vary greatly from surface to surface. Self-shading is therefore especially serious in this case. For example, in the third image, two of the three visible surfaces are in shadow, which makes the result for this image fairly inconsistent.
One interesting point is that, regardless of the inconsistency of the estimates, each of them gives a fairly good result when applied to recover the depth of its corresponding image.
where the albedo, tilt, and slant are estimated in the first stage, and p and q are the components of the surface gradient (which determine the surface normal), sampled over the hemisphere.
p and q are the horizontal and vertical directions of the map, respectively.
Theoretically, p and q can range from minus infinity to plus infinity. However, we need a reasonably sized table, so we must balance accuracy against table size. Considering these factors, I chose a reflectance table of size 128x128, which means each table cell covers roughly 3 degrees. Since tan(87°) ≈ 19, sampling p and q from -18 to 18 at first seemed a reasonable approach. But if we leave the solution like this and plot the reflectance map, a large portion of the map is dark. This is because the samples are spaced uniformly in tan(surface angle) rather than in the angle itself; since tan(79°) ≈ 5, most of the range [-18, 18] maps only to the bottom ring of the hemisphere. Obviously, this is a bad sampling. So I chose -5 to 5 as the reflectance map range to improve the sampling accuracy. Since tan(79°) ≈ 5, this means we sample the hemisphere from about -80 to 80 degrees, which still covers the main part of the hemisphere. The final recovered depth shows this is a good choice.
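The table construction described above can be sketched as follows, assuming the standard Lambertian reflectance map R(p, q) = albedo · (1 + p·ps + q·qs) / (√(1 + p² + q²) · √(1 + ps² + qs²)), where (ps, qs) is the light-source gradient derived from the estimated tilt and slant. The function name and exact grid layout are illustrative, not the report's own code:

```python
import numpy as np

def reflectance_table(albedo, tilt, slant, size=128, pq_max=5.0):
    """Sample the Lambertian reflectance map on a size x size grid.

    p, q range over [-pq_max, pq_max]; pq_max = 5 corresponds to
    surface angles up to about 79 degrees, as discussed above.
    """
    ps = np.tan(slant) * np.cos(tilt)   # light-source gradient
    qs = np.tan(slant) * np.sin(tilt)
    p, q = np.meshgrid(np.linspace(-pq_max, pq_max, size),
                       np.linspace(-pq_max, pq_max, size))
    num = 1 + p * ps + q * qs
    den = np.sqrt(1 + p**2 + q**2) * np.sqrt(1 + ps**2 + qs**2)
    R = albedo * num / den
    return np.clip(R, 0, None)          # self-shadowed cells -> 0

table = reflectance_table(albedo=1.0, tilt=np.pi / 4, slant=np.pi / 6)
print(table.shape)  # -> (128, 128)
```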
This part is done by applying the "SHAPE_FROM_SHADING" algorithm. The backbone of the algorithm is the iteration between the "Update Rule" and "Integrability Enforcement" steps.
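The report does not specify which integrability-enforcement variant is used; the Frankot-Chellappa FFT projection is one standard choice, and a sketch of that step (applied between update-rule iterations) might look like this:

```python
import numpy as np

def enforce_integrability(p, q):
    """Project a gradient field (p, q) onto the nearest integrable one
    and recover the depth map, via the Frankot-Chellappa FFT method.
    This is one standard way to implement the integrability step; the
    report does not say which variant it actually uses.
    """
    h, w = p.shape
    wx = np.fft.fftfreq(w) * 2 * np.pi      # frequency grids
    wy = np.fft.fftfreq(h) * 2 * np.pi
    u, v = np.meshgrid(wx, wy)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = u**2 + v**2
    denom[0, 0] = 1.0                       # avoid division by zero at DC
    Z = (-1j * u * P - 1j * v * Q) / denom
    Z[0, 0] = 0.0                           # depth is defined up to a constant
    z = np.real(np.fft.ifft2(Z))
    # Integrable gradients consistent with the recovered depth z:
    new_p = np.real(np.fft.ifft2(1j * u * Z))
    new_q = np.real(np.fft.ifft2(1j * v * Z))
    return z, new_p, new_q
```

If the input gradient field is already integrable, the projection leaves it (essentially) unchanged, which is a convenient sanity check.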
The results are very good for many of the test images. Several outputs are shown below.
 
Lemon
 
![]() | ![]() | ![]() |
Note: in the image names, the first number indicates the number of iterations and the second number indicates the lambda.
We can see that after 1000 iterations, the brightest part of the Lemon matches the highest part of the depth map (LemonA_1000_1000), and the shadowed part sits in a depression of the depth map.
Moreover, we can see that more iterations make the depth map more precise.
 
  Brick
 
![]() | ![]() | ![]() |
This example demonstrates the lambda parameter. In the third image, with lambda = 500, the edges are more pronounced than in the second image, with lambda = 1000. This is consistent with the smoothness constraint: the larger the lambda, the smoother the surface.
 
  Saw
 
![]() | ![]() | ![]() |
This is another example demonstrating that more iterations give a more accurate result. Although 200 iterations can already show the shape, there are still depressions at the cone-shaped teeth. The 1000-iteration result shows these depressions lifted up considerably, and we can expect a precise result after 2000 iterations. The lambda parameter is always set to 500; since the Saw image has a smooth shape, lambda does not affect the result very much.
 
  Face
 
![]() | ![]() | ![]() |
I tried different parameters to recover this image; the results show that setting lambda = 2000 gives the best result. The images shown above are after 1000 iterations, from two different perspectives.
 
  Rod
 
![]() | ![]() | ![]() |
These images are after 200 iterations with lambda set to 1000. We can see the algorithm can only roughly recover the shape of the Rod. We can expect the algorithm to fail on objects with complicated shapes or surfaces that are not Lambertian.
 
  Tips