The goal is to recover 3D depth from a 2D image by applying the "SHAPE_FROM_SHADING" algorithm. The implementation consists of three parts.
Design Choice: Filter out the background before all of the computations below, since the background contributes no information to the image.
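A minimal sketch of this masking step. The threshold value and function name are assumptions for illustration; the report does not state how the background is detected:

```python
import numpy as np

def mask_background(image, threshold=10):
    """Boolean mask of foreground pixels.

    Pixels at or below `threshold` are treated as background and are
    excluded from the albedo/tilt/slant statistics.  The threshold
    value is an assumption; the report does not state one.
    """
    return image > threshold

# Toy example: a dark background with a brighter object in the middle.
img = np.zeros((5, 5), dtype=np.uint8)
img[1:4, 1:4] = 120
fg = mask_background(img)
print(fg.sum())  # -> 9 foreground pixels
```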
Testing Results:
The results are fairly consistent for images like Lemon and Boat. Although the tilt varies by about 20 degrees, this is still a good approximation. The Lemon results are shown below as an example. The numbers are ordered as: [albedo, tilt (deg), slant (deg)].
 
![]() | ![]() | ![]() |
However, for cases like Saw and Flash, the tilt varies widely. The Saw results are shown below to demonstrate this.
 
![]() | ![]() | ![]() | ![]() |
For a case like Brick, the albedo varies widely, and so does the tilt.
 
![]() | ![]() | ![]() |
Conclusion:
The algorithm works well when the surface of the object is Lambertian, the surface normals are distributed uniformly in 3D, and the shape of the object relative to the light direction does not cause self-shading; these are essentially the assumptions of the algorithm. This is also why the Lemon images give relatively consistent results: their shape and surface are close to the assumptions.
For the Saw case, the cone-shaped teeth are very sensitive to the light direction, which causes large variation in the intensity gradient; therefore the tilt estimate changes substantially.
For a case like Brick, although each of its surfaces is flat, which should permit an accurate estimate, the global shape causes the intensity to vary greatly from surface to surface. Self-shading is therefore especially serious in this case. For example, in the third image, two of the three visible surfaces are in shadow, which makes the result for this image fairly inconsistent.
One interesting point is that, regardless of the inconsistency of the estimates, each of them gives a fairly good result when applied to recover the depth of its corresponding image.
where the albedo, tilt, and slant are estimated in the first stage, and p and q are the components of the surface gradient (which determine the surface normal), sampled over the hemisphere.
p and q are the horizontal and vertical directions of the map, respectively.
Theoretically, p and q can range from minus infinity to plus infinity. However, we need a reasonably sized table, so we must balance accuracy against table size. Considering these factors, I chose a reflectance table of size 128x128, which means each table cell covers roughly 3 degrees. Since tan(87°) ≈ 19, sampling p and q from -18 to 18 at first seemed a reasonable approach. But if we leave the solution like this and plot the reflectance map, a large portion of the map is dark. This is because the samples are spaced uniformly in tan(surface angle) rather than in the angle itself; since tan(79°) ≈ 5, most of the range [-18, 18] maps only to the bottom ring of the hemisphere. Obviously, this is a bad sampling. So I chose -5 to 5 as the reflectance map range to improve the sampling accuracy. Since tan(79°) ≈ 5, this means we sample the hemisphere from about -80 to 80 degrees, which still covers the main part of the hemisphere. The final recovered depth shows this is a good choice.
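The table construction described above can be sketched as follows, assuming the standard Lambertian reflectance map R(p, q) = albedo · (1 + p·ps + q·qs) / (√(1 + p² + q²) · √(1 + ps² + qs²)), where (ps, qs) is the light-source gradient derived from the estimated tilt and slant. The function name and exact grid layout are illustrative, not the report's own code:

```python
import numpy as np

def reflectance_table(albedo, tilt, slant, size=128, pq_max=5.0):
    """Sample the Lambertian reflectance map on a size x size grid.

    p, q range over [-pq_max, pq_max]; pq_max = 5 corresponds to
    surface angles up to about 79 degrees, as discussed above.
    """
    ps = np.tan(slant) * np.cos(tilt)   # light-source gradient
    qs = np.tan(slant) * np.sin(tilt)
    p, q = np.meshgrid(np.linspace(-pq_max, pq_max, size),
                       np.linspace(-pq_max, pq_max, size))
    num = 1 + p * ps + q * qs
    den = np.sqrt(1 + p**2 + q**2) * np.sqrt(1 + ps**2 + qs**2)
    R = albedo * num / den
    return np.clip(R, 0, None)          # self-shadowed cells -> 0

table = reflectance_table(albedo=1.0, tilt=np.pi / 4, slant=np.pi / 6)
print(table.shape)  # -> (128, 128)
```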
This part is done by applying the "SHAPE_FROM_SHADING" algorithm. The backbone of the algorithm is the iteration between the "Update Rule" and "Integrability Enforcement" steps.
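The report does not specify which integrability-enforcement variant is used; the Frankot-Chellappa FFT projection is one standard choice, and a sketch of that step (applied between update-rule iterations) might look like this:

```python
import numpy as np

def enforce_integrability(p, q):
    """Project a gradient field (p, q) onto the nearest integrable one
    and recover the depth map, via the Frankot-Chellappa FFT method.
    This is one standard way to implement the integrability step; the
    report does not say which variant it actually uses.
    """
    h, w = p.shape
    wx = np.fft.fftfreq(w) * 2 * np.pi      # frequency grids
    wy = np.fft.fftfreq(h) * 2 * np.pi
    u, v = np.meshgrid(wx, wy)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = u**2 + v**2
    denom[0, 0] = 1.0                       # avoid division by zero at DC
    Z = (-1j * u * P - 1j * v * Q) / denom
    Z[0, 0] = 0.0                           # depth is defined up to a constant
    z = np.real(np.fft.ifft2(Z))
    # Integrable gradients consistent with the recovered depth z:
    new_p = np.real(np.fft.ifft2(1j * u * Z))
    new_q = np.real(np.fft.ifft2(1j * v * Z))
    return z, new_p, new_q
```

If the input gradient field is already integrable, the projection leaves it (essentially) unchanged, which is a convenient sanity check.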
The results are very good for many of the test images. Several outputs are shown below.
 
Lemon
 
![]() | ![]() | ![]() |
Note: in the image names, the first number indicates the number of iterations and the second number indicates the lambda.
We can see that after 1000 iterations, the brightest part of the Lemon matches the highest part of the depth map (LemonA_1000_1000), and the shadowed part sits in a depression of the depth map.
Moreover, we can see that more iterations make the depth map more precise.
 
  Brick
 
![]() | ![]() | ![]() |
This example demonstrates the lambda parameter. In the third image, with lambda = 500, the edges are more pronounced than in the second image, with lambda = 1000. This is consistent with the smoothness constraint: the larger the lambda, the smoother the surface.
 
  Saw
 
![]() | ![]() | ![]() |
This is another example demonstrating that more iterations give a more accurate result. Although 200 iterations can already show the shape, there are still depressions at the cone-shaped teeth. The 1000-iteration result shows these depressions lifted up considerably, and we can expect a precise result after 2000 iterations. The lambda parameter is always set to 500; since the Saw image has a smooth shape, lambda does not affect the result very much.
 
  Face
 
![]() | ![]() | ![]() |
I tried different parameters to recover this image; the results show that setting lambda = 2000 gives the best result. The images shown above are after 1000 iterations, from two different perspectives.
 
  Rod
 
![]() | ![]() | ![]() |
These images are after 200 iterations with lambda set to 1000. We can see the algorithm can only roughly recover the shape of the Rod. We can expect the algorithm to fail on objects with complicated shapes or surfaces that are not Lambertian.
 
  Tips