Mathematical explanation behind a picture posted (lifted from facebook)

In the image given below, there is a picture of an actor (the famous South Indian actor Rajinikanth) that can be seen only if you shake your head! I lifted this from Facebook.

I am just curious to know if there is any mathematical explanation for it. Is there any way to know how this image was created in the first place?

PS : If this question (although interesting) is inappropriate here, that is still okay, and I hope it could be migrated to some other Stack Exchange site.

(…experiment to prove that it is a physical phenomenon)

After some comments expressed doubt as to whether this is a physical phenomenon, I did a small experiment using a simple camera. I shot photos of the picture displayed on an LCD monitor in two different cases. In Case-1 the camera was still; in Case-2 the camera was shaken to and fro in a circular arc about an axis in the vertical plane passing through the centre of the camera, just as we shake our heads. I did this by hand, just as one shakes one's head.
I have given the photos below.

Case-1 (still camera)

Case-2 (shaking camera)

It can be observed that the face of Rajini is more clearly visible in Case-2 (shaking camera) than in Case-1 (still camera), where the face is hardly visible.

PS : Now there is really no need to shake our head.

After a recommendation by Willie here, I have added Case-3, where the camera is shaken vertically (parallel to the stripes in the picture).

It can be observed that there is not much of an effect when the camera is shaken parallel to the stripes.

Case-3 (shaking camera vertically)

What you are seeing is a physical manifestation of the mathematical operation known as convolution.

First let me show you some pictures; we’ll get into the mathematics afterwards. We start with the original

I took the image, desaturated the colours, duplicated it as another layer, and added the layers pixel-wise after translating one of them. With a horizontal translation of half the “wavelength” of the black bars, we get

With a translation of the same number of pixels, but vertically, we get

and finally, a diagonal translation at -45 degrees.
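The layer-translate-average procedure just described can be sketched numerically. Since the original image is not reproduced here, the sketch below (assuming NumPy) uses a synthetic stand-in: vertical bars with an assumed period of 8 pixels superposed on a slowly varying “hidden” pattern.

```python
import numpy as np

# Synthetic grayscale image: high-frequency vertical bars (period 8 px)
# drawn over a faint low-frequency "hidden" signal.
h, w = 64, 64
x = np.arange(w)
bars = (x % 8 < 4).astype(float)             # the black/white stripes
hidden = 0.2 * np.sin(2 * np.pi * x / 32)    # slowly varying hidden pattern
img = np.tile(np.clip(bars + hidden, 0, 1), (h, 1))

# Translate horizontally by half the bar wavelength (4 px) and average
# the image with its translate -- a two-point convolution.
shift = 4
shifted = np.roll(img, shift, axis=1)
averaged = 0.5 * (img + shifted)
```

Averaging with a half-period translate cancels the bars almost completely (each bright pixel is averaged with a dark one) while leaving the low-frequency pattern essentially intact, which is what the second image above shows.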

So what is going on? Why did I say that this is a manifestation of convolution?

Recall that the convolution of two functions defined on (say) the real line $\mathbb{R}$ is defined to be

$$f * g (x) = \int_{\mathbb{R}} f(y) g(x-y) dy$$

In a course on Fourier analysis, one learns that this is the dual operation to multiplication. That is, convolution in physical space corresponds to (point-wise) multiplication in Fourier space. This immediately gives the following interpretation of convolution in signal processing:

Convolving a signal $f$ by a function $\psi$ is the same as applying a frequency dependent filter $\hat{\psi}$ to the signal $f$.
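This duality is easy to check numerically. Here is a minimal sketch (assuming NumPy), using circular convolution on a periodic grid so that the discrete version of the identity is exact:

```python
import numpy as np

# Numerical check of the duality: convolving f with psi equals
# pointwise multiplication of their Fourier transforms.
n = 64
rng = np.random.default_rng(0)
f = rng.standard_normal(n)      # the "signal"
psi = rng.standard_normal(n)    # the "filter" / averaging weights

# Direct circular convolution: (f * psi)(x) = sum_y f(y) psi(x - y)
conv = np.array([sum(f[y] * psi[(x - y) % n] for y in range(n))
                 for x in range(n)])

# Fourier route: multiply the transforms, then invert.
conv_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(psi)).real
```

The two arrays agree to machine precision, which is the discrete form of “convolution in physical space is multiplication in Fourier space.”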

Another way of looking at the convolution, however, after staring at the above definition for a bit, is that

A convolution is a way of taking a weighted average of a signal with its translates; the weight depends on the amount of translation.

It is in this second sense that we will first look at the phenomenon you asked about. In the second image of this post, I averaged the signal with its horizontal translation by half the wavelength of the black bars; hence this is a convolution. Similarly, in the third and fourth images of this post, I averaged the original with a vertical and a diagonal translation respectively. They are also convolutions. And you see that this reproduces your observation that the direction in which you shake your head or camera affects the image seen or captured.

So how is the process of shaking your head or shaking a camera a process of convolution? The idea is that the image you see with your eyes, or capture with a camera, does not come from photons all emitted at the same instant in time (special relativity notwithstanding). In your vision there is the well-known phenomenon of persistence of vision, which posits that the perceived image is actually made up of photons arriving over a roughly 40-millisecond interval. Similarly, the shutter speed of a camera determines how long the camera registers light, so a camera with its shutter speed set to 1/25 s will “open its eye” for 40 milliseconds, and the image registered on the CCD or on film is made of photons arriving in that window.

Now, if you shake your head or camera so that the retina, the CCD, or the film moves significantly during those 40 milliseconds, each of your retinal cells, each of the photoelements on the CCD, or each of the dye pigments on the film will be exposed to photons originating from different spatial positions. (I am grossly simplifying here, but that’s the moral of the story.)

To summarise: your eyes and cameras already take a convolution of the incoming signal in time when they compose the image. By shaking the apparatus you convert the temporal convolution into a spatial convolution, which means that you are taking a weighted average of the image and its spatial translates. That is why what you see, and what you capture on camera, can be analogously described by digitally manipulating the image via an averaging/convolution procedure.
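The temporal-to-spatial conversion can be sketched as follows (assuming NumPy, and the simplifying assumption that the camera sits at a few discrete horizontal offsets for equal fractions of the exposure):

```python
import numpy as np

# During the exposure the sensor sees the scene at several translated
# positions; the recorded image is the time-weighted sum of those frames.
h, w = 32, 64
rng = np.random.default_rng(1)
scene = rng.random((h, w))              # stand-in for the still scene

offsets = [0, 1, 2, 3]                  # horizontal offsets during the shutter
weights = [0.25, 0.25, 0.25, 0.25]      # fraction of exposure at each offset

# "Exposure": accumulate the translated frames.
exposure = sum(wg * np.roll(scene, off, axis=1)
               for off, wg in zip(offsets, weights))

# The same result, as a spatial convolution of the still scene with a
# 1-D "motion kernel" whose weights are the dwell times at each offset.
kernel = np.zeros(w)
for off, wg in zip(offsets, weights):
    kernel[off] = wg
blurred = np.fft.ifft(np.fft.fft(scene, axis=1) * np.fft.fft(kernel),
                      axis=1).real
```

The accumulated exposure and the convolution with the motion kernel are the same image, which is the content of the summary above.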

Note that this corresponds somewhat with Henning’s comment to your question. The “eye’s edge detection” he mentions is, roughly speaking, a description of how the eye is sensitive to different spatial frequencies of a signal (not to be confused with the actual electromagnetic frequencies, which determine the colour). By shaking your head you apply a convolution operator, which in frequency space introduces a cut-off for high spatial frequency components. By reducing the high spatial frequency components, your eye is forced to get its information from the lower-frequency components, in which the image of the Indian actor hides. (There are some technical inaccuracies in this paragraph about how human physiology works and how it interacts with the shaking of the head, but I think this simpler picture illustrates the idea better.)
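To see the frequency cut-off concretely: the two-point kernel corresponding to “half the image plus half its translate by half a bar wavelength” has a transfer function that vanishes exactly at the bar frequency. A sketch (assuming, for illustration, 8-pixel bars on a 64-pixel periodic grid):

```python
import numpy as np

# Two-point averaging kernel: half weight at offset 0, half at offset 4
# (half of the assumed 8-px bar wavelength).
n = 64
kernel = np.zeros(n)
kernel[0] = 0.5
kernel[4] = 0.5

# Its transfer function is the magnitude of its Fourier transform.
transfer = np.abs(np.fft.fft(kernel))
k_bars = n // 8          # frequency index of the 8-px stripes
# transfer[k_bars] is ~0: the stripe frequency is filtered out entirely,
# while transfer[1] is close to 1: low frequencies pass almost untouched.
```

So the half-wavelength shift is precisely the filter that kills the bars while letting the low-frequency hidden face through.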

At this point I should mention that the idea of taking spatial convolutions of images, and the exchange between temporal and spatial convolutions under motion of the camera, is not only useful for optical illusions: it has industrial applications in automatic image deblurring.