Alternative Image Representations


For any greyscale image, one can make a corresponding 3D plot of brightness as a function of position. Apart from the usual occlusion and rendering effects of 3D plots, such a plot carries the same information as the original image. However, replacing brightness by height can have dramatic consequences for our (sighted) human ability to immediately recognize things! The next two images show 3D plots of spectrograms of sounds generated by The vOICe (1.05 s sound, linear frequency distribution). The original image from which each 3D plot was derived is shown in the upper right corner for comparison. Although the regular 2D spectrograms are easily interpreted by the sighted, recognition becomes much harder with the associated 3D plots. This holds even though, for these particular images, brightness was not only mapped into height but also preserved by giving higher peaks a lighter shade of grey, yielding a kind of "snowy mountain" effect.

3D spectrogram of human face.

3D spectrogram of parked car.
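
The brightness-to-height mapping behind such plots can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to produce the figures above: the function name `brightness_to_surface`, the `max_height` parameter and the 0-255 brightness range are all assumptions made for the example. Each pixel becomes a vertex whose height is proportional to its brightness, and the brightness is also kept as the shade of the vertex, which gives the "snowy mountain" effect.

```python
def brightness_to_surface(image, max_height=1.0, max_brightness=255):
    """Map each pixel of a greyscale image to a 3D vertex (x, y, z, shade).

    `image` is a list of rows, each row a list of brightness values in
    the range 0..max_brightness (an assumed convention for this sketch).
    """
    vertices = []
    for row_index, row in enumerate(image):
        for col_index, brightness in enumerate(row):
            # Height is proportional to brightness.
            z = max_height * brightness / max_brightness
            # Keep the brightness as the shade: higher peaks stay lighter.
            shade = brightness
            vertices.append((col_index, row_index, z, shade))
    return vertices


# A tiny hypothetical 2x3 "image" used only for illustration.
image = [
    [0, 128, 255],
    [64, 255, 0],
]
surface = brightness_to_surface(image, max_height=10.0)
```

The resulting vertex list can be handed to any 3D plotting tool. No information is added or lost by the mapping itself; only the way the information is presented to the eye changes.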

Apparently, our human ability to quickly recognize things depends not only on the available information, but also on its representation. Recognition can become difficult even when the alternative representation is by itself a simple or common one (here a mountainous landscape), if it is used to represent kinds of information (modalities) other than those we are used to seeing in that form. Again, the question arises whether one can learn to understand these alternative representations of basically the same information as easily as, say, learning to read upside down after already having learnt to read in the normal way.

Still, it is important to realize that even the classic spectrogram can itself be viewed as an attempt to design a useful artificial cross-modal mapping from the auditory to the visual domain, albeit mostly for research purposes.
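
To make this auditory-to-visual mapping concrete, here is a minimal sketch of how a spectrogram image can be computed: time runs along the horizontal axis, frequency along the vertical axis, and the magnitude of each frequency component becomes brightness. The function names, the frame size, the hop size and the plain (unwindowed) discrete Fourier transform are assumptions chosen for brevity; this is not the procedure used for the spectrograms on this page.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return columns[t][k]: magnitude of frequency bin k in time frame t."""
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        column = []
        # Only the first half of the bins is needed for real-valued input.
        for k in range(frame_size // 2):
            total = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            column.append(abs(total))
        columns.append(column)
    return columns


def to_brightness(columns, max_brightness=255):
    """Scale magnitudes linearly so the largest one maps to max_brightness."""
    peak = max(max(column) for column in columns) or 1.0
    return [
        [round(max_brightness * magnitude / peak) for magnitude in column]
        for column in columns
    ]


# A pure tone with exactly 8 cycles per 64-sample frame (illustrative input).
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spectrogram_image = to_brightness(spectrogram(tone))
```

For this pure tone, every column of `spectrogram_image` has a single bright entry at frequency bin 8, which would appear as one horizontal line in the picture.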

In the following example for the sighted, brightness is mapped into a 3D plot without any object hiding effects, and without aiding recognition via the snowy mountain effect: a random dot autostereogram is given together with the photograph from which it was derived. In contrast to the usual use of autostereograms, the brightness values of a normal photograph were here mapped directly into height - instead of the usual special image preprocessing that first maps depth information into brightness. You will notice that it is very difficult to recognize a face in the autostereogram, while the exclamation "Hi!" is easily read on an elevated plane (assuming that you are already familiar with 3D viewing of ordinary autostereograms).

Photograph of human face.

3D random dot autostereogram for human face.
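
How brightness can be turned directly into apparent height in a random dot autostereogram may be illustrated with a much simplified sketch. It uses a basic copy-from-the-left scheme: each dot repeats the dot a certain distance to its left, and that distance shrinks as brightness increases. With the usual parallel (wall-eyed) viewing, a shorter repeat distance appears closer to the viewer. The function name, the parameters `pattern_width` and `max_shift`, and the use of binary dots are assumptions for this example; it is not necessarily the method used for the image above.

```python
import random


def autostereogram(height_map, pattern_width=8, max_shift=3,
                   max_brightness=255, seed=0):
    """Build a binary random dot autostereogram from a brightness map.

    `height_map` is a list of rows of brightness values in the range
    0..max_brightness; brighter pixels are rendered as more elevated.
    """
    rng = random.Random(seed)
    result = []
    for row in height_map:
        out = []
        for x, brightness in enumerate(row):
            # Brighter pixel -> larger shift -> shorter repeat distance.
            shift = round(max_shift * brightness / max_brightness)
            source = x - (pattern_width - shift)
            if source < 0:
                # No dot to copy yet: start with a random dot.
                out.append(rng.randint(0, 1))
            else:
                # Repeat the dot found one repeat distance to the left.
                out.append(out[source])
        result.append(out)
    return result


# Illustrative inputs: a flat dark plane and a flat bright (raised) plane.
flat = autostereogram([[0] * 32])
raised = autostereogram([[255] * 32])
```

In this sketch a flat dark plane repeats every 8 dots and a flat bright plane every 5 dots, and it is this difference in repeat distance that the two eyes fuse into a difference in depth. Nothing in the dot pattern resembles the original photograph when seen without stereo fusion, which is part of why recognition is so hard.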

The moral of these examples is that it can be tricky to make general statements about our human abilities to deal with alternative perceptual representations. Even if these representations correspond to simple mappings, it may turn out to be very tough to (learn to) interpret them. On the other hand, the human brain has tremendous potential for learning, if slowly: written text is clearly an alternative representation for speech, which we have proven able to master after a number of years of training at school! Would people spend all that effort unless predecessors had already shown in practice that it could be done?

Now we need to take up the challenge of trying to master the soundscape representations of images as generated via The vOICe mapping. Can it be done? There is no way to find out without giving it a serious, persevering try!

Analogous live immersive experiments were carried out by George Stratton in the 1890s, who tried to adapt to an upside-down world by wearing upside-down glasses, and by Stuart Anstis in the 1990s, who tried to adapt to a "negative world" in which color space was inverted.


Love ends where unquestionable belief begins.

Copyright © 1996 - 2024 Peter B.L. Meijer