The vOICe Home Page
U.S. Patent 5097326

Let's make Vision Accessible

Introduction and Overview

Scanning for soundscape
Meijer's system [...] consists of a video camera that takes a picture which is converted into a digitised image made up of 64 by 64 pixels. But then the image is converted into sounds by a computer, following two simple rules. First, pixels situated "high" in the picture are converted into high tones; those that are low are converted into low tones. Secondly, the brighter the pixel, the louder the sound. So a bright pixel near the top of the pixel grid would be high-pitched and loud.
If you were to "hear" a picture with Meijer's device, you wouldn't hear the whole image instantly: rather you would hear a column at a time, from left to right. A bright, diagonal line stretching upward to the right produces a loud "ooiieep" sound and another stretching downward to the right makes the opposite sound - "eeiioop." After one entire scan, which takes about a second, the scan begins again. If the image changes, so will the next pattern of sound.

[...] "Blind people have their cane, which is a very useful thing," says Meijer. "But they don't have the ability to detect buildings from a distance, or to recognize buildings they have encountered before. I hope a system like this would help with orientation in particular."

Rosie Mestel, New Scientist, June 4, 1994, pp. 20-23.

Background

In recent years, many new approaches have been developed to overcome limitations of the human senses. Significant progress has been made in developing technology to support reading and writing by the blind, including new approaches for accessing computers with graphical user interfaces. Unfortunately, much less progress has been made in the area of orientation and mobility for the blind, in spite of numerous attempts to develop electronic travel aids (ETAs) and vision substitution devices. Existing ETAs are basically obstacle detectors that support blind travel, but orientation and mobility, as well as many other vision-related activities, might also be addressed by an auditory display device that genuinely converts arbitrary images into sounds:


Synthetic vision by hearing sonified pictures: a blind person forms a mental image from the soundscape of an arbitrary image

Example sounds: Because of the generality of the approach, applications can be as varied as orienting oneself with respect to a wall with a gate, reading a purely graphical oscilloscope display, or hearing the plots of a scientific graphing calculator. On this site you can also find sound samples in MP3 format, further illustrating visual orientation, hearing a printed graph, a parked car, the US flag, the planet Saturn or an Access Symbol, and even watching television.

The vOICe for Windows

Auditory display for synthetic vision

The vOICe hardware prototype
Originally, a hardware prototype, nicknamed The vOICe (OIC? Oh I see!), was developed to prove the technical feasibility of the underlying concepts. It starts by subdividing an electronic photograph into 64 rows and 64 columns, giving 4096 pixels. Shading is reduced to 16 levels of grey.
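
For illustration only, the Python/NumPy sketch below shows one way to perform this reduction step; the function name preprocess and the assumption that the input dimensions are exact multiples of 64 are mine, not details of The vOICe itself.

    import numpy as np

    def preprocess(image, size=64, levels=16):
        """Reduce a grey-scale image (2-D float array with values in [0, 1],
        dimensions assumed to be multiples of `size`) to a size x size grid
        quantised to `levels` grey values."""
        rows, cols = image.shape
        # Block-average the image down to a size x size grid.
        small = image.reshape(size, rows // size, size, cols // size).mean(axis=(1, 3))
        # Quantise shading to a fixed number of grey levels.
        return np.round(small * (levels - 1)) / (levels - 1)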

Column by column, the image is then translated into sound. The top pixel in a column gives a high pitch, and the bottom pixel a low pitch. An intermediate position gives an intermediate pitch. The grey-level is expressed by loudness. All pixels in a single column are heard simultaneously, much like a musical chord. With subsequent image columns, these chords change according to the pixel brightness distribution within the column. In this way, the image content is translated into sound by scanning through all 64 columns, from left to right. Finally, a click marks the beginning of a new image. Typically, fresh electronic photographs are taken and converted into sound at one-second intervals. Of course, there is a lot more to be said about technical issues relating to this general image-to-sound mapping for artificial vision.
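
As a minimal sketch of this column-by-column mapping, the Python/NumPy code below scans a grey-level image from left to right, assigns each row an exponentially spaced frequency (highest at the top), and weights each sinusoid by pixel brightness; the particular frequency range, scan duration and sample rate are illustrative assumptions, not the actual parameters of The vOICe.

    import numpy as np

    def sonify(image, duration=1.0, sample_rate=11025, f_low=500.0, f_high=5000.0):
        """Map a 2-D grey-level image (values in [0, 1], row 0 = top) to a
        mono waveform: columns are scanned left to right, pitch rises towards
        the top of the image, and loudness follows pixel brightness."""
        rows, cols = image.shape
        # One frequency per row, exponentially spaced, highest at the top row.
        freqs = f_low * (f_high / f_low) ** np.linspace(1.0, 0.0, rows)
        samples_per_col = int(duration * sample_rate / cols)
        signal = np.empty(cols * samples_per_col)
        for c in range(cols):  # left-to-right scan
            # Keep sinusoid phases continuous across column boundaries.
            t = (np.arange(samples_per_col) + c * samples_per_col) / sample_rate
            # Sum one sinusoid per pixel of this column, weighted by its
            # brightness, so the whole column is heard at once as a chord.
            chord = (image[:, c, None] * np.sin(2 * np.pi * freqs[:, None] * t)).sum(axis=0)
            signal[c * samples_per_col:(c + 1) * samples_per_col] = chord
        return signal / (np.abs(signal).max() + 1e-12)  # normalise to [-1, 1]

Played back at the chosen sample rate, a bright diagonal line rising to the right would then sweep upward in pitch over the one-second scan, matching the "ooiieep" described above.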

The vOICe software
Crucial for a good mapping is the preservation of simplicity and similarity: simple images should sound simple, and simple shifts in position should lead to correspondingly simple perceptual changes in the sounds. Complexity should not arise from the mapping itself, but only from the complexity of the content of the image being mapped into sound! Furthermore, the mapping should not only allow one to distinguish different image sounds and to learn to associate a given set of auditory patterns with their visual counterparts, but also to analyze and generalize them to situations that were not in the training examples.

The vOICe hardware did not become available as a product. However, in the photograph on the left, a software version of The vOICe is being used, running on a fast Pentium notebook PC (inside the shoulder bag) with a PC camera and headphones. This software is now available, so you can experience The vOICe yourself with The vOICe for Windows on Microsoft Windows. Try it!

Show me more

This website contains a lot of condensed information, and depending on your interests you may prefer to read a scientific paper relating to the original hardware prototype, play with the on-line interactive demonstration, or learn more about the concepts, possibilities and limitations of seeing with sound.


Recent highlights and key events from the history of The vOICe are listed on the highlights and events page.

Low vision is now within reach, but...

A major training effort will be needed when trying to synesthetically interpret the complicated sounds resulting from normal visual images. To what extent people - especially blind people - can learn to perceive, comprehend and make use of the soundscapes is still an open question, with many options for exciting psychophysical research! The proposed sound-induced mental imagery can also be studied from the perspective of neuroscience, linguistics, or psychology.


Copyright © 1996 - 2024 Peter B.L. Meijer