The vOICe to AIM

It would be very interesting to learn more about how the human hearing system processes The vOICe's complex sounds. The vOICe goes well beyond the use of the spectral envelope to trace out (the edges of) object shapes, and seeks to induce a full spectro-temporal integration by the human brain to arrive at a form of mental synthetic vision.

One approach, and ultimately the decisive one, is to perform direct psychoacoustic experiments with human volunteers, with all the intricacies of dealing with human variability. Another approach is to make use of computer models developed to simulate the various stages of human auditory processing. These models should themselves have been validated against measured human performance during their development, so that we may have some confidence in their accuracy and/or predictive value. Given these premises, auditory models at least have the advantage of quickly giving a first impression of what to expect, which helps to reduce the risks before deciding whether to embark on a much more elaborate (time-consuming and expensive) evaluation with human subjects.

Roy Patterson and colleagues at the Centre for the Neural Basis of Hearing in the Physiology Department of the University of Cambridge, UK, have developed a time-domain model of auditory processing to simulate the auditory images produced by complex sounds. This Auditory Image Model (AIM, formerly available from the University of Cambridge) simulates how sound waves cause basilar membrane motion (BMM) in the human cochlea. The results are then used as input for a next stage, which simulates the conversion by hair cells of this motion into a neural activity pattern (NAP) in the auditory nerve. Finally, AIM maps the neural activity pattern to an auditory image of the sound. On this page, we will consider how the first two stages of AIM respond to The vOICe's soundscapes. The original AIM software package is available for download, and all other information needed to reproduce the following experiments is available on this site.

DSAM (Development System for Auditory Modelling) contains a more recent implementation of the AIM model for various computer platforms, including Microsoft Windows. DSAM is available from the Department of Psychology at the University of Essex, UK.

From image to sound and back

As a testbench, a sound file was first created from the 64 × 64 pixel source image arti2.gif shown below (left image), using The vOICe Java application to generate arti2ear.wav, a 1.05 s, 16-bit mono .wav file with a 20 kHz sample rate. An exponential frequency distribution was used for 64 frequencies in the range [500Hz, 5kHz]. Apart from this frequency range and the 20 kHz sample rate, all parameters had their default settings. The vOICe Java application can not only generate a sound file from an image file, as described above, but also reconstruct a spectrographic image from a sound file, as shown on the right:

[Figure: left, the source image arti2.gif; right, The vOICe reconstruction from the sound file arti2ear.wav.]
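
As a rough illustration of the exponential frequency distribution described above, the following Python sketch computes 64 frequencies between 500 Hz and 5 kHz with a constant ratio between neighbours, together with the number of samples in a 1.05 s file at a 20 kHz sample rate. The function name and the assumption that both end points are included are ours; the exact formula used by The vOICe may differ in such details.

```python
import math

def exponential_frequencies(f_low=500.0, f_high=5000.0, count=64):
    """Return `count` frequencies from f_low to f_high with a constant ratio
    between neighbours (assumption: both end points are included)."""
    ratio = f_high / f_low
    return [f_low * ratio ** (i / (count - 1)) for i in range(count)]

freqs = exponential_frequencies()
neighbour_ratio = freqs[1] / freqs[0]            # identical for every adjacent pair
semitones_per_step = 12 * math.log2(neighbour_ratio)
n_samples = round(1.05 * 20000)                  # samples in a 1.05 s file at 20 kHz

print(f"{len(freqs)} frequencies from {freqs[0]:.0f} Hz to {freqs[-1]:.0f} Hz")
print(f"ratio between neighbours: {neighbour_ratio:.4f} "
      f"({semitones_per_step:.2f} semitones)")
print(f"{n_samples} samples at 20 kHz")
```

On a logarithmic frequency axis these frequencies are equally spaced, which is why a logarithmic frequency scale is the natural choice when displaying the soundscape as a spectrogram.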

The 64 × 64 pixel Fourier reconstruction was created by The vOICe Java application from the arti2ear.wav soundscape file, again using 64 frequencies exponentially distributed in the range [500Hz, 5kHz]. It should be stressed that this reconstruction is used here only to show that much of the original image information in arti2.gif is indeed still present in the soundscape; otherwise the spectrographic reconstruction from the sound file would not have been visually recognizable. The reconstruction does not intend or pretend to model the processing by the human hearing system (although a windowed Fourier reconstruction is not entirely unlike the cochlear frequency-to-place mapping). For an example of a reconstruction from a sonified real-life image sequence, see the animation reconstruction experiment. Fourier-type spectral reconstructions can also be made with the shareware program Cool Edit by loading a .wav file and selecting the menu item View | Spectral View. The results will be similar to The vOICe reconstruction shown above, although for best results one needs a logarithmic frequency scale such as the one used by default in The vOICe Java application.
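
The round trip from image to sound and back can be sketched on a toy scale in Python. The code below is not The vOICe's implementation: it assumes a left-to-right column scan with the highest frequency for the top row, uses a small 8 × 8 binary image, and takes a deliberately long 0.05 s per column so that a plain rectangular-window Fourier analysis separates the eight tones cleanly. All function names and constants are hypothetical.

```python
import math

SAMPLE_RATE = 20000      # Hz, as in the experiment above
COLUMN_SECONDS = 0.05    # toy value: long enough to separate the 8 tones
F_LOW, F_HIGH = 500.0, 5000.0

def row_frequencies(n_rows):
    """Exponentially spaced frequencies, highest frequency for the top row."""
    ratio = F_HIGH / F_LOW
    return [F_LOW * ratio ** ((n_rows - 1 - r) / (n_rows - 1)) for r in range(n_rows)]

def image_to_sound(image):
    """Scan the image column by column; each bright pixel adds a sine wave."""
    n_rows, n_cols = len(image), len(image[0])
    freqs = row_frequencies(n_rows)
    per_col = round(COLUMN_SECONDS * SAMPLE_RATE)
    samples = []
    for i in range(n_cols * per_col):
        col = i // per_col
        t = i / SAMPLE_RATE
        samples.append(sum(image[r][col] * math.sin(2 * math.pi * freqs[r] * t)
                           for r in range(n_rows)))
    return samples

def sound_to_image(samples, n_rows, n_cols, threshold=0.5):
    """Estimate the amplitude of each frequency in each time window."""
    freqs = row_frequencies(n_rows)
    per_col = len(samples) // n_cols
    image = []
    for r in range(n_rows):
        w = 2 * math.pi * freqs[r] / SAMPLE_RATE
        row = []
        for c in range(n_cols):
            window = samples[c * per_col:(c + 1) * per_col]
            re = sum(x * math.cos(w * n) for n, x in enumerate(window))
            im = sum(x * math.sin(w * n) for n, x in enumerate(window))
            amplitude = 2.0 * math.hypot(re, im) / per_col
            row.append(1 if amplitude > threshold else 0)
        image.append(row)
    return image

source = [
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
]
soundscape = image_to_sound(source)
recovered = sound_to_image(soundscape, n_rows=8, n_cols=8)
print("image recovered:", recovered == source)
```

For this toy image the thresholded reconstruction is expected to match the source. With 64 frequencies and much shorter columns, as in the actual 1.05 s soundscape, neighbouring tones overlap more in a windowed analysis, which is one reason the real reconstruction is recognizable rather than exact.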

Human hearing model

Now the interesting question arises to what extent the human hearing system would further preserve the image information that is clearly still present in the sound representation. Although The vOICe image-to-sound mapping has been designed with a range of human factors in mind (such as available auditory bandwidth, critical bands, just noticeable differences (JNDs) and typical environmental time constants), we will push a bit further and try to make some of these aspects more concrete and explicit by means of a computer implementation of an auditory model, in this case AIM. Specifically, to get some idea of what might happen to the sound in the early stages of human auditory processing, the arti2ear.wav sound file was used in the following experiments as input for the AIM software package, release aimR8.2 (May 1997). Of course, the arti2ear.wav sound file may serve as a benchmark data set for other auditory modelling programs as well.
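
To give the critical bands mentioned above a rough numerical form, the following Python sketch uses the equivalent rectangular bandwidth (ERB) formulas of Glasberg and Moore (1990), a commonly used estimate of auditory filter bandwidth that is closely related to the critical band. This is our own back-of-the-envelope illustration; whether AIM uses exactly these constants has not been checked here, and the function names are hypothetical.

```python
import math

def erb_bandwidth_hz(f_hz):
    """Equivalent rectangular bandwidth of the auditory filter centred at f_hz
    (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def erb_rate(f_hz):
    """Position of f_hz on the ERB-rate scale, in ERB units
    (Glasberg and Moore, 1990)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

f_low, f_high, n_freqs = 500.0, 5000.0, 64
erbs_spanned = erb_rate(f_high) - erb_rate(f_low)
freqs_per_erb = n_freqs / erbs_spanned

print(f"ERB at {f_low:.0f} Hz: {erb_bandwidth_hz(f_low):.1f} Hz")
print(f"ERB at {f_high:.0f} Hz: {erb_bandwidth_hz(f_high):.1f} Hz")
print(f"range spans {erbs_spanned:.1f} ERBs, "
      f"i.e. about {freqs_per_erb:.1f} soundscape frequencies per ERB")
```

Under these assumptions the [500Hz, 5kHz] range spans roughly 18 ERBs, so the 64 soundscape frequencies are spaced more finely than one auditory filter bandwidth. This says nothing by itself about what a listener can resolve, because temporal cues also play a role, but it indicates why an auditory model is worth consulting.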

First, we will show so-called "auditory spectrograms" as generated by AIM using the gensgm module. The gensgm module of the AIM software performs a time-domain spectral analysis using a bank of auditory filters, and summarises the results in an "auditory spectrogram," i.e., a spectrogram with auditory frequency and temporal resolution, rather than the regular Fourier-based spectrogram commonly used in speech analysis (and in the reconstruction given above). The auditory spectrogram is a plot of a sequence of spectral slices extracted from the envelope of the basilar membrane motion. Contrary to The vOICe convention, gensgm represents loudness by darkness rather than brightness. In other words, one will get negative images from the AIM auditory spectrograms of The vOICe's soundscapes:

[Figure: two AIM auditory spectrograms of The vOICe's soundscape arti2ear.wav (top left and top right), with inverted-greyscale (positive) versions of both spectrograms below them.]

The AIM auditory spectrogram at the top left was generated via

gensgm -width_win=512 -height_win=512 -mincf_afb=500 -maxcf_afb=5000 -gain_gtf=0.05 arti2ear.wav

and the one at the top right has the default logarithmic compression turned off through the compress and rectify options in

gensgm -width_win=512 -height_win=512 -mincf_afb=500 -maxcf_afb=5000 -gain_gtf=1 -compress=off -rectify=on arti2ear.wav

The 512 × 512 pixel spectrographic images, whose size is specified by the display options width_win and height_win, were afterwards sub-sampled by a factor of two to obtain the 256 × 256 pixel images shown here. The relevant [500Hz, 5kHz] frequency display range is selected using the mincf_afb and maxcf_afb options, while a 20 kHz sample rate is the AIM default. The gain_gtf option sets the output gain such that the spectrogram is neither too dark nor too bright, because there is no automatic brightness scaling in gensgm. On some types of computer platform, the AIM software requires an additional platform-dependent swap_wave option to deal with a different (big-endian) byte ordering convention.
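
The compress and rectify options above refer to two simple operations on the filterbank output. The following Python sketch shows generic half-wave rectification and logarithmic compression applied to a test tone. It illustrates the general idea only and is not AIM's actual implementation; the function names and the floor value are our own assumptions.

```python
import math

def half_wave_rectify(signal):
    """Keep positive excursions and set negative ones to zero."""
    return [x if x > 0.0 else 0.0 for x in signal]

def log_compress(signal, floor=1.0):
    """Base-10 logarithm; values below `floor` are clipped to it, so the
    result is never negative (hypothetical choice for this sketch)."""
    return [math.log10(max(x, floor)) for x in signal]

def tone(amplitude, freq_hz=1000.0, rate_hz=20000, n=200):
    """A short sine tone with the given peak amplitude."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz)
            for i in range(n)]

loud = log_compress(half_wave_rectify(tone(1000.0)))
soft = log_compress(half_wave_rectify(tone(10.0)))

# A 100:1 amplitude ratio shrinks to a much smaller ratio after compression.
print(f"peak of loud tone after compression: {max(loud):.3f}")
print(f"peak of soft tone after compression: {max(soft):.3f}")
```

Compression of this kind reduces the dynamic range of the display, which is consistent with the very different gain_gtf values used in the two gensgm runs above.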

Basilar membrane motion (BMM) as a function of time can also be displayed with the AIM genbmm module, as shown below:

[Figure: basilar membrane motion according to AIM.]

which was obtained through

genbmm -length_wave=1050 -width_win=512 -height_win=512 -mincf_afb=500 -maxcf_afb=5000 -gain_gtf=0.005 arti2ear.wav
where the length_wave option sets the length of the sound sample to 1.05 s, expressed in milliseconds.
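
Before running an auditory model on a sound file, it can help to check that the file has the expected format and to derive the length in milliseconds from it. The Python sketch below does this with the standard wave module. Because arti2ear.wav itself is not assumed to be present, the example writes a silent stand-in file with the same format; the helper name wav_summary is hypothetical.

```python
import os
import tempfile
import wave

def wav_summary(path):
    """Read the header of a .wav file and report its basic properties."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        frames = w.getnframes()
        return {
            "channels": w.getnchannels(),
            "bits": 8 * w.getsampwidth(),
            "rate_hz": rate,
            "length_ms": round(1000 * frames / rate),
        }

# Silent stand-in with the format described above:
# 1.05 s, 16-bit mono, 20 kHz sample rate.
demo_path = os.path.join(tempfile.mkdtemp(), "standin.wav")
with wave.open(demo_path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(20000)
    w.writeframes(b"\x00\x00" * 21000)

summary = wav_summary(demo_path)
print(summary)
```

The length_ms value is the number one would pass as length_wave, and a rate_hz other than 20000 would indicate that the AIM default sample rate does not apply to the file.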

The neural activity pattern (NAP) in the auditory nerve can likewise be displayed as a function of time with the AIM gennap module, as shown below:

[Figure: neural activity pattern according to AIM.]

which was obtained through

gennap -length_wave=1050 -width_win=512 -height_win=512 -mincf_afb=500 -maxcf_afb=5000 -gain_gtf=0.005 arti2ear.wav
using exactly the same options as with the genbmm run.

It is clear that if these results are approximately correct, and if the brain can (learn to) properly handle and interpret these firing patterns up to and including the higher cognitive centres, we are heading for something really exciting.

It should be stressed that there is still little consensus in the auditory research community about the (predictive, perceptual, and other) merits of any contemporary auditory model. Therefore, the above results with AIM should only be viewed as a first draft experiment for looking at some possible effects of The vOICe's soundscapes within the first stages of auditory processing. It is not known to what extent this bears any resemblance to reality, apart from one's own subjective impressions of perceived auditory resolution, so you had better listen to the soundscape yourself and make your own judgements. Auditory researchers and neuroscientists are welcome to comment, propose improvements, and either validate or invalidate the provisional results presented here.


Meijer, P.B.L., "An Experimental System for Auditory Image Representations," IEEE Transactions on Biomedical Engineering, Vol. 39, No. 2, pp. 112-121, Feb 1992. Reprinted in the 1993 IMIA Yearbook of Medical Informatics, pp. 291-300. Electronic version available on-line.

Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C., and Allerhand, M., "Complex Sounds and Auditory Images," Advances in the Biosciences, Vol. 83, pp. 429-446, 1992.

Patterson, R.D., Allerhand, M., and Giguere, C., "Time-domain modelling of peripheral auditory processing: A modular architecture and a software platform," Journal of the Acoustical Society of America, Vol. 98, pp. 1890-1894, 1995. Text available on-line.

Zatorre, R.J., and Belin, P., "Spectral and temporal processing in human auditory cortex," Cerebral Cortex, Vol. 11, pp. 946-953, 2001. Abstract available on-line.

For other useful literature, see another list of references, or check out some of the technical aspects of the frequency-time uncertainty principle.

Copyright © 1996 - 2024 Peter B.L. Meijer