Artificial Synesthesia for Synthetic Vision: Designer Synesthesia?



Artificial synesthesia (syn = together, and aisthesis = perception in Greek) is a deliberately evoked or induced sensory joining in which the real information of one sense is accompanied by a perception in another sense through the use of a cross-modal mapping device. It is also known as virtual synesthesia or synthetic synesthesia. The additional perception is regarded by the trained synesthete as real, often outside the body, instead of imagined in the mind's eye. Its reality and vividness are what make artificial synesthesia so interesting in its violation of conventional perception. Synesthesia in general is also fascinating because it seems a logical product of the human brain, where the evolutionary trend has been toward increasing coordination, mutual consistency and perceptual robustness in the processing of different sensory inputs.

Here we paraphrased the definition of (spontaneous, or natural) synesthesia as given by  Richard Cytowic in his book ``Synesthesia: A Union of the Senses,'' which is about the rare people who hear colors, taste shapes, or experience other curious sensory modality crossings, allegedly related to abnormal functioning of the hippocampus, one of the limbic structures in the brain. It has also been suggested that synesthesia constitutes a form of "supernormal integration" involving the posterior parietal cortex. The Russian composer Alexander Scriabin and the Russian-born painter Wassily Kandinsky both pioneered artistic links between sight and sound, and may have been synesthetes themselves. The Russian mnemonist Solomon Shereshevskii, studied for decades by neuropsychologist Alexander Luria, appears to have used his natural synesthesia to memorize amazing amounts of data.

Induced Synesthesia?
``Pat Fletcher was one of the first users of the system almost 20 years after she had been blinded in an industrial accident. She describes the moment while trying to learn how to interpret the sounds, when she first saw a new soundscape: `For there in the middle of my study is what looked like a hologram image of the wall and the gate, and I thought: wow, this thing really works.' '' Source:  "The Frog who Croaked Blue: Synesthesia and the Mixing of the Senses" by Jamie Ward, page 22 (ISBN-13: 978-0415430142; published by Routledge, 2008).
Pat Fletcher watching her kitchen sink with The vOICe. The covert "spy camera" is inside her special video sunglasses, while she is hearing the soundscapes of the view via her stereo earpieces.

Blind user watching her sink

Photography: courtesy Barbara Schweizer

November 2000
Personal account received from Pat Fletcher after several months of wearing The vOICe:

``Two observations I do want to share with you. First as I said I wear the program daily. Well the other day I was again washing dishes. I had let the water out of the sink and turn to get a towel to dry my hands. Then when I turn back to rinse the sink I was stunned to see the sink in a " depth" like image. I stepped away from the sink and walked slowly up to it again to see if my mind was playing tricks on me. No the feeling of seeing depth in the sink bowl was still there. I remember when I was a kid of looking down into a well. It was like looking down a long tunnel till you saw the reflection of the water which always gave me the feeling of depth. Looking at the sink was like that. Trying to puzzle this new experience out I began walking around and noticing how things and rooms were sounding to me. I know from touch and " my mind's impressions" what my different rooms should look like but now I am standing in the doorways of the rooms in my house and sensing the depth of the room with my mind filling in the rest of the picture. Is it possible that the body and mind can get so used to the input from the sound scapes that you can sense depth? I am not saying I can feel this sensation while i am walking yet but it is there when I am standing and looking at things.

next funny thing which happened to me relates to the wearing of the voice. I just purchased a new portable CD player. To hear this particular player you must use headphones. I wanted to see what it would be like to hear the music as I did my house work. while standing still no problem. Then I began walking around.. Suddenly I was stumbling into walls and over tables. I could not believe how clumsy I was. Then it hit me. DA, Patty the CD is not giving you the information you have grown accustom to with the Voice program. I had to concentrate while wearing the headphone of the CD. I tell you it was like being blinded again. I could barely find anything in my house. What a difference !''

August 2002
In the context of discussing a brain implant and whether The vOICe gives "just sound":

Just sound?.... No, It is by far more, it is sight ! There IS true light preception generated by the vOICe. When I am not wearing the voice the light I perceive from a small slit in my left eye is a grey fog. When wearing the vOICe the image is light with all the little greys and blacks. Yet a definite light image. True it is not color but it is definitely like looking at a black and white TV show. The light generated is very white and clear then it erodes down the scale of color to the dark black. I don't really see adiffrence in this light as compaired to the "light phosphenes " they are talking about. Maybe it is one of those things you have to experience to understand. Yet light is light and color is color. So no matter the way it is generated it is the same for me!

October 2003
After reading the following excerpt from the October 7, 2003 BBC News article about The vOICe:

"Our assumption here is that the brain is ultimately not interested in the information 'carrier' (here sound) but only in the information 'content'," says Meijer. "After all, the signals in the optic nerve of a normally sighted person are also 'just' neural spiking patterns. What you think you 'see' is what your brain makes of all those firing patterns."

Hooray! This is the way it feels! The sight stimulated threw the use of the vOICe program becomes a natural way of seeing. The soundscape sounds over time are relegated to the subconscious "background" noise and what is left is a form of true and working black and white vision!!

However, a main drive for investigating artificial synesthesia (and cross-modal neuromodulation) is formed by the options it may provide for people with sensory disabilities like deafness and blindness, where a neural joining of senses can help in replacing one sense by the other: e.g., in seeing with your ears when using a device that maps images into sounds, or in hearing with your eyes when using a device that maps sounds into images. The former synesthetic vision is the main focus of this site. The use of a device to map one sensory information stream into an information stream for another sensory modality clearly distinguishes artificial synesthesia from spontaneous synesthesia or developmental synesthesia as well as from cross-modal associations in non-synesthetes for normal sensory inputs. In other words, we are interested in forms of learned synesthesia (acquired synesthesia) that might result from machine-generated crossmodal mappings, particularly in studying the feasibility of functionally relevant auditory-to-visual synesthesia for blind people - to allow for a non-invasive visual prosthesis via sound-induced visually meaningful percepts (detailed mental imagery, in contrast to natural synesthetes who usually report relatively simple or non-generic visual percepts for color, textures or shapes in association with certain sounds, with different synesthetes reporting different percepts). This can be viewed as a form of synthetic vision, as well as functional synesthesia, where mental images are synthesized by the brain in close correspondence to the visual information as conveyed through sound. The soundscapes thus serve to scaffold mental imagery. The subject can also be related to sound symbolism and artificial or induced phonesthesia (phonaesthesia), a cross-modal mapping where certain sounds become associated with certain meanings.

Still, translation of arbitrary images and pictures into sounds, converting any visual objects into auditory objects, forms a new rehabilitation technology with as yet unknown prospects: so far no one has either proved or disproved its practical use as a kind of artificial eye.

Clearly, most people already possess a weak form of sound-induced mental imagery: when someone says to you ``Imagine a white parked car,'' what happens? You normally see some vague and ill-defined view of a parked car in your mind - in your mind's eye.

Now our aim is to have much tighter control over exactly what you see in your mind through a more precise auditory encoding of visual information, such that the actual live view of a camera can be conveyed, while the visual experience should become more compelling through extensive use. This does not seem too far-fetched, because the human brain is normally perfectly capable of generating highly convincing visual views without input from the eyes, if only in certain mental states: dreams are a case in point, and lucid dreams in particular can have a visually very convincing realism for those who have had the experience. Hypnosis is another example, and prolonged sensory deprivation can also lead to visual hallucinations. Sensory deprivation at the onset of blindness is also likely a factor in the complex visual hallucinations associated with Charles Bonnet syndrome. Normal adults taking psychoactive substances like LSD, mescaline, ketamine, ayahuasca or Salvia divinorum have reported strong synesthetic experiences, suggesting that everyone has the capacity and the neural connections needed for synesthesia, but that this does not normally become a conscious experience because of inhibition. Synesthesia has also been associated with lowered concentrations of the neurotransmitter serotonin, or effects related to serotonin (S2a) receptors, either through drugs or natural conditions (Brang and Ramachandran, 2008). Synesthesia may thus not necessarily require a physical rewiring of the brain but perhaps "only" effectively modified neural connections, much like the fixed POTS (plain old telephone system) network allows for arbitrary ("new") connections through momentary changes in signal gating (dynamic routing). On the other hand, the structural connectivity of the human cerebral cortex in the form of its connectome likely plays a significant role in synesthesia, as genetic factors are already known to do.

White parked car

The oldest report on LSD and synesthesia stems from LSD discoverer Albert Hofmann (1906-2008) in his lab notes of April 19, 1943: "It was particularly striking how acoustic perceptions such as the noise of a passing auto, the noise of water gushing from a faucet or the spoken word, were transformed into optical illusions". In addition, there may be functional roles for changes in synaptic weights and dendritic spine structure, as well as for neurogenesis and the development of neural stem cells even in the adult brain when exposed to crossmodal mappings. In any case, the question arises to what extent one could learn to control normal intersensory inhibition of sound-induced imagery through immersive training for meaningful results, instead of resorting to psychedelic drugs as hallucinogens to "see sound". One could then seek to induce meaningful visual hallucinations of which the visual content is constrained by live camera input encoded in sound. What does it take to create that next experience?

Research question: one could wonder if co-activation of visual cortex with TMS (transcranial magnetic stimulation) to generate light phosphenes along with soundscapes could help guide the brain of late-blind users of The vOICe towards processing soundscapes in visual terms and with visual sensations, but this has not yet been tried.

Normally, activity in the primary visual cortex (V1) seems to be a necessary - but not sufficient - condition for having visual percepts while awake, but it looks like this is actually different for other mental states such as REM sleep. Interestingly, brain activity during REM sleep, which is when vivid visual dreams take place, seems suppressed in the primary visual cortex and directly adjacent areas (compared with non-REM phases of sleep), whereas intermediate visual areas in the fusiform gyrus and medial temporal lobe are highly activated, consistent with the finding that people who have lost part or all of V1 continue to dream visually (G. Rees, G. Kreiman and C. Koch,  ``Neural correlates of consciousness in humans,'' Nature Reviews Neuroscience, Vol. 3, No. 4, pp. 261-270, 2002). The fusiform gyrus (part of LOC, the lateral occipital complex) and medial temporal lobe also seem implicated in various types of audio-visual multisensory processing, and these areas could therefore potentially play a key role in realistic sensory substitution through an auditory display, in theory even for the cortically blind (people with blindsight). While exploring this further it may be useful to take an extreme position and think of normal vision as "retinally controlled hallucinations". One of the theories for explaining denial of blindness in Anton’s syndrome (anosognosia) suggests that synesthetic images are induced by tactile or auditory stimuli (cf. Sagiv and Ward, in  ``Crossmodal interactions: lessons from synesthesia,'' Progress in Brain Research, Vol. 155, Part 2, 2006, pp. 259-271). However, when a sensory substitution system ensures that these tactile or auditory stimuli themselves encode live and veridical visual views as registered by a camera, what is then still being denied if a (blind) user of such a system both claims to see and properly performs tasks that would normally require natural eyesight?

Technically, the auditory display approach has now been proven feasible in the form of ``The vOICe'' real-time hardware, as well as in the form of a multimodal Java applet demonstrator for synesthetic sonification, but the limits of human perception in auditory profile analysis and learning abilities for comprehension of alternative sensory mappings are still largely unknown. Having been developed as an experimental system for auditory image representations, while accounting for the different spatial and temporal structure in hearing as compared to vision, The vOICe is meant to find applications as a synthetic vision device for the blind. The vOICe for Windows fully integrates camera input, video sonification and headphones output. Versions of The vOICe for mobile camera phones are also available, in the form of The vOICe for Android and The vOICe Web App. Unlike in many forms of natural synesthesia, The vOICe's synthetic pathway between the auditory representation and visual structure is not arbitrary but isomorphic. The vOICe implements a form of sensory substitution. A research challenge would be to bind visual input to visual qualia (visual sensations) with a minimum of training time and effort. Is it possible to get auditory input to "ignite" the kind of large-scale reverberant neural activity that is normally associated with visual perception and recognition? The vOICe encoding is a near-isomorphism under limited resolution and frame rate, and hence preserves most of the characteristic invariants of normal active vision and exploration, only now encoded in sound. Will this be enough to yield truly visual experiences? That would constitute the "Holy Grail" of sensory substitution. What does it take to break the qualia barrier? Is "designer synesthesia" possible?

Curve and ten little squares

Sound visualization

In evaluating the potential of such a system for the blind, one should preferably involve both (young) children and adults, while distinguishing the congenitally blind from the late-blinded, because neural plasticity may drop rapidly with age, perhaps even with a critical period for the ability to exploit its full potential, and neural development may strongly depend on whether one has had any prior visual experience. Additionally, neural plasticity itself may be influenced by blindness, for instance with increased crossmodal plasticity through decreased GABA levels in the visual cortex of blind people (GABA, or γ-aminobutyric acid, is the brain's main inhibitory neurotransmitter). Lowered GABA levels - or blocked GABA receptors - can cause cortical hyperexcitability, which may also in part explain the visual release hallucinations of Charles Bonnet syndrome during periods of deteriorating eyesight. Also, and perhaps especially in the initial stages, there may arise forms of "implicit" synesthesia, with strong learned cross-modal associations between sound patterns and image patterns, but still without clear synesthetic sensations of light or color with the perceived patterns. In the case of "explicit" synesthesia, the sounds would induce conscious sensations (qualia) of light and visual patterns.

The above possibilities were further investigated in a neuroscience project on brain plasticity named  "Plasticity in the human cerebral cortex: From synaesthesia to sensory substitution in the blind", in a cooperation between the Institute of Experimental Psychology ( Petra Stoerig, Inna Knoll,  Michael Proulx) at the Heinrich-Heine-University Düsseldorf, Germany, the F.C. Donders Center for Cognitive Neuroimaging (Peter Hagoort, Ph.D. student Tessa van Leeuwen) at the University of Nijmegen, The Netherlands, and the Laboratory of Physiology (Colin Blakemore) at the University of Oxford, United Kingdom, and funded by Die VolkswagenStiftung (period 2005-2008).

Colin Blakemore, in an interview with Richard Gray, science correspondent from The Sunday Telegraph, at the Sense Annual Lecture 2008, November 2008: ``I’m working with a group in Germany in a project supported by the Volkswagen Foundation to try to develop a sound-based substitute system to train blind people to use sounds to interpret visual images. What the system does is to scan visual scenes and transform those visual scenes into a pattern of sound according to a very simple set of rules: depending on the angles and the lines that make up the scene. So when the person looks at the scene they might hear a funny sound and they can learn really remarkably well what that means in terms of the image in front of them and what we’re finding is that after a lot of practice with this, people report that they actually visualize the things in front of them, they sort of see them in front of them, and are able to identify objects, recognize objects, even objects that they haven’t seen before.'' ( MP3 excerpt, 1.2 MB download)

Furthermore, he ended his presentation at the Sense Annual Lecture on Thursday November 6, 2008, saying ``I’m actually involved with the research group in Germany at the moment, trying to think about how we can use this sort of modifiability to help to restore some kind of visual experience system for blind people, delivered through other senses, particularly through hearing. So we’re working with a system for converting visual images into sounds, and to give away some of the unpublished results, we’re already showing that with training, people can learn to induce activity in the visual parts of their brains as a result of listening to sounds that are representing pictures that are being scanned. So there’s a hope that people might be able to learn to recognise images through the sound system by making use of the visual parts of their brain. Thank you.'' (transcript by Stephen McCarthy)

Most interesting is that brain research on consciousness and awareness shows that in normal healthy subjects the auditory and visual brain networks appear largely unaffected by anesthesia, but crossmodal interactions are lost under anesthesia ( ``Brain connectivity in pathological and pharmacological coma,'' Frontiers in Neuroscience, Vol. 4, Article 160, 2010). One could therefore speculate that, vice versa, strengthening crossmodal interactions through training for sensory substitution might give a boost to consciousness and awareness.

For a review of knowledge about the classic forms of spontaneous synesthesia, with a history dating back some three hundred years, see Richard Cytowic's  Synesthesia: Phenomenology And Neuropsychology (PDF file). For more on sound-induced mental imagery, visit the mental imagery page. See also the Tucson 2002 conference session "Sensory Substitution I: Visual Consciousness in Blind Subjects?". General findings about crossmodal binding between pitch and elevation, as employed by The vOICe, are discussed in K. K. Evans and A. Treisman,  ``Crossmodal binding of audio-visual correspondent features,'' Journal of Vision, Vol. 5, No. 8, p. 874a, 2005, using congruent and incongruent bimodal stimuli in response time measurements, as well as in their more recent  ``Natural cross-modal mappings between visual and auditory features,'' Journal of Vision, Vol. 10, No. 1, pp. 1-12, 2010. Also related is the publication by F. Maeda, R. Kanai and S. Shimojo, ``Changing pitch induced visual motion illusion,'' Current Biology, Vol. 14, No. 23, pp. R990-R991, 2004. This indicates that the use of pitch for height (vertical position) by The vOICe is not "arbitrary", but well-rooted in human physiology. Similarly, the temporal ventriloquism effects discussed in J. J. Stekelenburg and J. Vroomen,  ``An event-related potential investigation of the time-course of temporal ventriloquism,'' Neuroreport, Vol. 16, No. 6, pp. 641-644, April 25, 2005, lend further physiological support to The vOICe's time-domain multiplexing with lateral position (horizontal position) mapped to time and stereo position.
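The mapping dimensions discussed above (pitch for vertical position, time for horizontal position, loudness for pixel brightness) can be illustrated with a minimal sketch. This is not The vOICe's actual implementation: the frequency range, the exponential pitch spacing and the one-second scan time below are assumptions chosen for illustration, and stereo panning is omitted for brevity.

```python
import numpy as np

def sonify(image, sample_rate=22050, scan_time=1.0,
           f_low=500.0, f_high=5000.0):
    """Map a grayscale image (row 0 = top, values 0..1) to a mono waveform.

    Columns are scanned left to right over scan_time seconds; each pixel
    row gets its own sine frequency (higher rows -> higher pitch) and each
    pixel's brightness sets the loudness of that row's sine.
    """
    rows, cols = image.shape
    # Exponentially spaced frequencies (an assumption of this sketch),
    # reversed so that row 0 (top of the image) gets the highest pitch.
    ratios = np.arange(rows) / max(rows - 1, 1)
    freqs = (f_low * (f_high / f_low) ** ratios)[::-1]
    samples_per_col = int(sample_rate * scan_time / cols)
    chunks = []
    for c in range(cols):
        # Use global sample times so every sinusoid stays phase-continuous
        # across column boundaries (avoids audible clicks).
        n = np.arange(c * samples_per_col, (c + 1) * samples_per_col)
        t = n / sample_rate
        phases = 2 * np.pi * freqs[:, None] * t[None, :]
        chunks.append((image[:, c][:, None] * np.sin(phases)).sum(axis=0))
    wave = np.concatenate(chunks)
    peak = np.abs(wave).max()
    return wave / peak if peak > 0 else wave
```

With this sketch, an image of a bright 45-degree diagonal line rising from lower left to upper right, as in the Loomis excerpt further below, comes out as a single tone of steadily increasing pitch.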

Illusory "jumping rabbit" (Meijer 2006)

Illusory "rabbit" and elevation jump with The vOICe mapping

The vOICe mapping may not only be used to demonstrate the illusory flashes and "rabbit" effects of Shams, Kamitani and Shimojo, but also be applied to probe possible perception of illusory elevation jumps through sudden pitch changes, closer to the work of Evans and Treisman. In the example on the right, which links to a short WMV format  video clip, The vOICe was used to generate a soundscape with a number of bright dots at the same elevation and one more elevated dot at the center. In the scanning image sequence, the elevated dot was removed, such that it is still in the sound but it does not physically show on the screen. However, if you click the image to launch the video clip and set it to auto-repeat, chances are that you will (occasionally, weakly) perceive the illusion of a middle dot visually jumping up at the moment of the pitch jump. This is often best perceived through peripheral vision: do not look directly at the video clip while listening, but rather look at a fixation point to the left or right of the video clip while it is showing on the screen. You may also need to play with loudness, full screen mode or other settings for any or best effects. Any illusory visual jumps would further confirm the congruence between pitch and height as used in The vOICe's general image to sound mapping. Can you perceive it? Can it be trained?
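The stimulus construction just described can be sketched as two small grayscale frames; the grid size, dot count and dot positions below are illustrative assumptions, not the values used in the actual demonstration. The first frame (with the extra elevated center dot) is the one that would be sonified, while the second (without it) is what appears on screen.

```python
import numpy as np

def make_frames(size=16, n_dots=5, base_row=10, raised_row=4):
    """Return (sound_frame, display_frame) as 0..1 grayscale images.

    sound_frame:   n_dots at a common elevation plus one extra raised
                   dot at the center column (this frame is sonified,
                   producing a sudden pitch jump at the center).
    display_frame: the same dots with the raised dot removed (this is
                   the frame shown on screen).
    """
    cols = np.linspace(2, size - 3, n_dots).round().astype(int)
    display = np.zeros((size, size))
    display[base_row, cols] = 1.0          # dots at a common elevation
    sound = display.copy()
    sound[raised_row, cols[n_dots // 2]] = 1.0  # extra elevated center dot
    return sound, display
```

Feeding the first frame to a vOICe-style sonifier while displaying the second reproduces the mismatch described above: the elevated dot is present in the sound but absent from the screen.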

An open question for future research is whether brainwave entrainment can be usefully exploited in sensory substitution, for instance when targeting strong multisensory or cross-modal binding.

Literature on The vOICe approach:
Meijer, P.B.L., ``An Experimental System for Auditory Image Representations,'' IEEE Transactions on Biomedical Engineering, Vol. 39, No. 2, pp. 112-121, Feb 1992. Reprinted in the 1993 IMIA Yearbook of Medical Informatics, pp. 291-300. Abstract and electronic version of full paper available on-line. Perceptual and philosophical issues concerning the quality of sensations, including those attainable with The vOICe, are discussed in the book  Seeing Red - A Study in Consciousness (2006) by Nicholas Humphrey. Excerpt: ``Meijer himself poses the question: "Is it Vision? Can it be?" It is clear that he thinks it surely can be. He says: "Our assumption here is that the brain is ultimately not interested in the information 'carrier' (here, sound) but only in the information 'content'." In other words, he sees no theoretical reason why auditory vision could not be the qualitative equivalent of normal vision. Even so, he stops short of claiming that the evidence of his studies confirm this.'' Indeed this evidence has to come from (third-person statistics of) first-person accounts of subjective experiences, because neural correlates with (visual) function will not suffice even if these would for instance show visual cortex activation in visual tasks performed with the soundscapes (visual sounds, audio maps) from The vOICe.

Synesthesia

Simple line and dot images

``For a few rare individuals, synesthesia is a strong correlation between perceptual dimensions or features in one sensory modality with perceptual dimensions or features in another (Harrison and Baron-Cohen 1997; Martino and Marks 2001). For example, such an individual may imagine certain colors when hearing certain pitches, may see different letters as different colors, or may associate tactile textures with voices. Strong synesthesia in a few rare individuals cannot be the basis for sensory substitution; however, much milder forms in the larger population, indicating reliable associations between intermodal dimensions that may be the basis for cross-modal transfer (Martino and Marks 2000), might be exploited to produce more compatible mappings between the impaired and substituting modalities. For example, Meijer (1992) has developed a device that uses hearing to substitute for vision. Because of the natural correspondence between pitch and elevation in space (e.g., high-pitched tones are associated with higher elevation), the device uses the pitch of a pure tone to represent the vertical dimension of a graph or picture. The horizontal dimension of a graph or picture is represented by time. Thus, a graph portraying a 45º diagonal straight line is experienced as a tone of increasing pitch as a function of time. Apparently, this device is successful for conveying simple 2-D patterns and graphs. However, it would seem that images of complex natural scenes would result in a cacophony of sound that would be difficult to interpret.''

Source: "Sensory Replacement and Sensory Substitution: Overview and Prospects for the Future", by Jack Loomis, in  "Converging Technologies for Improving Human Performance: Nanotechnology, Biotechnology, Information Technology and Cognitive Science," (US NBIC 2002), NSF-DOC Converging Technologies Report, p. 220, June 2002.

Outstanding question

``Can synesthesia be learned via explicit training, or lost via conditioning?''

Source: "Mechanisms of synesthesia: cognitive and physiological constraints," by Peter Grossenbacher and Christopher Lovelace, Trends in Cognitive Sciences, Vol. 5, No. 1, 2001, pp. 36-41. Available  online (PDF file).

Related

UCSD press release, April 11, 2007:  Wired for sound: How the brain senses visual illusions.

Literature on acquired synesthesia, implicit synesthesia, visual awareness and blindness:

Recent general publications about synesthesia, connectome and visual hallucinations.

Related research is also performed at:
 http://www.syn.sussex.ac.uk (Jamie Ward, Synaesthesia Research Group, University of Sussex in Brighton, UK)
 http://neuro.caltech.edu (Shinsuke Shimojo, Psychophysics Laboratory, CalTech, USA)
 http://shamslab.psych.ucla.edu (Ladan Shams, Visual and Multisensory Perception Lab, UCLA, USA)
 http://www.cinacs.org (CINACS project, Cross-Modal Interaction in Natural and Artificial Cognitive Systems, Germany & China)

Related online demonstrations include:
 Sound-induced illusory flashing (Ladan Shams, Yukiyasu Kamitani and Shinsuke Shimojo)
 Sound-induced visual "rabbit" (Yukiyasu Kamitani and Shinsuke Shimojo)

General synesthesia websites:
 http://www.synesthesie.nl (The Netherlands, Crétien van Campen)
 http://www.uksynaesthesia.com (UK Synaesthesia Association)
 http://www.synesthesia.info (ASA, American Synesthesia Association)
 Sean Day's synesthesia website

Synesthete and photographer  Marcia Smilack presented her findings with The vOICe at the Sixth Annual National Conference of the  American Synesthesia Association, January 26-28, 2007, at the University of South Florida, St. Petersburg, USA. Her abstract, titled "The Language of Synesthesia":

I am a bi-directional synesthete who experiences multiple forms of synesthesia. I photograph reflections on moving water and click the shutter at the moment I experience a texture or sound response. For that reason, a researcher asked for my input on his invention to convert images of one's surroundings into sound to help blind people see.

I sent him photographs of sounds which I selected from the thousands of photos I have taken in the last twenty years that document my synesthetic responses. However, he explained that as no two synesthetes perceive the same way, my images were most likely not universal enough to be useful for his purposes. But that planted an idea in me. Were there any universals? I sorted my images into two piles, putting the sound pile in front of me, shoving the texture pile to the side because I didn't expect to use it. Suddenly, I heard the sound of chimes, only it was coming from the wrong place: peripherally seen, the top image of the texture pile was eliciting the chime sound. When this happened a second time, I found my eureka moment.

To test my hunch of universality, I began an experiment in which I asked non-synesthetes to match sounds to the images I presented. No one had difficulty matching the image to the chime sound. So, now I am considering: first, could my two senses have traded places in the layers of conscious awareness; second, am I more likely to find universal shapes for sound amid my texture images, where they may be hidden even from me; and third, do these images elicit similar responses in synesthetes and non-synesthetes alike? I wish to share the results of similar experiments at the conference in the hope of learning more about the language of synesthesia.

Peter Meijer and Marcia Smilack, November 29, 2006; courtesy Crétien van Campen

Associated Press, January 2007 ( Boston Globe)

Smilack, who is working with scientists to create visual aides for the blind, and has made herself the subject of much questioning, said she thinks it's all related.

"The physiological process," she said, "complements the art."

Copyright © 1996 - 2024 Peter B.L. Meijer