Related Seeing-with-Sound Projects
- The vOICe Home Page
- The vOICe Web App
- The vOICe for Android
- The vOICe for Windows
Other projects based on or related to The vOICe approach:
- In 2016, Derp Magurp ("magurp244") released
BrushTone, an accessible Paint Tool for the visually impaired that allows users to view, modify, and create images purely using sound. It includes a window scan function based on The vOICe.
- In 2015, Mike Mcwilliams ("aftersight") and Mikael Holmgren ("mrindoj") created the
After-Sight-Model-1 GitHub repository, which includes raspivoice, a version of The vOICe for the Raspberry Pi, and teradeep, a deep learning neural network for visual object recognition.
- In 2015, Derp Magurp ("magurp244") ported The vOICe seeing-with-sound sample code to
Python (I2S*.zip), using Pyglet, PyAL, OpenAL and NumPy.
- In 2015, "Ar-es" (or Ares) created
raspivoice, a version of The vOICe for the Raspberry Pi. More about this in the Raspberry Pi forum thread
Sight for the Blind for <100$. Mike Mcwilliams developed a Raspberry Pi device for The vOICe.
- In 2014, Quickode Ltd. released for the Amir Amedi lab the iOS program
"EyeMusic: Hearing colored shapes"
for iPhone, iPod Touch and iPad.
- In 2013, "berak" created a version of
The vOICe for Linux
(formerly at github.com/berak/seeingwithsound) based on
OpenCV
and
RtAudio.
- In 2013, Gao Yaoyao (name later changed to Zhi Zheng) released the iOS program
"Voice vision"
for iPhone, iPod Touch and iPad.
- In 2012, Technion Ph.D. student Uri Dubin in Israel created the Matlab program
SoundsOfImages,
described as "Can you hear the Image? Tool that allows you to transform image into audible sounds".
- In 2011, Ph.D. student Nicolas Louveton in France created
Wavy,
a flexible visual-to-auditory sensory substitution system written in Python.
- In 2010, blind programmer Michael Curran created
audioScreen, an experimental program
for blind users of Windows 7 touch screens, based on The vOICe mapping.
- In 2010, Tom Wright (Thomas David Wright), student at the University of Sussex, UK, created a
Pure Data
(Pd patch) implementation of image to sound conversion,
voice.pd.
Also in 2010, he created a "Customisable image sonifier" written in C#,
named SSD1.
Yet another project is his
Polyglot Framework for Sensory Substitution Devices.
- In 2009,
Katarzyna Zarnowiec,
student of Automation and Robotics at the AGH University of Science and Technology in Krakow, Poland, and Frederico Contente,
student at Tampere University of Technology, Finland, implemented a Matlab program
"Sound Steganography",
using a mapping related to that used by The vOICe.
- In 2008, Stefan Strahl of the UCL Ear Institute created a Google Android implementation of image to sound conversion named "Seer",
available under the Google code open source project
sensub (sensub.googlecode.com).
Submitted as part of the Android Developer Challenge.
- In 2008, Evan Salazar from NMSU created a Perl implementation of image to sound conversion, named imageEncode:
Encoding an image to sound.
- In 2007, Nelson Castillo
from Bogotá, Colombia, created a Tetris-like audio game based on free source code of The vOICe, programmed using Python, the Pygame API and SWIG.
- In 2006, Chris Merck (navaburo) started the open source project
OpenSonify
for combining webcam, PC and headphones.
- In 2006, Luke Barrington,
Ph.D. student in Electrical and Computer Engineering at UCSD, implemented a Matlab version of The vOICe,
"vOICe.m".
- In 2006,
Hans Petter Selasky
developed an image-to-sound mapping similar to The vOICe but based on noise filtering
(subtractive synthesis instead of additive synthesis).
- In 2005, Matt Zukowski of the University of Toronto developed, for his undergraduate thesis project, the
"Vaudiolizer"
sensory replacement system, essentially a Java version of The vOICe.
- In 2005, Clayton Shepard, Richard Hall and Jared Flatow of Rice University developed
Seeing Using Sound,
aiming to simplify images in such a way that only the most important information is conveyed
in the sounds (Elec 301 Project - Fall 2005: Seeing using Sound transforms images into sound to aid blind people).
Scanning is done from the outside-in in time, and different tone scales are used to represent color.
- In 2005, Frederic Paquay from Belgium reported on the possibilities of using camera-based
"depth from defocus" (DFD) for distance estimation, mapping results to sound. His November 2005 document titled
"Technology to help blind people: Image restitution by sonorous signals" (PDF file)
is provided here with his permission.
- In 2005,
Malika Auvray
of COSTECH at the University of Technology of Compiègne and
Kevin O'Regan
of the Laboratory for Experimental Psychology of the University of Paris 5 disclosed some of their work on the "Vibe"
as a system for instantaneous visual feedback, developed by Sylvain Hanneton of the University of Paris 5.
It maps brightness to loudness, height to pitch, and lateral position to stereo sound, somewhat like The vOICe
running in its optional "All at once" mode (menu Options | Video Rate). More information is available on
the SourceForge
thevibe
project page, and the web pages of
Barthélémy Durette
on "Sensory Substitution System for Visual Handicap". See also the ECCV 2008 workshop paper by
B. Durette, N. Louveton, D. Alleysson and J. Hérault,
"Visuo-auditory sensory substitution for mobility assistance: testing TheVIBE".
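The instantaneous mapping described above can be sketched in a few lines of Python; the frequency range, frame duration, sample rate and linear panning law below are illustrative assumptions, not the Vibe's actual parameters:

```python
import numpy as np

def vibe_frame(image, duration=0.5, sr=22050, f_lo=300.0, f_hi=3000.0):
    """Sketch of an "all at once" visual-to-auditory mapping: every pixel
    sounds simultaneously, with brightness mapped to loudness, row height
    to pitch (top row = highest frequency), and horizontal position to
    left/right stereo balance. All parameter values are placeholders."""
    rows, cols = image.shape
    # Exponentially spaced frequencies, highest for the top row.
    freqs = f_hi * (f_lo / f_hi) ** (np.arange(rows) / (rows - 1))
    pan = np.arange(cols) / (cols - 1)  # 0.0 = full left, 1.0 = full right
    t = np.arange(int(duration * sr)) / sr
    left = np.zeros_like(t)
    right = np.zeros_like(t)
    for r in range(rows):
        tone = np.sin(2 * np.pi * freqs[r] * t)
        for c in range(cols):
            a = image[r, c]                    # brightness sets loudness
            left += a * (1.0 - pan[c]) * tone  # left pixels louder on the left
            right += a * pan[c] * tone
    return np.stack([left, right])  # 2 x N stereo frame, unnormalized

# A bright diagonal yields a chord whose tones pan from left to right
stereo = vibe_frame(np.eye(8))
```

Unlike The vOICe's time-multiplexed scan, every column sounds at once here, so lateral position must be carried by stereo panning rather than by time.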
- In 2005, Igor Bakarčić, Aleksandra Čereković, Egon Geci, Branka Lakić and Ivan Sobota
at the Faculty of Electrical Engineering and Computing at the University of Zagreb, Croatia,
implemented a version of The vOICe in Matlab as documented in
"Mapiranje slike sa zvukom - Seeing with your ears" (no longer online).
- In 2004, George Loo of LKS Labs in Singapore developed a low-cost hardware implementation for The vOICe, named
E-Eye (Ear-Eye).
- In 2004, Sok Hong Teow implemented a version of The vOICe in Matlab for his B.Sc. thesis at the department
of Electronics and Communications Engineering of Curtin University of Technology, Australia, and titled
"Soundscape: Acoustic Imaging of Sight".
- In 2004, a French company named
Primatop
(Fabrice Pajak) started marketing their MIDI-based VisioPlayer System for audio-visual sensory substitution for the blind.
Some of the hardware options for camera input and visual display seemed based on an Age Tech RF-ZLCD30E wireless camera with
LCD receiver ("VisioMonitor" option) and a ZTV 830G Mini Wireless PC Camera Kit, combined with Philips SBC-HC8441 FM wireless
stereo headphones.
- In 2004,
Wolfgang Fink,
then at NASA JPL and the Keck School of Medicine at USC, was working on a "Digital Object Recognition
Audio-assistant" (DORA) in cooperation with Mark Tarbell, James Weiland and Mark Humayun.
The system is described as "a camera-input/audio-output system that recognizes color, brightness,
and a number of everyday objects to be verbally announced to the visually impaired or blind patient
on demand". It is not known how this differs from what can be done via The vOICe's
talking color identifier and
Mobile OCR for the Blind interface. A poster was presented at ARVO 2004
(W. Fink, M. Tarbell, J. Weiland and M. Humayun,
"DORA:
Digital Object Recognition Audio-Assistant For The Visually Impaired").
- In 2003,
Nikolaos Bourbakis
of Wright State University (ITRI) and Sethuraman Panchanathan
of Arizona State University (CUbiC, ASU) started a project on a camera-based intelligent assistant for the blind called
"Tyflos".
The focus of the Tyflos/iCare project seems to be more on attempts to recognize and give verbal descriptions
of things, similar to what can be done via The vOICe's Mobile OCR for the Blind interface,
while the main focus of The vOICe is on providing the "raw" visual information via soundscapes.
- In October 2003,
Blue Edge Bulgaria (BEB, www.blue-edge.bg)
released a mobile camera phone implementation of The vOICe, The vOICe BEB,
running on the Nokia 3650.
- In June 1997, a Ph.D. project was started by Phil Picton of the
School of Technology & Design, University College Northampton, UK:
Michael Capp was to investigate auditory mappings based on the work of
Adrian O'Hea, who unfortunately died in 1994 shortly after obtaining
his Ph.D. in Electronics:
A. R. O'Hea, ``Optophone Design: Optical-to-Auditory Vision Substitution
for the Blind,'' Ph.D. thesis, The Open University, UK, 1994.
Adrian O'Hea appears to have independently discovered auditory mappings
that are similar to The vOICe mapping (which he called the Cartesian Piano
Transform). He proposed a number of variations, like the use of a polar
coordinate system to obtain a kind of artificial fovea. The focus of
Michael Capp's Ph.D. work later shifted towards the independent development
of stereo vision options for depth mapping,
in combination with "cartoon mapping" for the preservation of visual texture.
In 2000, Michael Capp received his Ph.D. on ``Alternative approaches
to optophonic mappings'' from Leicester University, UK, with Peter Meijer
acting as invited External Examiner.
- Since about 1995, José Luis González Mora and Luis Fernando Rodríguez
Ramos and colleagues have been working on an auditory display for the blind in the
Virtual Acoustic Space
(Espacio Acústico Virtual, or EAV) project
of the Institute of Astrophysics of the Canary Islands (IAC) and the
University of La Laguna, Tenerife, Spain.
- Since Spring 1999, Julian Rohrhuber and Oliver Wittchow of the University
of Hamburg, Germany, have been working on a Nintendo GameBoy version of The vOICe,
which they call the
"nanovoice".
Very interesting and original work!
- On September 16, 1998, the BBC science program Tomorrow's World featured a musical image
to sound mapping devised by
John Cronly-Dillon,
a neurobiologist at the Department of Optometry and Vision Sciences at the University of
Manchester (UMIST), UK. The broadcast showed examples in which the basic characteristics of
transforming shapes into music for hearing images appeared identical to those employed by The vOICe
(and even to those employed by the pianola and optophone): left-to-right scanning, with pitch depending
on elevation, and with a vertical line segment resulting in a chord with all tones sounding
simultaneously, as with the four vertical edges in the MIDI sample
for the box shown on the left. Cronly-Dillon's implementation was a computer program without
live visual input, but he mentioned plans for a future portable system with a camera. According
to an article in The Guardian (UK) of September 22, 2000, it would be called "SmartSight" and have
the looks of a Star Trek visor. ``Backing up claims that this is an innovation from the realms
of science fiction, the new gadget is almost identical to that worn by blind character Geordi
la Forge on Star Trek's "The Next Generation".'' [...] ``The scientists say an early prototype
will be ready for testing before the end of this year. A battery pack would be worn at the waist,
while the "eye" of the device would either be a hand-held video camera, or two cameras fitted
into a visor worn on the head.''
See also the archived website of the associated
SmartSight Limited startup venture with
John Cronly-Dillon, Krishna Persaud and David Stead. Smartsight Limited was registered at
Companies House on July 15, 1999, under company number 03810177.
![Music score for simple shapes](shape.gif)
"Music" score with MIDI sample for circle, triangle and square
A demonstration was given at the
RNIB Techshare 2001
Conference in Birmingham, UK.
- Starting around 1998, Patricia Arno, Christian Capelle, Charles Trullemans,
Anne De Volder, Claude Veraart and other researchers at the Catholic University of Leuven
in Belgium presented an experimental auditory display for the blind named the
PSVA (Prosthesis for Substitution of Vision by Audition, or in French,
Prothèse de Substitution de la Vision par l'Audition).
- In 1998,
Paul Querelle,
then a student at Anglia Polytechnic University in Cambridge England, started working with
Camsight, involving a local support group and a number of blind or partially sighted
local volunteers, to further research in this field. According to his information, areas of
research include: HCI aspects of vision to sound, extracting shape information from images,
extracting depth information from multiple images, locating a mouse pointer using
sound to facilitate menu navigation, use of the inverse Fourier transform with
windowing to construct complex soundscapes, determination of texture in an image
to facilitate OCR and the use of reverberation and other filters to enhance auditory
aesthetics.
- In June 1998, information became available about the work of Harry Reid.
He had invented a portable system which he calls the
Sonic Eye.
A document describing his system is available for download as a zipped MS Word file
harreed.zip, provided here with his permission. It includes
several illustrations clarifying the image to sound mapping concepts that turn out to
be basically the same for The vOICe and the Sonic Eye.
![Music scores for two scenes](harreedy.gif)
The Sonic Eye for walking and reading
``A voice print represents sounds as images. High pitch sounds appear as
marks near the top of the image; low pitch sounds appear as marks near
the bottom of the image. Time is a vertical line moving at a constant speed
from left to right. Thus a voice print can convert any sound into an image.
The sonic eye does the reverse to convert images into sound. The sonic
eye "sees" a thin vertical window wherever it is pointed and makes a sound
combining high pitch for objects near the top of the window and low pitch for
objects near the bottom as the thin window is swept across objects of interest.
The user listens to these sounds and learns about the visual environment.''
[Harry Reid, 1998]
- In Autumn 1997, ThalesScope Limited started beta testing a
device called the
ThalesScope,
which is supposed to convert sight into sound. Other than this, little factual
data is available about it at the time of this writing, and it is also unclear
how it would differ from existing The vOICe technology.
ThalesScope Limited is headed by Isaiah W. Cox, and owned in part by
Thales Resources, and in part by Tom Karnes. Borealis Technical Limited
does the build, development, and test work.
The latest information suggests that this project may have been
discontinued, or at least is suffering serious delays.
- Other related early image-to-sound devices are those that targeted reading
(like the original optophone), such as the Visotoner (using 9 tones) and the
Stereotoner (using 10 tones).
If you know of more projects related to The vOICe, please report.
For this page, "related" means using either the same general image-to-sound
mapping as employed by The vOICe or alternative image-to-sound mappings to be used in
information-rich auditory displays for artificial vision. Related projects in the sense
of targeting visual prostheses for the blind via other combinations of modalities
(e.g., sensory substitution via sonar or radar input, tactile output) are described on
the sensory substitution page.
Note: The vOICe approach was originally published in the
IEEE Transactions on Biomedical Engineering, Vol. 39, No. 2, pp. 112-121, Feb 1992:
P.B.L. Meijer, ``An Experimental System for Auditory Image
Representations.'' This paper was next selected for reprint in the 1993 IMIA
Yearbook of Medical Informatics, pp. 291-300. Awarded U.S. Patent 5097326, on an
"image-audio transformation system", filed July 27, 1990:
``In a device for converting visual images into representative sound
information especially for visually handicapped persons an image
processing unit is provided with a pipelined architecture with a high
level of parallelism. An image is scanned in sequential vertical
scanlines and the acoustical representatives of the scanlines are
produced in real time. Each scanline acoustical representation is
formed by sinusoidal contributions from each pixel in the scanline,
the frequency of the contribution being determined by the position of
the pixel in the scanline and the amplitude of the contribution being
determined by the brightness of the pixel.''
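The mapping described in the patent abstract can be sketched in Python as follows; the frequency range, sweep duration and sample rate chosen here are illustrative placeholders, not the actual parameters of The vOICe:

```python
import numpy as np

def voice_soundscape(image, duration=1.05, sr=22050, f_lo=500.0, f_hi=5000.0):
    """Sketch of the patented mapping: scan an image column by column,
    left to right; each vertical scanline becomes a sum of sinusoids
    whose frequency is set by pixel row (top = high pitch) and whose
    amplitude is set by pixel brightness. Parameters are placeholders."""
    rows, cols = image.shape
    # Exponentially spaced frequencies, highest frequency for the top row.
    freqs = f_hi * (f_lo / f_hi) ** (np.arange(rows) / (rows - 1))
    samples_per_col = int(duration * sr / cols)
    t0 = 0.0
    out = []
    for c in range(cols):
        # Continue global time across columns to keep phases continuous.
        t = t0 + np.arange(samples_per_col) / sr
        # Sinusoidal contribution from every pixel in this scanline.
        col = sum(image[r, c] * np.sin(2 * np.pi * freqs[r] * t)
                  for r in range(rows))
        out.append(col)
        t0 += samples_per_col / sr
    return np.concatenate(out)  # mono waveform, not yet normalized

# A bright vertical bar yields a chord of all 16 tones during its time slot,
# and silence elsewhere in the sweep.
img = np.zeros((16, 8))
img[:, 3] = 1.0
wave = voice_soundscape(img)
```

This illustrates the behavior noted above for vertical edges: all tones of a vertical line segment sound simultaneously as a chord when the scan reaches that column.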
Copyright © 1996 - 2024 Peter B.L. Meijer