Circuit Topology of The vOICe

This page defines the circuit topology of The vOICe hardware prototype (1989).

Frame grabbing
The television camera is used on its side (tilted 90 degrees), so that image scanning actually takes place from bottom to top within each scanline, and from left to right for successive scanlines. This gives the same pixel order for frame grabbing as for image-to-sound conversion, so a single address generator (the counter CNT1-3) suffices. That saves many components and thus reduces system size and cost. The monitor is also used on its side to obtain an upright image for the experimenter. Note that the monitor is just a temporary tool for testing the prototype (to know what the camera sees). In the following discussion we describe the frame grabbing system as if it were not tilted! This allows us to apply the conventional names for television synchronization signals and (hopefully) avoid confusion. One should keep in mind that with this convention a horizontal scanline of the television frame represents a vertical line in the user image. In this chapter we use the term "user" to refer to the image as the user perceives it.
A television frame is scanned by the camera 50 times per second. The scanning takes place from left to right (64us per horizontal line, i.e. 15625Hz), and from top to bottom (312/313 lines). The 625-line standard black-and-white television signal applies interlacing, alternately scanning 312 lines in one 20ms frame and 313 lines in the other. Because we need only a vertical resolution (horizontal to the user) of 64 pixels, we grab one arbitrary 20ms frame to minimize image blur. The system frequency is independent of the camera frequency, so synchronization is needed for frame grabbing. The frame grabbing process is enabled when bit 21 of the counter CNT1-3 becomes low, which happens when the previous image-to-sound conversion is completed. The system then halts (clock disabled) until a vertical synchronization pulse in the television signal occurs (ADVSNC). Subsequently a monoflop MON adds another 1.52ms delay. This ensures that we skip the top margin of the frame (the left margin to the user, which is also invisible on the monitor). Another monoflop MON is then enabled, and triggered by a horizontal synchronization pulse in the television signal (ADHSNC). This adds another 15.2us delay to skip the left margin of the frame (the bottom margin to the user, also invisible on the monitor). Then the system clock is enabled, allowing the counter to count freely while the first horizontal line (leftmost vertical user line) is scanned. During one clock phase the video signal is sampled in a sample-and-hold stage and converted into Gray code by a set of comparators. During the other clock phase, when the 4-bit code has stabilized, the resulting digital grey-tone is latched and stored in video memory, while the sample-and-hold stage is opened for capturing the next video sample (pixel). The clock is disabled after 64 clock pulses, i.e. 32us, which covers most of the area visible on the monitor. By then we have stored the first 64 pixels.
After a horizontal synchronization pulse and a 15.2us delay from the monoflop the clock is enabled again for the second line, etc. Because of the counter configuration CNT1-3, with CNT2 bypassed, only every fourth line of the television signal causes an increment in the most significant 6 bits addressing the video memory PIX1-4. So only the last of every four lines is actually remembered for later use. The others are overwritten. This is meant to make uniform use of the 312/313 line frame. Effectively grabbing only every fourth line ensures that our 64 vertical pixel resolution covers 64*4=256 lines of the frame, which is almost the whole visible field on the monitor. After 256 scanlines, bit 21 of the counter becomes high, disabling the frame grabbing and restarting the image-to-sound conversion.
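The line-skipping scheme described above can be illustrated with a small Python model. This is only a sketch and not a description of the actual wiring: the function name `grab_frame`, the list-of-lists frame, and the split into a 6-bit row address and a 6-bit pixel address are assumptions made for the example.

```python
PIXELS_PER_LINE = 64   # clock pulses per scanline (64 * 500ns = 32us)
LINES_PER_ROW = 4      # television lines that share one video memory row
ROWS = 64              # rows stored in video memory

def grab_frame(frame):
    """Model of frame grabbing: frame[line][pixel] -> memory[row][pixel].

    Every scanned line is written, but four successive lines share the
    same row address, so only the last line of each group survives.
    """
    memory = [[0] * PIXELS_PER_LINE for _ in range(ROWS)]
    for line in range(ROWS * LINES_PER_ROW):          # 256 scanlines
        row = line // LINES_PER_ROW                   # most significant 6 address bits
        for pixel in range(PIXELS_PER_LINE):          # least significant 6 address bits
            memory[row][pixel] = frame[line][pixel]   # overwrites earlier lines
    return memory

# Hypothetical test frame in which every sample holds its own line number.
frame = [[line] * PIXELS_PER_LINE for line in range(312)]
memory = grab_frame(frame)
print(memory[0][0], memory[1][0], memory[63][0])  # 3 7 255
```

The printed values show that row 0 ends up holding line 3, row 1 holds line 7, and the last row holds line 255, so the 64 stored rows are spread evenly over 256 scanlines.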

Digital processing
The 2MHz system clock INVCK,DIVCK is driven by an 8MHz crystal. Starting at the upper right corner we find three dual 4-bit binary counters CNT1-3 driven by the system clock. These generate addresses for the phase and phase change memories and also for the video memory. The six LSBs of the top counter always indicate a particular oscillator (phase and phase change memory) and its corresponding pixel height (video memory). During image-to-sound conversion, the counters are normally configured as one large 23-bit ripple-carry counter. During frame grabbing, the multiplexer MUX bypasses the 7-bit middle counter to give a 128-fold increase in the frequencies of the most significant counter bits (the bottom counter). This is needed to grab a video frame within the 20ms (50Hz) television single frame time and thus avoid blurred images. The six MSBs of the bottom counter always indicate a particular vertical scanline (horizontal position). The middle counter just ensures that it takes some time (and sound samples) before the next vertical scanline is converted into sound. The 2MHz input to the top counter causes the output of the top counter to change every 500ns. During image-to-sound conversion the output of the bottom counter changes every 500ns*2^8*2^7=16.4ms, so the conversion of the whole image, i.e. 64 vertical lines, takes 64*16.4ms=1.05s. For experimental purposes this can easily be changed into a 2.1 second conversion time by using the full 8 bits of the middle counter, which is the purpose of the switch SW. In that case the counters are configured as one large 24-bit counter. However, in the discussion we will assume a 7-bit middle counter (1.05s conversion time) giving a 23-bit total counter (bit 0 through 22), unless stated otherwise. The addresses generated by the counters go unlatched to the phase change EPROMs DFI1,2 while they are latched by L1CNT,L2CNT before going to the phase SRAMs FI1,2.
This is meant to take care of the fact that the EPROMs are much slower (250ns) than the SRAMs (150ns). Therefore the EPROMs receive their addresses 250ns earlier. The phase change of a particular oscillator read from the EPROMs is added to the present phase read from the SRAMs. The summation takes place in the 4-bit full adders AD1-4, and the result is latched by octal latches L1FI,L2FI before being rewritten into the SRAMs. The new phase is also sent down the latches L3FI,L4FI, together with 4 bits of pixel brightness information coming from the video SRAMs PIX1-4. After the possible negation (ones' complement) by the exclusive-ORs XOR1-3, the phase and brightness are used as the address for the sine EPROMs SIN1,2. These give a sine value belonging to the phase range 0..PI/2 (1st quadrant), and scaled by the brightness value. The whole 2*PI (4 quadrant) phase range is covered by complementing the phase using the exclusive-ORs and by bypassing the sine EPROMs with an extra sign bit (the line passing through the D-flipflop DFF2; this flipflop gives a delay ensuring that the sign bit keeps pace with the rest of the sine bits). The sign bit determines whether the ALUs ALU1-5 add or subtract. The ALUs combine the results of all 64 emulated oscillators in one superposition sample. The latches L1SIN,L2SIN,L3SIN are just for synchronization of the adding process. When the superposition has been obtained after 64 system clock cycles, the result is sent through the latches L1DAC,L2DAC to the 16-bit digital-to-analog converter DAC. The inverter at the bottom of the figure serves to give an offset to the summation process by the ALUs after clearing the latches. The DAC input range is 0000H to FFFFH, so the starting value for the addition and subtraction process should be halfway at 8000H to stay within this range after adding and subtracting 64 scaled sine samples.
The present design keeps the superposition almost always within this range without modulo effects (which would occur beyond 0000H and FFFFH), even for bright images. This is of importance, because overflows cause a distracting clicking or cracking noise. The average amplitude of the superposition will grow roughly with the square root of the number of independent oscillators times the average amplitude of these oscillators. This can be seen from statistical considerations when applying the central limit theorem to the oscillator signals, treating them as stochastic variables, and simplifying to the worst case situation in which all oscillators are at their peak amplitude (+ or -). Therefore the average amplitude of such a 64 oscillator superposition will be about 8 times the amplitude of an individual oscillator, also assuming equal amplitudes of all oscillators (as in a maximum brightness image). This factor 8 gives a 3-bit shift, which means that we must have provisions for handling at least 3 more bits. This is the purpose of ALU5, which provides 4 extra bits together with part of L3SIN (*). The output of the DAC is sent through an analog output stage, indicated only symbolically by the opamp. Finally the result reaches the headphones.
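The counter timing quoted in this section can be checked with a few lines of Python. The constant and function names are illustrative only; the numbers come from the text (2MHz clock, 8-bit top counter, 7-bit or 8-bit middle counter, 64 vertical lines, 64 clock cycles per superposition sample).

```python
CLOCK_HZ = 2_000_000      # 2MHz system clock
TOP_BITS = 8              # width of the top counter
COLUMNS = 64              # vertical scanlines per image
OSCILLATORS = 64          # clock cycles needed per superposition sample

def conversion_timing(middle_bits=7):
    """Return (seconds per column, seconds per image) for a middle counter width."""
    clock_period = 1 / CLOCK_HZ                                   # 500ns
    column_time = clock_period * 2 ** TOP_BITS * 2 ** middle_bits
    return column_time, COLUMNS * column_time

for bits in (7, 8):
    column_time, image_time = conversion_timing(bits)
    print(f"{bits}-bit middle counter: {column_time * 1e3:.1f} ms per column, "
          f"{image_time:.2f} s per image")
# 7-bit middle counter: 16.4 ms per column, 1.05 s per image
# 8-bit middle counter: 32.8 ms per column, 2.10 s per image

# One superposition sample per 64 clock cycles gives the audio sample rate.
sample_rate = CLOCK_HZ / OSCILLATORS
print(sample_rate)  # 31250.0
```

With the 7-bit middle counter, each column therefore lasts 2^15 clock cycles, i.e. 512 sound samples.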

(*) Numerical calculations on sine superpositions showed that for a very bright image field, 3 bits would cause overflow 16% of the time, whereas 4 bits would cause overflow during 0.5% of the time. Experimentally, this appeared to be still too much. Overflows for large and very bright image parts were heard as a disturbing "cracking" sound. Division of all sine values in the EPROMs by 4 cured the overflow problem with no noticeable loss of sound quality (a 16-bit sine value would have been rather redundant anyway).
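The order of magnitude of these overflow figures can be reproduced with a simple Gaussian approximation. This is not the numerical calculation referred to in the footnote, only a sketch under stated assumptions: 64 equal-amplitude sines with independent random phases (each with variance 1/2 in units of the oscillator amplitude), a normally distributed sum, and overflow whenever the sum exceeds 2^k oscillator amplitudes for k extra bits. The function name is hypothetical.

```python
import math

N_OSC = 64

def overflow_probability(extra_bits, n_osc=N_OSC):
    """Estimate P(|sum| > 2**extra_bits) in units of one oscillator amplitude."""
    # A unit-amplitude sine sampled at a random phase has variance 1/2,
    # so the sum of n_osc independent sines has variance n_osc / 2.
    sigma = math.sqrt(n_osc / 2)
    limit = 2 ** extra_bits
    # Two-sided tail probability of a zero-mean Gaussian.
    return math.erfc(limit / (sigma * math.sqrt(2)))

for bits in (3, 4):
    print(bits, f"{overflow_probability(bits):.1%}")
# 3 15.7%
# 4 0.5%
```

Under these assumptions the estimate agrees with the 16% and 0.5% quoted in the footnote.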
We have now more or less followed a particular oscillator sample through the system. When the image-to-sound conversion is complete, as indicated by bit 21 of the counter CNT3, the frame grabbing process starts.
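The path of one oscillator sample can be summarized in a short software sketch. It is a simplified model rather than a description of the actual hardware: the 16-bit phase width, the 8-bit table address, the table scaling `PEAK`, and all function names are assumptions for illustration. In particular, `PEAK` is chosen here so that overflow cannot occur at all, whereas the prototype relies on statistical headroom.

```python
import math

PHASE_BITS = 16            # assumed width of the phase word
TABLE_BITS = 8             # assumed address width of the quarter-wave table
N_OSC = 64                 # emulated oscillators
OFFSET = 0x8000            # mid-scale starting value for the 16-bit DAC
PEAK = 0x7FFF // N_OSC     # assumed peak value per oscillator (no overflow possible)

def sine_table_entry(index, brightness):
    """First-quadrant sine value scaled by a 4-bit brightness (0..15)."""
    angle = (index + 0.5) * (math.pi / 2) / (1 << TABLE_BITS)
    return round(PEAK * brightness / 15 * math.sin(angle))

def oscillator_sample(phase, brightness):
    """Return (sign, magnitude) using only a first-quadrant table."""
    sign = (phase >> (PHASE_BITS - 1)) & 1        # second half of the cycle is negative
    mirror = (phase >> (PHASE_BITS - 2)) & 1      # quadrants 2 and 4 read the table backwards
    index = (phase >> (PHASE_BITS - 2 - TABLE_BITS)) & ((1 << TABLE_BITS) - 1)
    if mirror:
        index ^= (1 << TABLE_BITS) - 1            # ones' complement, as the XOR gates do
    return sign, sine_table_entry(index, brightness)

def superposition_sample(phases, increments, brightness):
    """Advance every phase by its phase change and return one DAC sample."""
    acc = OFFSET
    for k in range(N_OSC):
        phases[k] = (phases[k] + increments[k]) & ((1 << PHASE_BITS) - 1)
        sign, magnitude = oscillator_sample(phases[k], brightness[k])
        acc = acc - magnitude if sign else acc + magnitude   # sign bit selects add/subtract
    return acc & 0xFFFF

# A completely dark column leaves the output at the mid-scale offset.
phases = [0] * N_OSC
increments = [500 + 40 * k for k in range(N_OSC)]   # hypothetical phase changes
print(hex(superposition_sample(phases, increments, [0] * N_OSC)))  # 0x8000
```

The half-step offset in `sine_table_entry` makes the ones' complement an exact mirror of the table, which is one possible way to make the four quadrants join symmetrically.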
The analog video signal from a camera is sent through a sample-and-hold circuit SAM and converted to a 4-bit digital signal in Gray code by ADC1-15. This serves to reduce the probability of getting very inaccurate results due to transition states (spikes and glitches). The 4-bit code is then stored in video SRAM in PIX1-4, which receives its addresses from the counter CNT1-3. Two bits of this address are used for chip selection by the demultiplexer DMX.
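The benefit of Gray code can be seen in a small model of a 15-comparator converter. The sketch below assumes uniformly spaced reference levels and derives the Gray code from the number of comparators that have tripped; the function names and the voltage range are hypothetical and say nothing about how the prototype encodes the comparator outputs in hardware.

```python
def comparator_outputs(v, v_min=0.0, v_max=1.0, n=15):
    """Outputs of n comparators with uniformly spaced reference levels."""
    step = (v_max - v_min) / (n + 1)
    return [v > v_min + (i + 1) * step for i in range(n)]

def gray_from_comparators(outputs):
    """Encode the number of tripped comparators (0..15) as 4-bit Gray code."""
    level = sum(outputs)
    return level ^ (level >> 1)

def gray_to_binary(gray):
    """Recover the plain binary grey level from a Gray code word."""
    binary = 0
    while gray:
        binary ^= gray
        gray >>= 1
    return binary

gray = gray_from_comparators(comparator_outputs(0.51))
print(format(gray, "04b"), gray_to_binary(gray))  # 1100 8
```

Because neighbouring levels differ in exactly one bit, a sample latched while one comparator is switching can only be off by a single grey level. In plain binary the step from 7 (0111) to 8 (1000) changes all four bits at once and could momentarily read as any value.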
Finally, there is the control logic (or "random" logic) that takes care of the detailed timing, synchronization and mode switching (frame grabbing versus image-to-sound conversion). The symbols in the control logic have the following meaning. The taus represent triggerable delays (monoflops MON). The small sigmas represent the horizontal and vertical synchronization pulses of the television signal, i.e. the outputs of the comparators ADHSNC and ADVSNC. The deltas represent differentiating subcircuits that generate a three-gate-delay spike on the trailing edge of an input pulse. This spike is long enough to trigger subsequent circuitry.

Copyright © 1996 - 2024 Peter B.L. Meijer