IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, VOL. 15, NO. 1, FEBRUARY 1999

A VLSI Sorting Image Sensor: Global Massively Parallel Intensity-to-Time Processing for Low-Latency Adaptive Vision

Vladimir Brajovic, Member, IEEE, and Takeo Kanade, Fellow, IEEE

Abstract—This paper presents a new intensity-to-time processing paradigm suitable for very large scale integration (VLSI) computational sensor implementation of global operations over sensed images. Global image quantities usually describe images with fewer data. When computed at the point of sensing, global quantities result in low-latency performance due to the reduced data transfer requirements between an image sensor and a processor. The global quantities also help global top-down adaptation: the quantities are continuously computed on-chip and are readily available to sensing for adaptation. As an example, we have developed a sorting image computational sensor—a VLSI chip which senses an image and sorts all pixels by their intensities. The first sorting sensor prototype is a 21 × 26 array of cells. It receives an image optically, senses it, and computes the image's cumulative histogram—a global quantity which can be quickly routed off chip via one pin. In addition, the global cumulative histogram is used internally on-chip in a top-down fashion to adapt the values of individual pixels so as to reflect the index of the incoming light, thus computing an "image of indices." The image of indices never saturates and has a uniform histogram.

Index Terms—Computational sensors, image sensors, robot perception, smart sensors, vision, VLSI sensors.

Manuscript received July 23, 1997; revised June 3, 1998. This work was supported in part by ONR Grant N00014-95-1-0591 and NSF Grant MIP-9305494. This paper was recommended for publication by Associate Editor S. Hutchinson and Editor A. Goldenberg upon evaluation of the reviewers' comments. The authors are with the Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: brajovic@cs.cmu.edu). Publisher Item Identifier S 1042-296X(99)01326-9.

I. INTRODUCTION

MANY time-critical robotics applications, such as autonomous vehicles and human-machine interfaces, need a low-latency and adaptable vision system. Conventional vision systems, consisting of a camera and a processor, provide neither low-latency performance nor sufficient adaptation.

Latency is the time that a system takes to react to an event. In conventional systems, latency is incurred both in the data transfer bottleneck created by the separation of the camera and the processor and in the computational load bottleneck created by the necessity to process a huge amount of image data. For example, a standard video camera takes 1/30 of a second to transfer an image. In many critical applications, the image capture alone presents excessive latency for the stable control of a robotic system. Another example is dedicated pipelined vision hardware, which delivers the processing power to update its output 30 times per second; however, the latency through the pipeline is typically several frame times, again rendering the conventional system unsuitable for many time-critical applications.

The second important feature of a vision system is adaptation. It has been repeatedly observed in machine vision research that using the most appropriate sensing modality or setup allows algorithms to be far simpler and more reliable.
For example, the concept of active vision proposes to control the geometric parameters of the camera (e.g., pan, tilt, zoom, etc.) to improve the reliability of the perception. It has been shown that initially ill-posed problems can be solved after the top-down adaptation of the camera's pose has acquired new, more appropriate image data [1]. Adjusting geometric parameters is only one level at which adaptation can take place. A system which can adjust its operations at all levels, even down to the point of sensing, would be far more adaptive than one that tries to cope with the variations at the "algorithmic" or "motoric" level alone.

The computational sensor paradigm [3], [11] has the potential to greatly reduce latency and provide adaptation. By integrating sensing and processing on a very large scale integration (VLSI) chip, both transfer and computational bottlenecks can be alleviated: on-chip routing provides high-throughput transfer, while an on-chip processor can implement massively parallel computational models. Adaptation is also more conveniently facilitated: the results of processing are readily available to sensing for adaptation.

So far, the great majority of computational sensory solutions implement local operations on a single light-sensitive VLSI chip (for examples, see [11], [13], and [19]). Local operations use operands within a small spatial/temporal neighborhood of data and thus lend themselves to graceful implementation in VLSI. Typical examples include filtering and motion computation. Local operations produce preprocessed "images;" therefore, a large quantity of data still must be read out and further inspected before a decision for an appropriate action is made—usually a time-consuming process. Locally computed quantities can be used for adaptation within the local neighborhood, but not globally. Consequently, a great majority of computational sensors built thus far are limited in their ability to quickly respond to changes in the environment and to globally adapt to a new situation.

Global operations, on the other hand, produce fewer quantities for the description of the environment. An image histogram is an example of a global image descriptor. If computed at the point of sensing, global quantities can be routed off a computational sensor through a few output pins without causing a transfer bottleneck. In many applications, this information will often be sufficient for rapid decision making, and the actual image does not need to be read out. The computed global quantities can also be used in a top-down fashion to update local and global properties of the system for adapting to new conditions in the environment.

Implementing global operations in hardware, however, is not trivial. The main difficulty comes from the necessity to bring together, or aggregate, all or most of the data in the input data set [3], [5]. This global exchange of data among a large number of processors quickly saturates communication connections and adversely affects computing efficiency in parallel systems—parallel digital computers and computational sensors alike.
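As a point of reference, the following short NumPy sketch (an editorial illustration, not the chip's mechanism) computes such a global descriptor off-line: a cumulative histogram summarizes an entire frame in a short vector that could cross a trivial channel, whereas a conventional system must first ship the whole frame across the camera-processor bottleneck.

```python
import numpy as np

# Off-line illustration of a global image descriptor: a 256-bin
# cumulative histogram summarizes a frame in a few numbers, while the
# full frame would have to cross the sensor-processor bottleneck.

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(21, 26))      # prototype resolution
hist, _ = np.histogram(frame, bins=256, range=(0, 256))
cum_hist = np.cumsum(hist)                       # the global descriptor
assert cum_hist[-1] == frame.size                # all 546 pixels accounted for
```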
It is not surprising that there are only a few computational sensors for global operations: [8] and [18] show computational sensors for detecting isolated bright objects against a dark background and computing the object's position and orientation, and [6] and [16] show attention-based computational sensors that globally select the "most salient features" across the array of photoreceptors.

This work introduces a novel intensity-to-time processing paradigm—an efficient solution for VLSI implementation of low-latency massively parallel global computation over large groups of fine-grained data. By using this paradigm, we have developed a sorting computational sensor—an analog VLSI chip which sorts the pixels of a sensed image by their intensities. The sorting sensor produces images of indices that never saturate. As a by-product, the chip provides a cumulative histogram—a global descriptor of the scene—on one of the pins; this histogram can be used for low-latency decision making before the image is read out.

II. INTENSITY-TO-TIME PROCESSING PARADIGM

The intensity-to-time processing paradigm implements global operations by aggregating only a few of the input data at a time. Inspired by biological vision [20], the paradigm is based on the notion that stronger input signals elicit responses before weaker ones. Assuming that the inputs have different intensities, the intensity-to-time paradigm separates responses in time, allowing a global processor to make decisions based on only a few inputs at a time. The more time allowed, the more responses are received; thus, the global processor incrementally builds a global decision based first on several, and eventually on all, of the inputs. The key is that some preliminary decision about the environment can be made as soon as the first responses are received. Therefore, this paradigm has an important place in low-latency vision processing.

The architecture supporting intensity-to-time processing is shown in Fig. 1. After a common reset signal at $t = 0$, a cell $i$ generates an event at the instant

    $t_i = f(E_i)$                                  (1)

where $f(\cdot)$ is a monotonic function and $E_i$ is the radiation received by the cell $i$. Therefore, any two cells receiving radiation of different magnitude generate events at different times. If $f(\cdot)$ is decaying, then the ordering of events is consistent with a biological system: stronger stimuli elicit responses before weaker ones.

Fig. 1. A computational sensor architecture for the intensity-to-time processing paradigm.

A global processor receives and processes the events. In addition, there can be a local processor attached to each cell. The generated events then control the instant when the local processor in each cell performs at least one predetermined (i.e., prewired or preprogrammed) operation. By separating the input data in time, the intensity-to-time processing paradigm eases global data aggregation and computation: 1) the global processor processes only a few events at a time; 2) communication resources are shared by many cells; and 3) the global processor and local processors can infer the intensity of an input operand by measuring the time at which its event is received [e.g., $E_i = f^{-1}(t_i)$]. A minimal simulation of this event-ordering idea is sketched below.
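The following Python fragment is an editorial sketch, assuming the decaying choice $f(E) = 1/E$ (the paper requires only monotonicity). It simulates (1) for a handful of cells and shows that responses arrive ordered by intensity, so a global decision can begin with the very first event.

```python
import numpy as np

# Intensity-to-time sketch: stronger inputs elicit earlier events.
rng = np.random.default_rng(1)
E = rng.uniform(0.1, 10.0, size=8)        # radiation received by 8 cells
t = 1.0 / E                               # eq. (1) with assumed f(E) = 1/E

for rank, cell in enumerate(np.argsort(t), start=1):
    print(f"t = {t[cell]:.3f} s: cell {cell} fires (E = {E[cell]:.2f}); "
          f"{rank} event(s) received so far")
# After the first line is printed, the global processor already knows
# which cell is the brightest, long before the weakest cells respond.
```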
Traditionally, the intensity-to-time relationship has been used in single- and double-slope A/D converters [10]. In vision, it has been used to improve diffusion-based image segmentation [7]—a local operation—and for image acquisition in a SIMD architecture [9]—an architecture well suited only for local operations. In contrast, our architecture allows global operations and shares some features of traditional MIMD parallel processing. Namely, the local processors perform their operations asynchronously, an essential feature for the quick response and low-latency performance of parallel systems [3].

The intensity-to-time paradigm is closely related to the event-address neural communication schemes proposed by a number of researchers [12], [15], [17]. In these schemes, a plurality of artificial neurons fire pulses (i.e., events) at rates reflecting their individual levels of activity. The goal is to communicate this activity to other neurons or to an output device. The event-address scheme shares communication wires by communicating the identity (i.e., the address) of a neuron when it fires a pulse. Since time is inherently measured across the entire system, the receiver recovers the firing rate for each transmitting neuron. The intensity-to-time paradigm, in contrast, synchronizes the responses at the beginning of the operation and deals with the time interval each "neuron" takes to fire its first pulse.

III. SORTING IMAGE SENSOR CHIP

By using the intensity-to-time paradigm, we have developed a sorting computational sensor—an analog VLSI sensor which sorts the pixels of an input image by their intensities. The chip detects an image focused thereon and computes an image of indices. The image of indices has a uniform histogram, which has several important properties: 1) contrast is maximally enhanced; 2) the available dynamic range of the readout circuitry is equally utilized, i.e., the values read out from the chip use the available bits most efficiently; and 3) the image of indices never saturates and always preserves the same range (e.g., from 1 to $N$, where $N$ is the number of pixels). During the computation, the chip computes a cumulative histogram—one global descriptor of the detected scene—and reports it with low latency on one of the pins. In fact, the global cumulative histogram is used in a top-down fashion to update information in the local processors and produce the image of indices.

The sorting operation is accomplished in the following way. The plurality of cells detect light and generate events according to (1), where $f(\cdot)$ is a decaying function. Cells receiving more light generate events before cells receiving less light. The global processor counts the events generated within the array. Since the intensities are ordered in time, the count represents the order, or index, of the cell that generates an event next. For example, if $k$ events were already generated at the instant a cell generates its event, the order of that cell is $k + 1$. Therefore, the sorting is done by associating the cell(s) currently generating an event with the current count produced by the global counter; a behavioral simulation of this counting scheme is given after the circuit description below.

Fig. 2 shows the circuit of the sorting sensor. The global processor/counter comprises an array of constant current sources, a current-to-voltage converter (a resistor), a voltage follower, and the wires $W_{out}$ and $W_{in}$. Upon the arrival of the event generated by a cell, the corresponding individual current source is turned on via a switch. The source currents are summed in the wire $W_{out}$. The cumulative current in the wire $W_{out}$ continuously reports the number of cells that have responded with an event—the global count. The cumulative current is converted to a voltage via the resistor and fed in a top-down fashion to the local processors in the cell array via the wire $W_{in}$.
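The promised simulation of the counting scheme follows. It is an editorial sketch: it assumes $f(E) = 1/E$ and the index polarity described in the following paragraphs, in which the brightest cell latches the highest index.

```python
import numpy as np

# Sorting by counting events: each cell latches the global count at the
# instant it fires, so the array ends up holding an image of indices.
rng = np.random.default_rng(2)
intensities = rng.uniform(0.1, 1.0, size=(21, 26))   # prototype resolution
t_fire = 1.0 / intensities.ravel()                   # eq. (1), f(E) = 1/E

n = t_fire.size
indices = np.empty(n, dtype=int)
for rank, cell in enumerate(np.argsort(t_fire)):     # events in time order
    indices[cell] = n - 1 - rank       # brightest cell -> highest index

index_image = indices.reshape(intensities.shape)
# The indices are a permutation of 0..545: a uniform histogram,
# whatever the histogram of the input intensities.
assert np.array_equal(np.sort(indices), np.arange(n))
```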
The local processor in each cell comprises a track-and-hold (T/H) circuit. The T/H tracks the voltage on $W_{in}$ until the event in the cell is generated. At that point, the local processor remembers the voltage supplied by the global counter on the wire $W_{in}$. Thus, the appropriate index is assigned to each cell. The remaining portion of the cell comprises the photosensitive intensity-to-time event generator, which generates an event according to (1).

Fig. 2. Schematic diagram of the sorting computational sensor.

Fig. 3. Sorting computational sensor: a four-cell simulated operation.

Fig. 3 shows the simulated circuit operation for a sorting sensor with four cells. A photodiode operating in the photon-flux-integrating mode [21] detects the light. In this mode of operation, the capacitance of the diode is charged to a high potential and left to float. Since the diode capacitance is discharged by the photocurrent, the voltage decreases approximately linearly at a rate proportional to the amount of light impinging on the diode (Fig. 3, top graph).

The diode voltage is monitored by a CMOS inverter. Once the diode voltage falls to the threshold of the inverter, the inverter's output changes state from low to high (Fig. 3, second graph). A switch is included to provide positive feedback and force a rapid latching action. The transition in the output of the inverter represents the event generated by the cell. It controls the instant when the capacitor in the T/H memorizes the voltage supplied on the wire $W_{in}$. It also controls the instant when the current is supplied to the wire $W_{out}$.

The voltage on the wire $W_{out}$ (Fig. 3, third graph) represents the index of the cell currently changing state and is supplied to the global input wire $W_{in}$. The T/H within each cell follows this voltage until it is disconnected, at which point a capacitor retains the index of the cell (Fig. 3, bottom graph). The bottom graph shows that the cell with the highest intensity input has received the highest "index," the next cell one "index" lower, and so on. The charge from the capacitors is read out by scanning the array after all the cells have responded or after a predetermined time-out.

The sorting sensor computes several important properties of the image focused thereon. First, the time $t$ at which a cell triggers is approximately inversely proportional to the input radiation it receives:

    $t = C_d (V_{dd} - V_t) / (I_{ph} + I_d)$       (2)

where $C_d$ is the diode's capacitance, $V_{dd}$ the power supply voltage, $V_t$ the threshold voltage of the inverter, $I_{ph}$ the photocurrent (approximately proportional to the radiation), and $I_d$ the dark current.

Second, by summing the currents in $W_{out}$, the global processor knows at each instant how many cells have responded with an event. Since the events are generated according to (1), the cumulative current in the wire $W_{out}$, or its voltage image on the wire $W_{in}$, is recognized as a temporal representation of the cumulative histogram of the input data, with the horizontal (time) axis proportional to $1/I_{ph}$. The time derivative of the cumulative histogram signal is related to the histogram of the input image [2]. The cumulative histogram is one global property of the scene, and it is reported by the chip with very low latency. Such information can be used for preliminary decision making as soon as the first responses are received. In fact, it is used on-chip to quickly adapt the index values for every new frame of input data.
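The following behavioral sketch simulates (2) for an array of cells and reconstructs the waveform on $W_{out}$ as a temporal cumulative histogram. The component values are illustrative assumptions, not the prototype's.

```python
import numpy as np

C_d  = 1e-12      # diode capacitance [F] (assumed)
V_dd = 5.0        # supply voltage [V] (assumed)
V_t  = 2.5        # inverter threshold [V] (assumed)
I_d  = 2e-15      # dark current [A] (assumed)

rng    = np.random.default_rng(3)
I_ph   = rng.uniform(1e-14, 1e-11, size=546)     # photocurrents [A]
t_trig = C_d * (V_dd - V_t) / (I_ph + I_d)       # trigger times, eq. (2)

# Global count vs. time: the waveform on W_out, i.e., a temporal
# cumulative histogram of the input intensities.
t_axis = np.linspace(0.0, t_trig.max(), 1000)
count = (t_trig[None, :] <= t_axis[:, None]).sum(axis=1)
# count[k]/546 is the fraction of pixels brighter than the intensity
# corresponding to time t_axis[k]; the time derivative of this signal
# approximates the image histogram.
```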
IV. VLSI REALIZATION AND EVALUATION

A 21 × 26-cell sorting sensor has been built in 2-µm CMOS technology. The size of each cell is 76 µm × 90 µm, with a 13% fill factor. A micrograph of the chip is shown in Fig. 4. Images were focused directly onto the silicon. The cumulative histogram waveform, as well as the indices from the sorting sensor, were digitized with 12-bit resolution.

Fig. 4. Micrograph of the sorting chip.

Scene 1, an office environment, was imaged by the sorting chip under common office illumination coming from the ceiling. Fig. 5(a) and (b) shows the cumulative histogram of the scene and the image of indices, both computed by the chip. We evaluated the histogram of the indices, shown in Fig. 5(c): most pixels appeared to have different input intensities and, therefore, received different indices. Occasionally, as many as three pixels were assigned the same index. Overall, the histogram of indices is uniform, indicating that the sorting chip has performed correctly.

Fig. 5. Scene 1 imaged by the sorting sensor: (a) cumulative histogram computed by the chip (voltage on $W_{in}$), (b) image of indices, and (c) histogram of indices.

Scene 2 from the same office was also imaged. Scene 1 (Fig. 5) contains more dark regions than Scene 2 (Fig. 6) because the moderately bright wall in the background is replaced by the dark regions of a person in partial shadow. Therefore, the chip takes longer to compute Scene 1 than Scene 2, but the dynamic range of the output indices is maintained. The total time shown on the time-sample axis of the cumulative histograms is about 200 ms.

Fig. 6. Scene 2 imaged by the sorting sensor: (a) cumulative histogram computed by the chip (voltage on $W_{in}$), (b) image of indices, and (c) histogram of indices.

By producing the cumulative histogram waveform and the image of indices, the sorting computational sensor provides all the necessary information for the inverse mapping—the mapping from the indices back to the input intensities (a sketch of this reconstruction is given at the end of this section). Fig. 7(a) shows the image of indices for Scene 1 and the image of inferred input intensities. Fig. 7(b) shows an image taken by a commercial CCD camera, documenting the natural light conditions in the office environment from which Scene 1 was taken. The inferred input intensities closely resemble the natural low-contrast conditions in the environment.

Fig. 7. (a) Indices from the sorting sensor and inferred input intensity and (b) CCD camera image.

There is a total of 546 pixels in the prototype described in this paper. The uniform histograms of indices [Figs. 5(c) and 6(c)] indicate that most of the pixels received different indices. Therefore, without special consideration of the illumination conditions, low-noise circuit design, or temperature and dark current control, our lab prototype readily provides indices with more than 9 b of resolution. Furthermore, the range of indices remains unchanged (from 0 to 545), and the indices maintain a uniform histogram regardless of the range of the input light intensity or its histogram.
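To make the inverse mapping concrete, here is an editorial reconstruction using (2) and (3) with placeholder component values and the index convention described above (brightest pixel holds the highest index): given an image of indices and a digitized cumulative-histogram waveform, each pixel's trigger time, and hence its photocurrent, is recovered.

```python
import numpy as np

C_d, V_dd, V_t, I_d = 1e-12, 5.0, 2.5, 2e-15     # assumed values

rng = np.random.default_rng(5)
I_true = rng.uniform(1e-14, 1e-11, size=546)
t_fire = C_d * (V_dd - V_t) / (I_true + I_d)            # eq. (2)

n = t_fire.size
rank = np.argsort(np.argsort(t_fire))                   # 0 = fired first
indices = (n - 1) - rank                                # brightest -> 545

# Digitized waveform: global count sampled on a uniform time axis.
t_axis = np.linspace(0.0, t_fire.max(), 4096)
count = (t_fire[None, :] <= t_axis[:, None]).sum(axis=1)

# Inverse mapping: the pixel holding index k fired as event number
# n - k, so take the first sampled instant at which the count reached
# n - k, then invert eq. (2) to get the photocurrent, eq. (3).
k_first = np.searchsorted(count, n - indices)
t_rec = t_axis[np.minimum(k_first, t_axis.size - 1)]
I_rec = C_d * (V_dd - V_t) / t_rec - I_d                # eq. (3)
# I_rec approximates I_true up to the waveform's sampling resolution.
```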
V. SORTING SENSOR IMAGE PROCESSING

The data that are stored in the local processors are provided by the global processor. These global data—a function of time—define a mapping from the input intensities to the output data. For the sorting operation, this global function is the cumulative histogram computed by the chip itself. In general, when appropriately defined, this global function enables the sorting sensor to perform numerous other operations/mappings on input images. Examples of such operations include histogram computation and equalization, arbitrary point-to-point mapping, region segmentation, and adaptive dynamic range imaging; a unified behavioral sketch of these mappings follows Section V-D below. In fact, in its native mode of operation—sorting—the chip provides all the information necessary to perform any mapping during the readout.

A. Histogram Equalization

When the voltage of the cumulative histogram (computed by the chip itself) is supplied to the local processors, the generated image is a histogram-equalized version of the input image [2]. This is the basic mode of operation for the sorting chip and has been illustrated in the previous section.

B. Linear Imaging

When the waveform supplied to the input wire $W_{in}$ is inversely proportional to time, the values stored in the capacitors are proportional to the input intensity, implementing a linear camera. The results of such a mapping have been illustrated in Fig. 7. As expected, the result is similar to the image obtained by the linear CCD imager. (The CCD image and the sorting sensor image were obtained within minutes of each other, under the same illumination conditions.)

C. Scene Change Detection

Analyzing the change in the histogram pattern is a basic technique to classify images or detect a scene change. The sorting computational sensor computes the cumulative histogram in real time and can be used for low-latency scene discrimination/surveillance without requiring the image to be read out. For example, by comparing the cumulative histograms for Scenes 1 and 2, one could conclude that the brightest pixel (i.e., the computer monitor) did not change (see Fig. 8). One could also conclude that the remainder of the image in Scene 2 is brighter than in Scene 1, since Scene 2 takes less time to compute. Other, more intelligent and detailed reasoning about the scene based only on the cumulative histogram is possible.

Fig. 8. Detecting a scene change by observing cumulative histograms only.

D. Image Segmentation

Thresholding is a rudimentary technique to segment an image into regions. The cumulative histogram can be used to determine the threshold. Pixels from a single region often have similar intensities, which appear as clusters in the image histogram [2]. The values which ought to be stored in the cells can be generated to correspond to the "label" of each such region. A global processor can be devised that performs this labeling by updating the supplied value (i.e., the label) when a transition between the clusters in the (cumulative) histogram is detected. An example of segmentation is shown in Fig. 10(b) and (c), in which the illuminated and shadowed regions, respectively, are "colored" as a black region.
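The subsections above share one mechanism: each cell latches the value of the waveform $g(t)$ present on $W_{in}$ at its own trigger time, so the choice of $g(t)$ selects the operation. The sketch below is our behavioral illustration, with $f(E) = 1/E$ and simple stand-in waveforms.

```python
import numpy as np

rng = np.random.default_rng(4)
E = rng.uniform(0.1, 1.0, size=546)          # input intensities
t_fire = 1.0 / E                             # eq. (1), assumed f(E) = 1/E

def latch(g):
    """Value stored by each cell: g evaluated at its trigger time."""
    return g(t_fire)

# A. Histogram equalization: g(t) = count of cells fired by time t
#    (the chip's own cumulative histogram); each cell latches its rank,
#    producing a uniform histogram of stored values.
g_hist = lambda t: np.searchsorted(np.sort(t_fire), t, side="right")
equalized = latch(g_hist)

# B. Linear imaging: g(t) inversely proportional to time recovers a
#    value proportional to the input intensity (a linear camera).
linear = latch(lambda t: 1.0 / t)            # equals E up to a constant

# D. Segmentation: a stepwise g(t) supplies one label per cluster of
#    the (cumulative) histogram; here a single threshold at the median.
labels = latch(lambda t: (t > np.median(t_fire)).astype(int))
```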
E. Adaptive Dynamic Range Imaging

For faithful imaging of scenes with strong shadows, a linear camera with a huge dynamic range is needed. For example, the illumination of a scene region directly exposed to sunlight is several orders of magnitude greater than the illumination of surfaces in shadow. Due to the inherently large dynamic range of the sorting sensor, both illuminated and shadowed pixels can be mapped to the same output range during a single frame.

We demonstrate this concept with back-illuminated objects. Fig. 9 shows a global view of this scene as captured by a conventional CCD camera. Due to the limited dynamic range of the CCD camera, the foreground is poorly imaged and is mostly black. (The white box roughly marks the field-of-view of the sorting sensor.) When the scene is imaged with the sorting sensor [Fig. 10(a)], the detail in the dark foreground is resolved, as well as the detail in the bright background. Since all 546 indices are competing to be displayed within the 256 levels allowed for the PostScript images in this paper, one enhancement for the purpose of human viewing is to segment the image and amplify only the dark pixels. The result is shown in Fig. 10(b). Conversely, as shown in Fig. 10(c), the bright pixels can be spanned to the full (8-b) output range. Finally, if these two mappings are performed simultaneously, the shadows are removed [Fig. 10(d)].

Fig. 9. A scene with back-lit objects as captured by a conventional CCD camera.

Fig. 10. Sorting sensor processing: (a) data from the sensor; (b) segmentation (viewing the shadowed region); (c) segmentation (viewing the illuminated region); and (d) segmentation and shadow removal.

The same method can be applied to the image obtained from a standard CCD camera. If the CCD image of Fig. 9 is cropped to the white box, and such an image is histogram-equalized, we arrive at the result shown in Fig. 11(a). This image is analogous to the image of indices obtained by the sorting sensor [Fig. 10(a)]. Due to the limited dynamic range, noise, and quantization, the CCD image resolves the face with only 2-3 bits. The histogram-equalized image from the CCD is used for further mapping using the same steps as for Fig. 10(d). For obvious reasons, the result is poor [Fig. 11(b)]. In contrast, the sorting computational sensor allocates as many output levels (i.e., indices) as there are pixels within the dark region, or the entire image for that matter. By comparing Figs. 10(d) and 11(b), the superior utilization of the sensory signal with the sorting chip is obvious.

Fig. 11. Conventional CCD camera processing: (a) histogram equalization of the window and (b) segmentation and shadow removal.

The adaptation of the dynamic range of the sorting sensor is also illustrated in Fig. 12, showing a sequence of 93 images of indices computed by the sorting sensor. The sensor was stationary, and the only changes in the scene are due to the subject's movement. By observing the wall in the background, we can see the effects of the adaptive dynamic range: even though the physical wall does not change its brightness, it appears dimmer in those frames in which the bright levels are taken by pixels which are physically brighter (e.g., the subject's face and arm). When the subject turns and fills the field-of-view with dark objects (e.g., hair), the wall appears brighter since it now takes higher indices. Also, note that maximum contrast is maintained in all the images, since all images of indices have a uniform histogram.

Fig. 12. Sequence of images of indices computed by the sorting sensor.
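As a numerical illustration of this adaptive allocation of output levels (an editorial sketch with an assumed two-decade split between a dark and a bright region, not the paper's measurements), the following simulation contrasts a linear 8-bit quantization with rank-based indices.

```python
import numpy as np

# Rank-based indices give a dark region as many output levels as it has
# pixels; a linear 8-bit camera crushes it into a handful of codes.
rng = np.random.default_rng(6)
dark   = rng.uniform(0.001, 0.01, size=273)   # shadowed pixels (assumed)
bright = rng.uniform(0.1, 1.0, size=273)      # illuminated pixels (assumed)
E = np.concatenate([dark, bright])

indices = np.argsort(np.argsort(E))           # the image of indices
linear8 = np.round(255 * E / E.max()).astype(int)   # linear 8-bit camera

print("distinct 8-bit codes in the dark half:", np.unique(linear8[:273]).size)
print("distinct indices in the dark half:    ", np.unique(indices[:273]).size)
# The dark half keeps all 273 of its indices but only a few 8-bit codes.
```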
VI. ERROR ANALYSIS

Theoretically, the dynamic range of the scene detectable by the sorting sensor is unlimited. Of course, in practice the actual dynamic range of the sensor is determined by the capabilities of the photodetector, as well as by the switching speed and the dark current levels.

First, we investigate the mismatch of the cells. Even when receiving the same light levels, the cells do not respond at the same time. This determines the fundamental accuracy of the intensity-to-time paradigm. Given (2), the input photocurrent can be found as

    $I_{ph} = C_d (V_{dd} - V_t) / t - I_d$         (3)

where $C_d$, $V_{dd}$, and $V_t$ are as in (2), and $I_d$ is the dark current. The relative error can be found as

    $\Delta I_{ph}/I_{ph} \approx \Delta C_d/C_d + \Delta V/V + \Delta I_d/I_{ph} + \Delta t/t$        (4)

where $V = V_{dd} - V_t$, $\Delta I_d$ represents the fluctuation of the dark current over the sensor area, $\Delta C_d$ represents fluctuations of the photodetector capacitance (e.g., mismatch of the photodetectors), $\Delta V$ represents the mismatch of the threshold voltages and the diode's reset noise, and $\Delta t$ represents the fluctuation in the switching speed of the control element. After substituting (2) into the last term of (4), the relative error becomes

    $\Delta I_{ph}/I_{ph} \approx \Delta C_d/C_d + \Delta V/V + \Delta I_d/I_{ph} + \Delta t (I_{ph} + I_d) / [C_d (V_{dd} - V_t)]$        (5)

    $\Delta I_{ph}/I_{ph} \approx a + b/I_{ph} + c I_{ph}$        (6)

where $a$, $b$, and $c$ lump the constant terms in (5). This error model follows the intuition: for high levels of illumination, when the cells respond quickly, the dominant cause of error is the fluctuation in the switching speed; for low illumination levels, the dominant factor is the fluctuation in the dark current.

The constants $a$, $b$, and $c$ were experimentally determined from the prototype chip. Without a lens in front of the sensor, the sensor was illuminated by a halogen light source reflected from a white cardboard. As the cardboard was positioned several meters from the sensor, the illumination field was considered uniform over the sensor's surface. The amount of light falling on the sensor's surface was controlled by changing the angle between the light source and the cardboard. The cumulative histogram waveform was gathered for 43 different light levels. [We do not know the absolute value of the light levels. As an illustration for the reader, the brightest level in our experiment was comparable to that of an average-size room illuminated with a 150-W bulb; the darkest level was comparable to the same room illuminated with a desk lamp.] From the cumulative histogram waveforms and (3), the mean value $I_{ph}$ and the standard deviation $\Delta I_{ph}$ were computed in arbitrary current units [ACU]. [1 ACU = 1/s, i.e., a photocurrent of 1 ACU triggers an event according to (2) 1 s after the beginning of the frame integration.] The error model (6) was fitted to the data; a sketch of such a fit is given below. The results are tabulated in Table I and graphed in Fig. 13.

TABLE I. ERROR PERFORMANCE OF THE SORTING SENSOR PROTOTYPE

Fig. 13. Relative error $\Delta I_{ph}/I_{ph}$: experimental data points and fitted error model.

For a signal-to-noise ratio (SNR) of one, the dynamic range of the sensor based on the model is over $10^6$. If the three-sigma rule is used for the noise limits, the dynamic range is over $10^5$. However, the detectable lower limit on the input photocurrent is determined by the level of the dark current. In the experiment, we determined that the average dark current is about 0.2 ACU; therefore, for SNR = 1 we require the lowest input photocurrent to be 0.2 ACU. Then, the dynamic range is 1 : 116 650 for the one-sigma rule and 1 : 38 880 for the three-sigma rule.
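The fit itself is an ordinary least-squares problem. The following sketch uses synthetic stand-in data rather than the measured values, and shows how the constants of (6), as reconstructed above, can be recovered.

```python
import numpy as np

# Fit the error model of eq. (6), dI/I = a + b/I + c*I, by linear
# least squares. The data below are synthetic stand-ins.
I = np.logspace(-1, 4, 43)                      # mean photocurrents [ACU]
a0, b0, c0 = 5e-3, 2e-2, 1e-5                   # assumed "true" constants
rel = a0 + b0 / I + c0 * I                      # noiseless model output

A = np.column_stack([np.ones_like(I), 1.0 / I, I])    # design matrix
a, b, c = np.linalg.lstsq(A, rel, rcond=None)[0]      # fitted constants
print(f"a = {a:.3g}, b = {b:.3g}, c = {c:.3g}")       # recovers a0, b0, c0
```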
Given a constant dark current, the dynamic range is limited by the error the sensor makes when detecting high illumination levels. The dominant source of this error is the fluctuation in the turn-on time of the inverters. In our experiment, this fluctuation is about 43 µs (i.e., the constant $c$). This is a very high switching fluctuation. It is probably due to the facts that: 1) the input voltage slowly approaches the threshold level of the inverter, thus causing long transition times at the inverter's output; 2) the positive feedback transistor is active only after the "decision" to trip is made; 3) in a static CMOS inverter, the p- and n-channel transistors "fight" each other for slow-changing inputs; and 4) there could be some systematic limitation in our instrumentation setup and/or in the conditions under which we assume equal illumination for all pixels. In all, the switching fluctuation is approximately 10% of the inverter output transition time (i.e., rise time) in the cell receiving the highest intensity in our experiment, which is reasonable. A higher gain thresholding element would probably perform better. This hypothesis will be verified with a new prototype currently being fabricated.

The other sources of error, fluctuations in the dark current (i.e., the constant $b$) and the mismatch of $C_d$ and $V$ (i.e., the constant $a$), are within reasonable limits. The relative error for the dark current is approximately 10%, while the lumped relative error for $\Delta C_d/C_d$ and $\Delta V/V$ is approximately 0.5%.

The second issue we would like to consider is the error the sorting sensor makes when computing the cumulative histogram. This error is due to the mismatch of the current sources. Since there are typically thousands of cells in sorting image sensors, the level of each individual source current is very low, pushing the corresponding transistors into the subthreshold regime. In this regime, the current sources can mismatch by 100%, i.e., one current source can be twice as large as another [14]. Nonetheless, the monotonicity of the cumulative histogram is maintained. When the cumulative histogram is used for the inverse mapping, the mapping from indices to the input intensities, the error in the cumulative histogram is not significant, as it will be directly undone.

The error that can be significant when mapping from indices to input intensities, however, is the readout error for each index. If the scene produces long horizontal segments in the cumulative histogram, such as the example in Fig. 10(a), then a small error in an index can result in a large error in the inferred response time for a particular cell. This problem can be handled by prohibiting the mapping process from returning times within the interval of a long horizontal segment in the cumulative histogram, as sketched below. A few pixels may be grossly misclassified, but the overall recovery of the input intensities is good.
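A sketch of this guard, under our interpretation of the scheme, follows: on a flat (horizontal) segment of the cumulative histogram many sampled instants share one count, so returning the earliest instant at which each count was reached keeps the recovered time out of the segment's interior.

```python
import numpy as np

def index_to_time(event_number, t_axis, count_waveform):
    """Map global counts back to trigger times; side='left' snaps every
    lookup to the leading edge of any flat (horizontal) segment, so no
    time inside the segment's interior is ever returned."""
    k = np.searchsorted(count_waveform, event_number, side="left")
    return t_axis[np.minimum(k, t_axis.size - 1)]
```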
VII. CONCLUSION

The intensity-to-time processing paradigm enables VLSI computational sensors to be massively parallel computational engines which make global computations or overall decisions about the sensed scene and report such decisions on a few output pins of the chip with low latency. The power of this paradigm is demonstrated with an analog VLSI implementation of sorting—an operation still challenging in computer science when performed on large groups of data. This work shows that, if an appropriate relationship is maintained between the circuitry, the algorithm, and the application, surprisingly powerful performance can be achieved in a fairly simple but fairly high-resolution VLSI vision computational sensor.

ACKNOWLEDGMENT

The authors would like to acknowledge the critical and constructive comments by the reviewers.

REFERENCES

[1] J. Aloimonos, Ed., Active Perception. Hillsdale, NJ: Lawrence Erlbaum Associates, 1993.
[2] D. H. Ballard and C. M. Brown, Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1982.
[3] V. Brajovic, "Computational sensors for global operations in vision," Ph.D. dissertation, Carnegie Mellon Univ., Pittsburgh, PA, 1996.
[4] V. Brajovic and T. Kanade, "A sorting image sensor: An example of massively parallel intensity-to-time processing for low-latency computational sensors," in Proc. 1996 IEEE Int. Conf. Robot. Automat., Minneapolis, MN, Apr. 1996, pp. 1638-1643.
[5] V. Brajovic and T. Kanade, "Computational sensors for global operations," in IUS Proc., 1994, pp. 621-630.
[6] V. Brajovic and T. Kanade, "Computational sensor for visual tracking with attention," IEEE J. Solid-State Circuits, vol. 33, Aug. 1998.
[7] P. Y. Burgi and T. Pun, "Asynchrony in image analysis: Using the luminance-to-response-latency relationship to improve segmentation," J. Opt. Soc. Amer. A, vol. 11, no. 6, pp. 1720-1726, June 1994.
[8] S. P. DeWeerth, "Analog VLSI circuits for stimulus localization and centroid computation," Int. J. Comput. Vision, vol. 8, no. 3, pp. 191-202, 1992.
[9] R. Forchheimer and A. Astrom, "Near-sensor image processing: A new paradigm," IEEE Trans. Image Processing, vol. 3, pp. 736-746, Nov. 1994.
[10] R. L. Geiger, P. E. Allen, and N. R. Strader, VLSI Design Techniques for Analog and Digital Circuits. New York: McGraw-Hill, 1990.
[11] T. Kanade and R. Bajcsy, "Computational sensors: A report from the DARPA workshop," in Proc. Image Understanding Workshop, 1993.
[12] J. Lazzaro, J. Wawrzynek, M. Mahowald, M. Sivilotti, and D. Gillespie, "Silicon auditory processors as computer peripherals," IEEE Trans. Neural Networks, vol. 4, May 1993.
[13] B. Mathur and C. Koch, Eds., Visual Information Processing: From Neurons to Chips, Proc. SPIE, vol. 1473, 1991.
[14] C. Mead, Analog VLSI and Neural Systems. Reading, MA: Addison-Wesley, 1989.
[15] M. Mahowald, "Computation and neural systems," Ph.D. dissertation, California Inst. Technol., Pasadena, CA, 1992.
[16] T. G. Morris and S. P. DeWeerth, "Analog VLSI circuits for covert attentional shifts," in Proc. MicroNeuro '96, Lausanne, Switzerland, 1996.
[17] A. Mortara, E. A. Vittoz, and P. Venier, "A communication scheme for analog VLSI perceptive systems," IEEE J. Solid-State Circuits, vol. 30, pp. 660-669, June 1995.
[18] D. Standley, "An object position and orientation IC with embedded imager," IEEE J. Solid-State Circuits, vol. 26, pp. 1853-1860, Dec. 1991.
[19] B. Zavidovique and T. Bernard, "Generic functions for on-chip vision," in Proc. ICPR, Conf. D, The Hague, The Netherlands, 1992, pp. 1-10.
[20] H. Ripps and R. A. Weale, "Temporal analysis and resolution," in The Eye, vol. 2A, H. Davson, Ed. New York: Academic, 1976, pp. 185-217.
[21] G. P. Weckler, "Operation of p-n junction photodetectors in a photon flux integrating mode," IEEE J. Solid-State Circuits, vol. SC-2, pp. 65-73, Sept. 1967.

Vladimir Brajovic (S'88-M'96) received the Dipl. Eng. E.E. degree from the University of Belgrade, Yugoslavia, in 1987, the M.S.E.E. degree from Rutgers University, New Brunswick, NJ, in 1990, and the Ph.D. degree in robotics from Carnegie Mellon University, Pittsburgh, PA, in 1996.
He is a Research Scientist with the Carnegie Mellon Robotics Institute, where he is the Director of the VLSI Computational Sensor Laboratory. His research interests include computational sensors, analog and mixed-signal VLSI, machine vision, robotics, signal processing, optics, and sensors.
Dr. Brajovic received the Anton Philips Award at the 1996 IEEE International Conference on Robotics and Automation for his work on the sorting image sensor.
Takeo Kanade (F'92) received the Doctoral degree in electrical engineering from Kyoto University, Japan, in 1974.
After holding a faculty position in the Department of Information Science, Kyoto University, he joined Carnegie Mellon University, Pittsburgh, PA, in 1980, where he is currently the U. A. and Helen Whitaker Professor of Computer Science and Director of the Robotics Institute. He has written more than 150 technical papers on computer vision, sensors, and robotics systems.
Dr. Kanade has received several awards, including the Joseph Engelberger Award, the JARA Award, and several best paper awards at international conferences. He has served on many government, industry, and university advisory or consultant committees, including the Aeronautics and Space Engineering Board (ASEB) of the National Research Council, NASA's Advanced Technology Advisory Committee (a congressionally mandated committee), and the Advisory Board of the Canadian Institute for Advanced Research. He has been elected to the National Academy of Engineering and is a Founding Fellow of the American Association for Artificial Intelligence.