For about 30 years, CCD has been the prevailing technology in image capture.
Independently, CMOS has grown into the leading general-purpose solid-state
technology, accounting for 90% of all chips manufactured today, from powerful
microprocessors to RAM and ROM memory chips. CCD and CMOS technologies
are both based on silicon, a semiconductor that is naturally photosensitive
in the visible light spectrum. By the early nineties, the MOS
transistor had become small enough to fit within a 10 µm
pixel, turning CMOS into a potential alternative imaging technology.
Thanks to the development of novel circuit techniques during the nineties,
in particular to deal with noise issues, the quality of CMOS imagers has
improved to nearly match that of CCDs.
Unlike CCDs, however, CMOS imagers feature an architecture similar
to that of DRAMs, with random access to pixels. One key advantage of CMOS
over CCD imagers is that the complicated driving of large-swing clocks
is no longer necessary, which saves much power and avoids dedicated external
drivers. In the pixel of CMOS imagers, MOS transistors were first used
as read/reset pass gates, then as amplifying devices to improve noise performance.
But they have also proven very useful beside the imaging array to implement
other functions on the same chip. In fact, CMOS technology allows the
implementation of smart imaging systems on a single chip, by combining
all functions needed from photon capture to the output of digital bits.
Today, as the MOS transistor is in the deep submicron range, CMOS imagers
with 5µm pixel pitch are invading the low-end imaging market, e.g.
webcams, and are even found in professional digital cameras. More information
is available from the web sites of companies that have pioneered the field,
in particular VVL and Photobit.
From CMOS imaging to CMOS vision
Today, images are mainly captured to be presented to human observers, remotely
or later in time. From this perspective, CMOS imaging will certainly have
a large societal impact. But the need for digital images can only be as
large as the human ability to exploit and absorb them (not to mention
storing and communicating them). Human beings might actually be saturated
soon. In that sense, the true imaging revolution is yet to come, and it
is that of real-time artificial vision. There is a need for compact (on
a chip) and low-power vision systems able to understand what they are
looking at. And the corresponding market is much larger than that of image
production. Typical applications include automatic surveillance,
identification, human assistance, and autonomous robotics.
A CMOS circuit is precisely a place where light sensing (photodiode)
and intelligence (MOS transistor) can meet. A practical way to build a
CMOS vision system on a chip could be to combine a digital CMOS imager
(including ADC) with some microprocessor or DSP cores. But is it the best
architecture? Our academic role is to address this issue in the most
fundamental way.
CMOS vision with CMOS retinas
We notice that one of the main difficulties faced today by computer architects
lies in the fracture between memory and computing. Looking at nature, a
remarkable feature is that animal vision always intimately combines sensing
and processing. These clues (and others) make us believe that the above
architecture is not so good. They argue instead for "smart pixels" able
to process - to some extent - pixel data on-site. Arrays of such smart
pixels deserve to be called "artificial retinas", owing to the similarity
with their biological counterparts, with respect to the sensing/processing
intimacy. However, whereas biological retinas are dedicated to specific
visual tasks, CMOS artificial retinas can be made versatile by using simple
yet universal digital computing resources in the pixel, iteratively controlled
through external programming. For the past ten years, we have been investigating
programmable digital artificial retinas, and their use in vision systems.
Henceforth, programmable digital artificial retinas will simply be called
"digital retinas".
Digital CMOS retinas
A digital retina, then, is an imaging array in which each pixel contains
an analog-to-digital converter and a tiny digital processor. Yet analog circuitry is very useful
in digital retinas, not only for pixel-level ADC, but also for the compact
and low power implementation of the tiny digital processor, and even possibly
for some purely analog processing that would be too impractical digitally.
A digital retina is essentially a periodic array, where the periodic cell
is the pixel itself, or possibly a small cluster of pixels sharing common
hardware resources.
Without any high-frequency, long-distance, high-capacitance, power-consuming
data transfer, a digital retina is meant (a) to capture images if placed
in the focal plane of a lens, (b) to convert them into digital format,
(c) to store some of them, (d) to perform various elementary computations
on the corresponding arrays of pixel data and (e) to aggregate these huge
data sets under compact - possibly scalar - forms, called image descriptors,
to be used externally. All these tasks are performed on external request,
that is, through external programming.
So the tiny digital processor has to be universal enough to allow the
execution of the most general computation class. Under drastic area constraints
- it's in the pixel - it must definitely be a "highly reduced instruction
set" processor. This tiny digital processor combines memory, computing
and communication resources. Communications are either (a) local, among
neighboring pixels (the NEWS network: North, East, West, South), (b) global,
for controlling the retina array and extracting compact image descriptors,
or (c) regional, to allow the efficient manipulation of objects in middle-level vision.
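As an illustration of the local (NEWS) communication mode, here is a minimal pure-Python sketch of one SIMD step in which every pixel reads a register from the same neighbor direction in lockstep. The function name and the border convention (border pixels read a constant) are our own choices, not the chip's:

```python
def news_read(plane, direction, border=0):
    """Return a new bit plane where each pixel holds the value its
    NEWS neighbor held in `plane`. All pixels 'move' in lockstep,
    as a single broadcast SIMD instruction would make them do."""
    h, w = len(plane), len(plane[0])
    # Reading from the North means pixel (r, c) fetches (r-1, c), etc.
    dr, dc = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}[direction]
    out = [[border] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                out[r][c] = plane[nr][nc]
    return out

plane = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
shifted = news_read(plane, "N")  # the set bit reappears one row south
```

Chaining such reads with per-pixel boolean operations is enough to express shifts, neighborhood logic and, iteratively, much of low-level image processing.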
The basic control mode of digital retinas is SIMD (Single Instruction
Multiple Data), with the same instruction performed at the same time in
each pixel. This type of massive parallelism was extensively investigated
in the seventies and eighties, then abandoned in favor of more flexible
types that eased the use of off-the-shelf components. Using SIMD in a
digital retina, however, is a completely different story. In particular,
there is no need for a high bandwidth off-chip communication link. Besides,
more flexible control modes can be easily implemented, in particular sub-resolution
SIMD modes, windowing mode, single pixel access mode, etc.
Image descriptors extracted from a digital retina are typically scalar
measures obtained through global summation over all pixels, or lists of
coordinates of pixels of interest. Whole images are not supposed to be
output from a digital retina, except for test purposes.
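The two descriptor types just mentioned can be sketched in a few lines of Python (our own illustration; on-chip, the summation would be performed by dedicated global hardware rather than sequentially):

```python
def global_sum(plane):
    """Scalar descriptor: global summation over all pixels,
    here simply the number of set bits in a binary plane."""
    return sum(sum(row) for row in plane)

def interest_points(plane):
    """List descriptor: coordinates of the pixels flagged as
    'of interest' (set bits), in row-major order."""
    return [(r, c)
            for r, row in enumerate(plane)
            for c, v in enumerate(row) if v]

plane = [[0, 1, 0],
         [1, 0, 0],
         [0, 0, 1]]
# global_sum(plane) == 3
# interest_points(plane) == [(0, 1), (1, 0), (2, 2)]
```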
How large and complex can a digital retina be? Using an outdated 0.8 µm
CMOS technology, we have been able to successfully design and operate a
128x128 version with 5 binary registers per pixel. Using a not-so-distant
technology that could pack a billion transistors on a single die, a 512x512
retina could be built with a local pixel memory of several hundred
bits.
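The scaling claim can be checked with back-of-the-envelope arithmetic. This is our own estimate, not a figure from a specific design: the 6-transistor figure is the classic 6T static RAM cell, and the per-pixel logic budget is an assumption.

```python
# One billion transistors spread over a 512x512 pixel array.
transistors_on_die = 1_000_000_000
pixels = 512 * 512                         # 262_144 pixels
per_pixel = transistors_on_die // pixels   # ~3_800 transistors per pixel

TRANSISTORS_PER_SRAM_BIT = 6   # classic 6T static RAM cell
logic_budget = 1000            # assumed share left for the tiny processor

memory_bits = (per_pixel - logic_budget) // TRANSISTORS_PER_SRAM_BIT
# memory_bits lands in the several-hundred-bit range, consistent
# with the estimate above.
```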
Vision systems including a digital retina
In brief, a digital retina is an array processor with integrated optical
input and external control. As such, it is a specialized computing unit in
the same way as a floating-point unit in a microprocessor. To pursue the
analogy between biological and artificial vision, let us call "cortex"
the set of resources that must be associated to an artificial retina to
turn it into a vision system able to observe and understand scenes up to
action-oriented decision making. Functionally speaking, there is a master-slave
relationship between cortex and retina: the cortex controls the digital
retina in order to make it produce image descriptors useful for decision
making. Computationally speaking, the cortex+retina association is a hybrid
parallel system where the digital retina supports low- to middle-level vision
- as long as the data retain a two-dimensional structure - while the cortex
is in charge of middle- to high-level vision, the part of vision that is
closer to artificial intelligence than to signal processing, and where data
structures are no longer images but rather scalar values, lists, vectors,
graphs, etc. The cortex is typically implemented using a standard microprocessor
with enhanced I/O.
The point of using a digital retina in a vision system is that the
volume of data manipulated is much larger at low level than at high level.
To a lesser extent, this is also true of the computational load. A vision
system incorporating a digital retina is therefore expected to largely inherit
the retina's high performance in terms of volume, weight, speed and power
consumption. From the architect's viewpoint, the cortex+retina association
is fairly insensitive to the memory/computing fracture mentioned earlier,
at the retina level as well as at the system level.
Retinal algorithms
Thanks to the versatility of its components, the cortex+retina system is
able to support a vast class of vision tasks. However, algorithms have
to fit the architectural specificities of the system:
- Computations must be partitioned between cortex and digital retina
so as to make the most of their respective computational characteristics,
including those of the specific operators embedded in the retina to produce
image descriptors.
- A digital retina is a massively parallel array processor, subject
to the SIMD control mode or to its extensions. SIMD implies fully parallel
algorithms while the sub-resolution extensions favor scale-space frameworks.
- A digital retina does not use any external data memory, so with the
CMOS technologies presently available, only a small number of bits can
be stored in the pixel for use by the tiny digital processor. This puts
much pressure on the algorithm designer to make the procedures concise.
- Asynchronous circuits have a key role to play for the energy-efficient
support of middle level vision in digital retinas, in particular for regional
communication purposes. This implies mixed synchronous/asynchronous algorithms,
a new track of research.
- The availability of a random bit generator at the pixel level (from
electronic noise) opens the door to low power stochastic algorithms, such
as those based on Markov random fields.
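To make the last point concrete, here is a toy stochastic relaxation in Python (our own illustration, not one of the algorithms cited): each pixel consults the majority of its NEWS neighbors and adopts it only when its private random bit source says so, the skeleton of a Markov-random-field-style update.

```python
import random

def stochastic_majority_step(plane, rng, p_follow=0.9):
    """One SIMD step of a toy stochastic relaxation: each pixel looks
    at the majority of its NEWS neighbors and adopts it with
    probability p_follow, drawing on its private randomness source
    (the pixel-level random bit generator of the text). Ties and
    minority cases leave the pixel at the neighbor-majority value 0."""
    h, w = len(plane), len(plane[0])
    out = [row[:] for row in plane]
    for r in range(h):
        for c in range(w):
            nbrs = [plane[nr][nc]
                    for nr, nc in ((r - 1, c), (r + 1, c),
                                   (r, c - 1), (r, c + 1))
                    if 0 <= nr < h and 0 <= nc < w]
            majority = 1 if sum(nbrs) * 2 > len(nbrs) else 0
            if rng.random() < p_follow:
                out[r][c] = majority
    return out

rng = random.Random(1234)
noisy = [[1, 1, 1],
         [1, 0, 1],
         [1, 1, 1]]
cleaned = stochastic_majority_step(noisy, rng)  # isolated 0 likely flips
```

Iterating such steps with a temperature-dependent p_follow is the basic move of stochastic MRF optimization; the point here is only that each pixel needs nothing beyond its neighbors' bits and one random bit per step.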
Finally, the retinal context imposes harsh programming constraints,
but these are also great opportunities to push the researcher's mind into
new and fruitful dimensions of algorithmic research. In the retinal crucible,
new concepts emerge which are often valuable on any standard computer.
However, they are targeted at the retinal context, where their real-time
execution allows the accurate assessment of their contribution to the visual
process. In recent years, our retina-inspired algorithmic research has
focused on digital halftoning, structural pattern recognition, skeletonization,
mathematical morphology, and Markov-based motion detection, to mention only
the most successful.
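As a closing illustration of how such algorithms map onto the retina, binary erosion by the NEWS cross - a basic mathematical-morphology operator - needs only a pixel's own bit and its four neighbors' bits, i.e. one SIMD step per pixel. A pure-Python sketch (names and border convention ours):

```python
def erode(plane, border=0):
    """Binary erosion with the NEWS cross structuring element: a pixel
    survives only if it and its four neighbors are all set. Each pixel
    touches exactly the data a smart pixel can reach in one SIMD step:
    its own bit plus four NEWS reads."""
    h, w = len(plane), len(plane[0])

    def at(r, c):
        # Off-array reads return the border constant, as in news_read.
        return plane[r][c] if 0 <= r < h and 0 <= c < w else border

    return [[at(r, c) & at(r - 1, c) & at(r + 1, c)
             & at(r, c - 1) & at(r, c + 1)
             for c in range(w)] for r in range(h)]

square = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]
# erode(square) keeps only the center pixel: with a zero border,
# every edge pixel loses at least one neighbor.
```

Dilation, opening and closing follow by duality and composition, so a handful of NEWS instructions already spans the core of mathematical morphology.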