¹Univ.-Augenklinik, Freiburg, Germany
²Univ.-Augenklinik, Würzburg, Germany
³Inst. f. Medizinische Psychologie, Universität Magdeburg, Germany
Based on an article that appeared in a special issue of Spatial Vision 10:403-414 (1997), published by VSP.
Raster-based cathode-ray tubes (CRTs) are increasingly used for stimulus presentation. While very flexible, their consumer-electronics-based design can limit their value in vision research. We have systematically compiled their limitations of resolution in time, space, intensity and wavelength. Often, ingenious ideas can circumvent such limitations for specific experiments. Some ad-hoc solutions, as well as the more general techniques of dithering and anti-aliasing, are presented.
Computer-controlled visual display units are increasingly used to present stimuli for vision experiments. With few exceptions, they are designed for consumer electronics, are developed under commercial constraints for large-quantity production and represent a compromise between image quality and cost within current technology. Vision science, through its interest in the limits of vision, therefore immediately meets the limitations of the apparatus. Most vision experiments now use cathode-ray tubes (CRTs), usually based on raster-scan techniques, which are in many respects excellently suited for visual stimulus presentation (Watson et al. 1986). Programming for stimulus generation is a typical case of “re-inventing the wheel”: many of the problems and solutions presented in this paper have occurred to other authors (for summaries see Brainard 1996; Cowan 1983, 1995; Mollon and Baker 1995). A wide dissemination of awareness of methodological problems thus seems useful. Instead of an in-depth treatment, we here present a systematic overview of all problems that have occurred to us (in both senses of the word), or arose in the literature or in discussions with colleagues; furthermore, we offer some possible solutions. Three areas will be covered: (1) limits of resolution in space, (2) limits of resolution in time, and (3) limits of resolution in intensity and colour, and limited colour gamut. Stimulus artifacts can further result from a combination of limitations in each of these areas. To bring some order into the discussion, a systematic classification scheme of CRT artifacts is shown in Table 1.
Table 1. CRT artifact classification scheme. Artifacts are arranged on the dimensions space, time, and intensity, and interactions thereof. On the diagonal are artifacts confined to just one of these dimensions; e.g., the orientation of a line can affect its width, a purely spatial artifact. Off the diagonal are interactions between dimensions; e.g., presenting a stimulus at different screen locations affects its time of appearance relative to the start of the frame, an artifact that mixes space and time; or, horizontal lines generally have a higher intensity than vertical ones, an artifact that mixes space and intensity. In some cases numbers are given that hint at the size of the artifact.
What spatial resolution is required? Let us assume that we want to explore the human contrast sensitivity function and need to generate appropriate sinewave-grating stimuli on a CRT. The highest spatial frequency resolvable by human beings is about 30 cyc/deg. From Nyquist's sampling theorem, one cycle of a horizontal or vertical grating, having the maximum spatial frequency fmax that can be realised on a CRT, has a period of 2 pixels (it will not be sinusoidal). The required pixel size xp to achieve a specific fmax is thus given by
xp = π · d / (360 · fmax),    (1)
where d denotes the observer distance. At a typical reading distance of 40 cm, for example, the required pixel size for a grating at the resolution limit of a young observer would be 0.12 mm. This is below the typical pixel size, or “dot pitch”, of current colour monitors, which is around 0.25 mm (corresponding to 100 dpi). Thus, to reach the highest spatial frequency, the observer distance must be increased; based on the above, a distance of at least 80 cm is needed for a young observer, unless a monitor with smaller pixel size is available.
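Equation (1) is easily put into code. The sketch below (the function names are ours, for illustration only) also solves the equation for the observer distance:

```python
import math

def required_pixel_size_mm(distance_mm, f_max_cpd):
    """Pixel size needed so that f_max (cyc/deg) has a period of
    2 pixels at the given observer distance -- Equation (1)."""
    return math.pi * distance_mm / (360.0 * f_max_cpd)

def required_distance_mm(pixel_size_mm, f_max_cpd):
    """Equation (1) solved for the observer distance."""
    return 360.0 * f_max_cpd * pixel_size_mm / math.pi
```

At 40 cm and 30 cyc/deg this yields about 0.12 mm; with a 0.25-mm dot pitch it yields a distance of just over 80 cm, in line with the figures above.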
The spatial sampling rate from pixel quantisation limits not only the highest obtainable spatial frequency but also the increment between spatial frequencies that can be realised: To prevent spatial aliasing, spatial frequencies must not only be below the Nyquist frequency limit but must also be integer fractions of the latter (Strasburger and Rentschler 1986; see there for efficient methods of grating generation). The next lower realisable spatial frequencies thus have 4, 6, etc. pixels per cycle. To extend the example above: With a dot pitch of 0.25 mm and an observer distance of 80 cm, realisable spatial frequencies are 30, 15, 10, ... cyc/deg. Further limitations in spatial resolution arise from interactions with intensity and time; these will be discussed below.
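In code, the set of alias-free spatial frequencies is simply the Nyquist frequency divided by successive integers (a sketch; the function name is ours):

```python
def realisable_frequencies(f_nyquist_cpd, count=5):
    """Alias-free spatial frequencies on a pixel raster: the Nyquist
    frequency divided by integers, i.e. grating periods of
    2, 4, 6, ... pixels."""
    return [f_nyquist_cpd / n for n in range(1, count + 1)]
```

For the example above, `realisable_frequencies(30.0)` gives 30, 15, 10, 7.5 and 6 cyc/deg.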
The spatial resolution of CRTs is limited by electron optics. The spot on the screen is an image of the cathode. A smaller spot size is equivalent to a smaller active area of the cathode; it is only possible with decreased maximum luminance, as high electron density leads to widening of the beam through space-charge effects. With colour devices, additional constraints arise through the presence of the shadow mask, or the stripes in the case of the Trinitron tube. Consequently, it seems unlikely that large size CRTs with pixel sizes below 0.1 mm and sufficiently high luminance will become available in the near future.
Electron optics are also responsible for changes in the pixel point spread function with different levels of intensity (Lyons and Farrell 1989; Naiman and Makous 1992). Simply stated, increasing the brightness often blurs the stimulus.
Two kinds of spatial artifacts result from the ‘pixelation’ of the image. (1) Straight lines on a raster-scan device are normally drawn using a digital differential analyser algorithm (Bresenham 1965; Horn 1976; Newman and Sproull 1979). This leads to 'staircase' effects or 'jaggies', which are particularly visible in lines that differ only slightly in orientation from horizontal or vertical. (2) Line thickness varies with orientation: Depending on the exact shape (circular vs. square) of the virtual pen used to draw the lines, lines at 45° can be markedly wider than horizontal or vertical lines.
What can be done about the limited spatial resolution of CRTs? If fine detail is required one can, of course, simply increase the observer distance to 3 m or more (see Equation 1). The maximum stimulus size, however, decreases in turn, and the compromise between spatial resolution and field size may be unacceptable in a number of research situations. For the measurement of visual acuity, it turns out that even at 5 m observer distance anti-aliasing (explained below) is required to achieve sufficient resolution for measuring high acuities (Bach 1996, 1997a).
Sometimes, stimuli can be found that distribute staircase artifacts evenly between stimulus conditions such that they cannot be used as a discriminative cue. Meigen et al. (1994), for example, wanted to assess perceptual interactions between oblique lines. The tested line orientations were 0° and 16°. By tilting the CRT by 8°, the orientations were transformed to −8° and +8°. Consequently, both line orientations suffered from identical staircase effects, so that any experimentally found differences must have been visual.
A powerful method to increase spatial resolution is called “anti-aliasing”. Anti-aliasing increases spatial resolution through luminance modulation. The term is derived from Nyquist's sampling theorem (see above): If the spatial or temporal signal to be sampled contains energy at frequencies higher than half the sampling frequency, these high frequencies are 'folded' downwards into the displayed spectrum and appear as 'alien' frequencies. Aliasing occurs in computer graphics because the pixel raster undersamples many graphical shapes and thus introduces such additional, 'alien' frequencies.
Anti-aliasing is well known in computer graphics (Foley et al. 1983, 1990) and can best be explained graphically (see Bach 1997a): The representation of an arbitrary graphical shape by an array of pixels can be imagined as the squares on graph paper, partially covered by the shape. Without anti-aliasing, a pixel in a black and white picture is black if covered by more than half by the graphical shape and is white otherwise. With anti-aliasing, pixel size remains the same but luminance is used to carry additional spatial information: Instead of being black or white, a pixel's intensity is set proportional to the pixel's area covered by the desired shape. Similar interpolation can be used for coloured pictures. After low-pass filtering through the optics of the eye, the retinal image of an anti-aliased image approximates the same image rendered with a smaller pixel size (depending on observer distance). Anti-aliasing improves the readability of small type at the cost of a slight blur and there now exist hardware solutions to rapidly display anti-aliased text. Anti-aliasing is available in many software packages (e.g. Photoshop) and is also a built-in feature of the Apple Macintosh operating system. We have used it to improve psychophysical stimuli (Meigen et al. 1994) and in the “Freiburg Visual Acuity Test” (Bach 1996, 1997a).
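The area-coverage idea can be sketched as follows; here the covered fraction of each pixel is estimated by oversampling (a toy illustration under our own naming, not the algorithm of any of the cited software packages):

```python
import numpy as np

def antialiased_disk(size, radius, oversample=8):
    """Grey-level disk on a size x size pixel raster: each pixel's
    intensity is the fraction of its area covered by the disk,
    estimated on an oversampled sub-pixel grid."""
    n = size * oversample
    c = (n - 1) / 2.0                      # disk centre in sub-pixels
    y, x = np.mgrid[0:n, 0:n]
    inside = (x - c) ** 2 + (y - c) ** 2 <= (radius * oversample) ** 2
    # average each oversample x oversample block -> coverage in [0, 1]
    return inside.reshape(size, oversample, size, oversample).mean(axis=(1, 3))
```

Interior pixels come out fully bright, pixels outside stay black, and boundary pixels receive the intermediate grey levels that, after low-pass filtering by the eye's optics, smooth the edge.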
An obvious limitation in the display of moving or flickering stimuli on a CRT is introduced by the temporal sampling imposed by the frame rate. Frame rate not only determines the highest frequency of change, but also temporal sampling: Only stimulus updates at integer multiples of the frame interval can be used; otherwise temporal aliasing occurs, seen as stroboscopic (wagon-wheel) effects or beats. This limits the choice of temporal stimulation frequencies, particularly at the higher frequency end. Note that the concept of a 'frame' as an event in time, which is implicitly used in explaining temporal aliasing, is a simplification: A frame is not presented instantaneously but is scanned (see, for instance, Bach 1997b, for temporal envelopes), and the percept of a field is the result of temporal integration both through phosphor persistence and visual persistence.
Phosphor decay, which happens in the microsecond and millisecond range but can last up to seconds, plays an important role whenever precise control of timing is required and has often not received adequate attention. One consequence of decay is a reduced contrast of moving as compared to stationary stimuli, the amount of reduction being difficult to measure. Many phosphors have complex decay functions, consisting of a primary rapid decay followed by a prolonged afterglow (Cowan and Rowell 1986; Mollon and Baker 1995). Mollon and Baker (1995) describe major scientific errors that resulted from these effects. The visibility of stimuli in short exposure experiments is affected by a complex interaction of physical and perceptual factors that make it difficult to predict visibility in a given situation. Di Lollo et al. (1997) analyse the condition of bright stimuli on a dark background and demonstrate the necessity of perceptual control experiments to avoid artifacts. Wolf and Deubel (1997) study the persistence of stimuli on scotopic and photopic backgrounds, with refreshed displays, and show sluggish offset stemming from phosphor characteristics that may be unrelated to the commonly considered phosphor decay functions.
Phosphor ageing is a further area of concern. Although the general intensity loss from ageing is easily corrected for by re-calibration, the experimental design might be jeopardised because a required luminance level that was available at the beginning of a long experiment may be out of range after some months of experimentation.
Periodic stimuli are subject to artifact-generating interactions of the spatial and temporal frequencies. This sets an upper limit for the product of spatial frequency (for periodic stimuli) and motion speed to avoid the wagon-wheel effect. For the artifact-free display of periodic moving square-wave gratings, there is a simple relationship between maximum speed vmax [deg/s], maximum spatial frequency fmax [cyc/deg] and frame rate (monitor frequency) fframe [Hz]:
vmax [deg/s] · fmax [cyc/deg] = fframe /4 [Hz] (2).
The denominator has a value of 4 for the following reasons: For unique identification of direction, the maximum shift must be less than 1/2 of a period. Since the minimum shift is one pixel (to avoid further artifacts from non-integer pixel shifts), a moving grating with a 2-pixel period cannot be realised, the highest spatial frequency thus having a period of 4 pixels (this holds for square wave gratings; for sinusoidal gratings 3 pixels would suffice). For lower spatial frequencies, the maximum shift is between 1/4 and 1/2 of a period, thus equation (2) is based on the “worst case”, i.e., it gives the lowest upper limit for the product that holds for all spatial frequencies. Note that Equation (2) does not depend on observer distance. Speed is thus more limited for high than for low spatial frequencies.
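Equation (2) translates directly into code (a sketch; the function name is ours):

```python
def max_drift_speed(f_spatial_cpd, f_frame_hz):
    """Equation (2): largest artifact-free drift speed (deg/s) of a
    square-wave grating of the given spatial frequency (cyc/deg) on a
    display with the given frame rate (Hz)."""
    return f_frame_hz / (4.0 * f_spatial_cpd)
```

For example, on a 100-Hz display a 5-cyc/deg square-wave grating may drift at up to 5 deg/s; doubling the spatial frequency halves the admissible speed.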
Incidentally, reprogramming of a graphics board's timing parameters will be restricted by a trade-off between spatial and temporal resolution if the pixel clock stays constant. A constant pixel clock unfortunately prohibits pushing up the limit given by equation (2) by increasing the frame rate.
When timing accuracy in the millisecond region is required, it is important to remember that pixels lower in the display are drawn later; it takes, for example, about 15 ms to paint a frame on a 67-Hz display. Fortunately, once this is recognised, correction factors can easily be applied, at least for small objects (“correction for scan delay”; Sutter and Tran 1992).
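A first-order correction can be computed from the raster position of the stimulus. The sketch below (our own naming) ignores the vertical blanking interval, which in reality slightly shortens the active scan time:

```python
def scan_delay_ms(line, total_lines, f_frame_hz):
    """Approximate delay (ms) between the start of a frame and the
    drawing of a given raster line; ignores vertical blanking."""
    return 1000.0 * line / (total_lines * f_frame_hz)
```

The last line of a 67-Hz frame thus appears roughly 15 ms after the first, matching the figure above; a small stimulus's true onset is its nominal frame time plus this delay.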
The relationship between video voltage and luminance output of the screen is non-linear (Foley and Van Dam 1983; Poynton 1993, 1996). On the low luminance end, there is a floor effect that depends on the luminance/contrast controls of the CRT. In the medium range, luminance can fairly accurately be described by a power function L ∝ V^γ, where γ is a positive constant with a value typically between 2 and 3. The effect is due to the grid-controlled valve characteristic (Grivet 1965) but is often erroneously attributed to “phosphor non-linearity”. For example, Foley and Van Dam (1983, p. 594) state that the light output by a phosphor is related to the number of electrons by a power function with exponent γ. This statement contradicts established physical knowledge: Ardenne (1973, p. 170), Cowan (1995), Forand et al. (1990), and others state a linear relationship between beam current and the number of emitted photons.
In addition to the fairly well-known gamma non-linearity, saturation effects occur at high intensities due to current limitations of the high-tension power supply that feeds the tube. The effect is more pronounced for large bright areas than for small ones. These beam-current limitations vary widely between different brands of CRTs. In most cases they also depend on the spatial extent of the stimulus, since beam current can often be sustained for small but not for large stimuli. If the desired intensity range can be limited to the power-function part of the characteristic, the non-linearity can be overcome by gamma correction, which derives its name from the symbol used for the exponent (which, in turn, stems from the field of photography, where it is used to describe the transfer characteristics of photographic material). Gamma correction can be done in various ways (e.g. Stanislaw and Olzak 1990; Pelli and Zhang 1991; Metha et al. 1993; Poynton 1993); some CRTs have it already built in.
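In software, gamma correction typically amounts to a lookup table mapping requested linear intensities to DAC values. A minimal sketch, assuming an exponent of 2.2 obtained from calibration (the actual value must be measured for each monitor):

```python
def gamma_lut(gamma=2.2, levels=256):
    """Lookup table linearising a display with L ~ V**gamma:
    entry i is the DAC value whose luminance is i/(levels-1) of the
    maximum."""
    return [round((levels - 1) * (i / (levels - 1)) ** (1.0 / gamma))
            for i in range(levels)]
```

Note that the table raises low DAC values substantially (entry 128 of a γ = 2.2 table is 186), which is exactly why the corrected scale loses resolution at the dark end.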
As can be easily verified, in all raster-scan CRTs a one-pixel horizontal line is brighter than a vertical one (Pelli 1997). The reason for this effect is the limited video signal bandwidth that attenuates the rapid horizontal signal variation required to render a vertical thin line but not the slower vertical variation across raster lines followed by CRT non-linearity.
The luminance difference is quite strong and can introduce serious luminance artifacts in vision experiments. The error increases with decreasing pixel size at constant dot pitch and can exceed 50%. Sometimes, it can be overcome by a trick: In an experiment that used texture patterns composed of oriented line segments, we needed horizontal (0°) and vertical (90°) lines without introducing luminance artifacts. We tilted the CRT by 45° and drew the lines at ±45°. This removed the luminance differences while still presenting the stimuli at 0° and 90° as desired (Bach and Meigen 1992; Solomon et al. 1993; for a different solution see Solomon et al. 1995).
Software Solution: Bandpass-filtered noise textures
For independent control of orientation and spatial frequency, Gabor-filtered noise textures (Landy and Bergen 1991) or some generalisation thereof might be considered. Low-pass filtering introduces a correlation between neighbouring pixels, which reduces aliasing and orientation-intensity artifacts.
Most CRTs have marked luminance inhomogeneities across the screen (see, e.g., Cook et al. 1993). Metha et al. (1993) measured a more than 20% drop in the periphery vs. the centre of a high-quality CRT. Bohnsack et al. (1997) carefully measured relevant properties of the Sony 17se display and found a 13% drop of luminance in the periphery. This problem can easily be overlooked, as the low-spatial-frequency attenuation of the human contrast sensitivity function renders this effect unnoticeable at normal observer distances.
To produce the video voltage, digital-to-analogue converters (DACs) of 8-bit resolution per colour gun are used in virtually all current commercial computer graphics. The resulting 256 different intensity levels are insufficient for assessment of the human contrast threshold. For an evenly-spaced scale of 256 intensity levels starting from black, the contrast resolution would be 1/128 near half maximum luminance. However, the monitor's accelerating gamma characteristic approximately doubles the step size in that region, so that the smallest contrast is about 2% in Michelson units. The VGA standard specifies a still lower luminance resolution of (3x) 6 bits.
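The available contrast steps follow from the gamma characteristic. The sketch below computes the contrast ΔL/L of a one-step DAC increment, assuming L ∝ V^γ (the exact figure depends on the monitor's γ; the function name is ours):

```python
def one_step_contrast(level, gamma=2.5, levels=256):
    """Contrast dL/L produced by raising an 8-bit DAC level by one
    step, on a display whose luminance follows L ~ V**gamma."""
    lo = (level / (levels - 1)) ** gamma
    hi = ((level + 1) / (levels - 1)) ** gamma
    return (hi - lo) / lo
```

Near mid-range this gives roughly 2%, consistent with the estimate above; threshold measurements, which require contrasts well below 1%, are out of reach of a plain 8-bit system.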
With 8-bit DACs, 256 different intensity levels can be produced for each gun: red, green and blue. This yields 256 × 256 × 256 ≈ 17 million different colours. Because phosphors are broadband, the saturation of the available colours is limited (Brettel 1988; Foley and Van Dam 1983), which restricts colour vision experiments. Finally, the spectral radiant power distribution of the available colours restricts the application of computer monitors in colour matching experiments. It is, for example, not directly possible to simulate a Nagel anomaloscope on a colour monitor, since three independent light sources, red, green and yellow, are required that lie on the colour confusion lines for dichromats. While three monitor colours can easily be found that lie either on confusion lines for protanopes or on those for deuteranopes, they cannot be independent, as yellow always consists of a colour mixture of red and green. It is also not possible to replicate the fundamental colour matching experiment on a CRT, as any perceptual match between two CRT patches will be a physical match.
The three R-, G-, and B-guns can interact in various ways, including crosstalk in the graphics board, wiring and video amplifiers, and misalignment of the shadow mask. The reader is referred to Cowan and Rowell (1986), Brainard (1989), and Mollon and Baker (1995) for a detailed discussion of these intricate problems.
For many experiments it may be sufficient to cover only a small part of the full intensity range and to use the full 256-step resolution within that range. This has been done in various ways:
The use of a >=12-bit DAC is the solution of choice when high intensity and/or colour resolution is desired, provided the DAC is sufficiently fast. The combination of high spatial resolution (high number of scan lines) and high frame rate requires settling times that are near the edge of current technology.
If solely one-dimensional stimuli are required, one can modulate the stimulus in the 'slow' direction of the beam only (at line frequency rather than at pixel frequency). This was done, for instance, in some versions of the VSG graphics system from Cambridge Research Systems. When vertical gratings are required, the CRT can be turned on its side (colour artifacts may arise if the shadow mask is thereby dislocated; this may be remedied by degaussing).
DACs are often linear to within a fraction of their step size. It thus makes sense to ‘cascade’ DACs: The output of one DAC is added to the scaled output of a second. The three DACs of a colour board can be combined with a few resistors to produce a high-resolution monochrome signal, or two colour boards can be combined to produce a high-resolution colour signal. This technique was discovered independently by a number of authors and is described, with full calibration details, by Pelli and Zhang (1991).
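The summing scheme can be illustrated numerically (the scaling below is an idealised assumption for illustration; real resistor networks require the calibration described by Pelli and Zhang 1991):

```python
def cascaded_dac(coarse, fine, bits=8):
    """Normalised output of two cascaded 'bits'-wide DACs: the second
    DAC's output is attenuated so that its full range spans exactly one
    step of the first, extending resolution to 2*bits."""
    steps = 2 ** bits          # 256 for an 8-bit DAC
    full = steps - 1           # 255, full scale of the coarse DAC
    return (coarse + fine / steps) / full
```

Every (coarse, fine) pair yields a distinct output, so two 8-bit DACs provide 2^16 levels; the fine DAC interpolates between consecutive coarse steps.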
Attenuation of the video signal by a passive resistor network with computer-controlled relays provides the highest possible resolution at low contrast. This is fairly simple to realise with modern raster-scan CRTs that have separate inputs for video and synchronisation signals and with X-Y-Z displays (also called vector or point-plotting displays), as for example with Finley's (1997) device. Denis Pelli produced some prototype boards in the early eighties. Attenuation by a factor of 128, for example, with an 8-bit DAC results in a contrast resolution equivalent to 8 + 7 = 15 bits. The highest attenuation is limited by video-signal cross talk on the attenuator. With older, composite-video raster-scan CRTs, the accessory signals (sync and porch) need to be treated separately which requires additional circuitry (Strasburger 1996).
If high spatial resolution is not required, luminance and/or colour resolution can be improved at the cost of spatial resolution by 'dithering' (Foley and Van Dam 1983; Mulligan and Stone 1989; Savoy 1986; Ulichney 1987). Dithering can be viewed as the opposite of anti-aliasing. There are three types of dithering: error diffusion, ordered dithering, and random dithering. In dithering by error diffusion, a luminance error is defined for each pixel,
Error = requested intensity - closest available intensity,
where the requested intensity is the luminance of the current pixel plus a certain part of the error term of the surrounding pixels. Ideally, the error term is spread evenly among all surrounding pixels. In the Apple Macintosh operating system, the built-in dithering algorithm pushes half the error to the left or right, and the other half to the pixel immediately below (Othmer and Lipton 1992). Dithering by error diffusion is used in the contrast threshold section of the ‘Freiburg Visual Acuity Test’ (Bach 1996, 1997b) where sub-threshold contrasts are thus easily achieved.
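An error-diffusion sketch following a half-to-the-right, half-below scheme similar to the one just described (a didactic, unoptimised implementation of our own):

```python
import numpy as np

def dither_error_diffusion(img, levels=2):
    """Quantise a float image (values in [0, 1]) to the given number of
    intensity levels, pushing half of each pixel's quantisation error
    to the right neighbour and half to the pixel below."""
    img = img.astype(float).copy()
    out = np.zeros_like(img)
    q = levels - 1
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = round(img[y, x] * q) / q   # closest level
            err = img[y, x] - out[y, x]            # luminance error
            if x + 1 < w:
                img[y, x + 1] += err / 2
            if y + 1 < h:
                img[y + 1, x] += err / 2
    return out
```

Dithering a uniform 30%-grey field to black and white, for example, produces a pixel pattern whose mean stays close to 0.3: the luminance information survives at reduced spatial resolution.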
Tyler recently described an ingenious variation of dithering, which he calls 'bit stealing' (Tyler et al. 1992, Tyler 1997). Fine variations of luminance are achieved through small changes in hue, the latter kept small enough to stay below the threshold for chromatic detection. By carefully distributing values between the R, G, and B channels, the luminance resolution can be increased by a factor of about 4. In vision experiments, possible artifacts arising from sub-threshold summation and facilitation should be assessed before relying on any of these techniques.
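The principle of bit stealing can be illustrated with nominal luminance weights (the Rec. 709 weights below are an assumption for the sketch; in practice the weights must come from photometric calibration of the actual phosphors):

```python
# Nominal R, G, B luminance weights (Rec. 709) -- an assumption;
# substitute calibrated phosphor luminances in a real experiment.
W = (0.2126, 0.7152, 0.0722)

def luminance(rgb):
    """Relative luminance of an (R, G, B) DAC triplet."""
    return sum(w * v for w, v in zip(W, rgb))

def near_grey_steps(g):
    """The eight near-grey triplets (each gun at g or g+1), sorted by
    luminance: they subdivide one DAC step into sub-LSB luminance
    levels at the cost of tiny hue shifts ('bit stealing')."""
    triplets = [(g + r, g + gg, g + b)
                for r in (0, 1) for gg in (0, 1) for b in (0, 1)]
    return sorted(triplets, key=luminance)
```

The eight triplets span one DAC step in strictly increasing sub-steps, which is the source of the roughly fourfold gain in luminance resolution mentioned above.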
Computer-controlled CRTs are inexpensive, bright, have high resolution, and are easy to use. Advances in computer technology make it easier to design sophisticated stimuli to manipulate perception and study its mechanisms. However, CRTs also have limitations due to quantisation in time, space and intensity, possibly giving rise to obvious and sometimes to subtle artifacts. If the inherent limitations are recognised early (Table 1), they can often be avoided by systematic or ad-hoc solutions, depending on the specific type of experiment.
We thank David Brainard and Denis Pelli for detailed and insightful suggestions on the manuscript.