The Human Visual System

Source: The Human Visual System (Display Interfaces)

Dynamic Range and Visual Response

At any given moment, the eye is capable of discriminating varying levels of luminance over a range of perhaps 100:1 or slightly higher. If the brightness of a given object in the visual field falls below the lower end of the range at any moment, it is simply seen as black. Similarly, items outside the range at the high end are seen as “bright”, with no possibility of being distinguished from each other in terms of perceived brightness. However, as we know from personal experience, the eye is capable of adapting with time over a much wider absolute range. The opening and closing of the iris varies the amount of light admitted to the interior of the eye, and (along with other adaptive processes) permits us to see well in conditions varying from bright sunlight to nearly the darkest of nights. In terms of the total range that can be covered by human vision with this adaptation, the figure is more like 10,000,000:1. The lowest luminance generally considered visible under any conditions, to a fully dark-adapted eye, is about 0.0001-0.001 cd/m²; the greatest, at least in terms of what can be viewed without permanent damage to the eye, is on the order of 10,000 cd/m², a value achievable from a highly reflective white surface in direct sunlight. Adaptation of the eye to varying light levels within this range permits the 100:1 range of discrimination to be set anywhere within this total absolute range.
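The arithmetic behind these figures is easy to check. The sketch below is a simplified model that treats the instantaneous window as a hard-edged 100:1 band (real perception has no such sharp cut-offs). It expresses the ranges in decades and, for an assumed adaptation state, classifies a luminance as black, discriminable, or undifferentiated “bright”. The constants are the approximate figures quoted above; `classify` and `window_floor` are hypothetical names chosen for illustration.

```python
import math

# Approximate figures from the text above.
WINDOW_RATIO = 100.0   # instantaneous discrimination range, about 100:1
ABS_MIN = 0.001        # cd/m^2, dimmest visible luminance (upper end of 0.0001-0.001)
ABS_MAX = 10_000.0     # cd/m^2, sunlit white surface

def decades(ratio: float) -> float:
    """Express a luminance ratio in orders of magnitude (base-10 decades)."""
    return math.log10(ratio)

def classify(luminance: float, window_floor: float) -> str:
    """Hard-edged simplification: with the eye adapted so that its ~100:1
    window starts at window_floor (cd/m^2), anything darker reads as black
    and anything above the window as undifferentiated 'bright'."""
    if luminance < window_floor:
        return "black"
    if luminance > window_floor * WINDOW_RATIO:
        return "bright"
    return "discriminable"

print(decades(WINDOW_RATIO))                  # 2.0
print(round(decades(ABS_MAX / ABS_MIN), 3))   # 7.0, i.e. about 10,000,000:1
# Eye assumed to be adapted so that its window spans roughly 1-100 cd/m^2:
print([classify(lum, 1.0) for lum in (0.5, 10.0, 500.0)])
# ['black', 'discriminable', 'bright']
```

Moving `window_floor` up or down is this model's stand-in for adaptation: the 2-decade window slides within the roughly 7-decade absolute range.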

Within a given adapted range, however, the response of the eye is not linear. At any given instant, we are capable of better discrimination at the lower end of the eye’s range than at the higher; in other words, it is easier to tell the difference between similar dimly lit areas of a given scene than between similar bright areas. (This is again as might be expected; as a survival trait, it is more important to be able to detect objects, or threats, in a dimly lit area such as a cave than to discriminate shadings on the same object in broad daylight.) The typical response curve is shown in Figure 2-8. This non-linear response has some significant implications for the realistic portrayal of images on electronic displays.

The non-linearity of the response also has an impact on the amount of information required to properly convey visual data. Given the ability to discriminate luminance over a range of only 100:1 or slightly higher, we are tempted to assume that only about 7-8 bits per sample would be required to encode luminance. Tests with 7-8 bits per sample of linearly encoded luminance, however, will show clearly discernible bands (contouring), especially in the darker areas, because the eye can discern finer differences at the low end of the luminance range. If linear encoding is used, ten to twelve bits of luminance information per sample are generally assumed to be required for the realistic portrayal of images. (Note, however, that this level of performance is often well beyond the capability of many display and image-sampling devices; noise in these systems may limit the resolvable bits/sample to a lower value, especially for those operating at “video” (smooth-motion) sampling rates.) Encoding full-color images, as opposed to simply luminance information only.

Figure 2-8 Typical normalized luminance response curve for the human eye, showing the nonlinear relationship between absolute luminance (within the current adapted range) and perceived brightness. This shows that the eye is more sensitive to luminance changes in the “dark” end of the current range than to similar-sized changes in “bright” areas. The response curve shown here is a power function in which the perceived brightness is given as Y^(1/2.5), or Y^0.4, where Y is the normalized luminance. Different standard models have used different values for the exponent in this function, ranging from about 0.333 to 0.450.
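The bit-depth argument can be checked numerically against the Y^0.4 model of Figure 2-8. The sketch below measures how large a jump in modeled perceived brightness a single linear code step produces at the dark end (Y = 0.01, the bottom of an assumed 100:1 window) and at the bright end (Y = 1). It is a toy model, not a visibility-threshold model, and `perceived_step` is a hypothetical helper.

```python
EXPONENT = 0.4  # Figure 2-8 model: perceived brightness = Y ** 0.4

def perceived(y: float) -> float:
    """Modeled perceived brightness for normalized luminance y in [0, 1]."""
    return y ** EXPONENT

def perceived_step(y: float, bits: int) -> float:
    """Jump in modeled brightness between the linear code nearest y and the
    code just below it, for a linear encoding with the given bit depth."""
    top = 2 ** bits - 1
    k = max(1, round(y * top))
    return perceived(k / top) - perceived((k - 1) / top)

for bits in (8, 12):
    dark = perceived_step(0.01, bits)   # bottom of an assumed 100:1 window
    bright = perceived_step(1.0, bits)  # top of the window
    print(f"{bits}-bit linear: dark step {dark:.5f}, bright step {bright:.5f}")
```

Under this model a linear 8-bit step near black is roughly sixteen times larger, in perceived brightness, than one near white, while a 12-bit step near black is about the same size as an 8-bit step near white. That is consistent with the ten-to-twelve-bit figure quoted for linear encoding.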

Chromatic Aberrations

Color affects our vision in at least one other, somewhat unexpected, manner. The lens of the eye is a simple double-convex type, but made of a clear, jellylike material rather than glass. In most optical equipment, focusing is achieved by varying the spacing of the optical elements (lenses, mirrors, etc.); in the eyes of living creatures, images are focused by altering the shape of the lens itself, and so its optical characteristics. (The curved surface of the transparent cornea also bends light, and is a major contributor to focusing the image; however, its action is not variable.) Simple lenses of any type, however, suffer from a significant flaw with respect to color. The refractive index of any optical material, and so the degree to which light is “bent” at the interface between that material and air, varies with the frequency of the light. In ordinary transparent materials, higher-frequency light, toward the blue end of the spectrum, is bent more than lower-frequency (red) light. If not compensated for, this has the effect of changing the focal length of the lens for different colors of light. (In conventional lenses, this compensation comes in the form of an additional optical element with a different refractive index and dispersion, bonded to the original simple lens. Such a color-corrected lens is called an achromat.)
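How a wavelength-dependent index turns into a color-dependent focal length can be sketched with Cauchy’s dispersion approximation, n(λ) = A + B/λ², and the thin-lens (lensmaker’s) equation. The coefficients below are illustrative values roughly matching water, and the radii describe a hypothetical 10 mm double-convex lens in air; neither is a measured property of the eye, whose lens sits in fluid rather than air.

```python
# Illustrative Cauchy coefficients, roughly those of water (assumption):
# not measured values for the lens or cornea of the eye.
CAUCHY_A = 1.324
CAUCHY_B_NM2 = 3000.0  # nm^2

def refractive_index(wavelength_nm: float) -> float:
    """Cauchy's approximation n = A + B / wavelength^2 (normal dispersion)."""
    return CAUCHY_A + CAUCHY_B_NM2 / wavelength_nm ** 2

def focal_length_mm(n: float, r1_mm: float, r2_mm: float) -> float:
    """Thin-lens lensmaker's equation in air: 1/f = (n - 1) * (1/R1 - 1/R2)."""
    return 1.0 / ((n - 1.0) * (1.0 / r1_mm - 1.0 / r2_mm))

# Hypothetical symmetric double-convex lens (R1 > 0, R2 < 0 by convention).
R1_MM, R2_MM = 10.0, -10.0

for name, wavelength in (("blue", 450.0), ("red", 650.0)):
    n = refractive_index(wavelength)
    f = focal_length_mm(n, R1_MM, R2_MM)
    print(f"{name} ({wavelength:.0f} nm): n = {n:.4f}, f = {f:.3f} mm")
```

With these assumptions the blue focal length comes out a few tenths of a millimeter shorter than the red one. The magnitude depends entirely on the assumed coefficients, but the direction follows from normal dispersion.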


Figure 2-9 In a simple lens, higher-frequency light (i.e., blue) is refracted more strongly than lower-frequency light (red), so the two colors come to a focus at different distances behind the lens. In human vision, this means that the blue and red parts of an image cannot both be in sharp focus at once, and the difference in focusing effort leads to a false sense of depth induced by color (chromostereopsis), with blue areas typically seen as lying “behind” red ones. It also makes images containing areas of bright red and blue in close proximity very tiring to look at, as the eye has a very difficult time focusing!

With the simple lens of the eye, this sort of chromatic aberration results in images of different colors being focused slightly differently. Pure fields of any given color can be brought into proper focus through the adaptive action of the lens, but if objects of very different colors are seen in close proximity, a problem arises. The problem is at its worst, of course, with colors at the extremes of the visual spectrum: blue and red. If bright examples of both colors are seen together, the eye cannot focus correctly on both, since the blue and red images come to a focus at different depths (Figure 2-9). Besides being a source of visual strain (as the eye/brain system attempts to resolve the conflict in focus), this also creates a false sense of depth. The blue object(s) are seen as behind the red through chromostereopsis (the perception of depth resulting solely from color differences rather than from actual differences in the physical distance between objects). Because of these problems, such colors should not be used in close proximity; bright red text on a blue background, for instance, is to be avoided.

Stereopsis

Besides the false sense of visual depth mentioned above, human beings are, of course, very capable of seeing true depth – we have “three-dimensional,” or stereoscopic, vision. By this we mean that human beings can get a sense of the distance to various objects, and their relative relationships in terms of distance to the viewer, simply by looking at them. This ability comes primarily (but not exclusively!) from the fact that we have two eyes which act together, seeing in very nearly the same direction at all times, and a visual system in the brain which is capable of synthesizing depth information from these two “flat”, or two-dimensional, views of the world. In nature, stereo vision is most often found in creatures which are at least part-time hunters, and so need the ability to accurately judge the distance to prey (to leap the right distance, or to aim a spear, etc.). Most animal species which possess a sense of sight have two eyes (or at least two primary eyes), but relatively few have them properly located and working in concert so as to support stereo vision.

Perceiving depth visually (stereopsis, a general term covering such perception regardless of the basic process) is basically a matter of parallax. Both eyes focus on the same object, but due to the fact that they are spaced slightly apart in the head do not see it at quite the same angle. The eye/brain system notes this difference, and uses it to produce a sense of the distance to the object. This can also be used to impart a sense of depth to two-dimensional images; if each eye is presented with a “flat” view of the same scene, but the two views differ in a manner similar to that which results from the difference in viewing angle in a “real” scene, the visual system will perceive depth in the image. This is the principle underlying stereoscopic viewers or displays, which are arranged so as to present “left-eye” and “right-eye” images separately to the two eyes.
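The parallax geometry used by stereoscopic displays can be sketched with simple ray tracing. Assume the eyes are separated by e (63 mm is used here as a typical adult value; that figure is an assumption, not from the text), the screen is at distance D, and the right-eye image is offset horizontally by p from the left-eye image. The two lines of sight then meet at distance Z = D·e/(e − p): positive (uncrossed) parallax places the point behind the screen, and negative (crossed) parallax places it in front. `perceived_distance` is a hypothetical helper.

```python
def perceived_distance(viewing_distance_mm: float, screen_parallax_mm: float,
                       eye_separation_mm: float = 63.0) -> float:
    """Distance from the viewer at which the two lines of sight intersect.

    Eyes at x = -e/2 and x = +e/2, screen at distance D; the right-eye image
    sits p to the right of the left-eye image. Then Z = D * e / (e - p).
    The 63 mm default eye separation is an assumed typical value.
    """
    d, p, e = viewing_distance_mm, screen_parallax_mm, eye_separation_mm
    if p >= e:
        raise ValueError("parallax >= eye separation: lines of sight never converge")
    return d * e / (e - p)

# Viewer 600 mm from the screen; crossed, zero and uncrossed parallax.
for p in (-31.5, 0.0, 31.5):
    print(p, perceived_distance(600.0, p))
# -31.5 400.0
# 0.0 600.0
# 31.5 1200.0
```

Parallax equal to the eye separation would push the point to infinity, which is why the function rejects p ≥ e. This purely geometric model ignores the focus and motion cues discussed below.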

However, this parallax effect is not the only means through which we perceive depth visually. Some people have only one working eye, and yet still function well in situations requiring an understanding of depth; they are able to compensate by relying on other depth cues. (There is also a small percentage of the population who have functional vision in both eyes, and yet do not perceive depth through the normal process. In these cases, the eye/brain system, for whatever reason, never gained the ability to synthesize depth information from the two different views. Such people often do not realize that their deficiency exists at all, until they are unable to see the “3-D” effect from a stereoscopic display or viewer.) Depth is also perceived through the changes required to focus on nearby vs. distant objects, from differences in the rate at which objects are passing through the visual field (rapidly moving objects are seen as being closer than slower or stationary objects, in the absence of other cues), and, curiously, through delays in processing a given image in one eye relative to the other. (This latter case is known as the Pulfrich effect, and may be produced simply by changing the luminance of the image presented to one eye relative to the other.)

Temporal Response and Seeing Motion

Our eyes have the ability to see motion, at least up to the rates normally encountered in nature. This tells us that the mechanisms of vision work relatively quickly: not much time elapses between the moment a given scene is imaged on the retina and the moment it is experienced as vision, even though in between the receptor cells must respond to the pattern of light making up the image, the resulting impulses must be passed to the visual centers of the brain, and that information must be interpreted. However, this action is not infinitely fast, nor is motion perceived in quite the way we might initially think.

Clearly, the perception of motion is going to be governed by how rapidly our eyes can process new images, or changes in the visual information presented to them. It takes time for the receptors to respond to a change in light level, and then time to “reset” themselves in order to be ready for the next change. It takes time for this information to be conveyed to the brain and to be processed. We can reasonably expect, then, that there will be a maximum rate at which such changes can be perceived at all, but that this rate will vary with certain conditions, such as the brightness or contrast of the changing area relative to the background, the size of the object within the visual field, and so forth.

We should also understand how the eye/brain system, which has evolved to track moving objects (to follow them and fixate upon them, even while they are moving), actually does so. Obviously, being able to accurately follow a moving object was a very important skill for creatures trying both to hunt successfully and to avoid being hunted themselves. So we (and other higher animals) evolved the ability to predict the path of a moving object quite well, as is demonstrated each time one catches a ball. But this does not mean that the eye itself is tracking these objects via a smooth, fluid motion. This would not work well, because the receptors take some finite time to respond, as mentioned above. Instead, the eye moves in short, very rapid steps – called saccades – with the sense of vision effectively suppressed during these transitions. The eye captures a scene, moves slightly “ahead” such that the moving object will remain fixed within the field, then stops and captures the “new” scene. In one way, this is very similar to the action of a motion picture camera, which captures individual still images to show motion. In fact, it is practically impossible to consciously move one’s eyes in a smooth manner; almost invariably, the actual motion of the eye will be in a series of quick, short steps.

The temporal response of vision affects display system design primarily in two areas – ensuring that the display of moving objects will appear natural, and in making sure that the performance of certain display types (which do not behave as constant-luminance light sources) is acceptable. The term critical fusion frequency (CFF) is used to describe the rate at which, under a given set of conditions, the eye can be “fooled” into perceiving motion (from a series of still images) or luminance (from a varying source) as “smooth” or “constant.”

Flicker has always been one of the major concerns in the design and use of electronic displays, primarily because the dominant display type for years has been the cathode-ray tube, or CRT. In a CRT, each point on the screen emits light only briefly after being excited by the electron beam, so the image must be continually redrawn, or refreshed. If this refresh is not repeated often enough, the display appears to be rapidly flashing, an effect that is very annoying and fatiguing for the viewer. The key question, of course, is how often the refresh must occur in order to avoid this appearance: what is the critical fusion frequency for such a source?

The prediction of the CFF for displays in general is a fairly complex task. Factors affecting it include the luminance of the display in question, the amount of the visual field it occupies, the frequency, amplitude, decay characteristics, etc., of the variation in luminance, the average luminance of the surrounding environment, and of course the sensitivity of the individual viewer. Contrary to a popular misconception, display flicker is generally not the result of a “beat frequency” with flickering ambient lighting (the most common form of this myth involves fluorescent lights); flickering ambients can result in modulation of the contrast ratio of the display, but this is usually a relatively minor, second-order effect. The overall level of the ambient lighting does affect flicker, but only because it is the perceived brightness of the display relative to its surroundings which is important. (Of course, exactly how important this is depends on the amount of the visual field occupied by both the display and the surroundings.)

The mathematical models used to predict flicker come in large part from work done by Dr. Joyce Farrell and her team at Hewlett-Packard Laboratories (working with researchers from the University of California, Berkeley) in the 1980s [1,2]. This work became the basis for several standards regarding display flicker, notably the International Organization for Standardization’s ISO-9241-3 [3] set of ergonomic standards for CRT displays. A simplified form of the analysis, using assumptions appropriate for a typical CRT display in an office environment (specifically, a typical phosphor response with the display occupying about 70° of the visual field, in diagonal measurement), leads to an estimation of the CFF as a function of display luminance, as given in ISO-9241-3, of

where Lt is the display luminance in cd/m².

Figure 2-10 Critical flicker-fusion frequencies (CFF) given by the ISO-9241-3 formula for a range of display luminance values. This calculation assumes a display occupying 70° of the visual field (diagonal measurement). Figures are given for both the mean CFF and the CFF for the 95th percentile of the population, calculated as CFF(mean) + 1.65 × SD for the standard deviation values listed. The SD values in boldface are from the ISO-9241-3 standard; the remainder were derived via linear interpolation. Note that these CFF calculations apply only to a CRT display, or a similar display technology in which the image is actually present for only a short time compared to the refresh period. Such calculations do not apply to display types such as the LCD, in which the display elements are illuminated to nearly their full intended value for most if not all of the frame time.

The distribution of CFF for the entire population has been shown to be essentially Gaussian, so to this mean one must add the appropriate multiple of the population’s standard deviation in order to determine the frequency at which the display would appear “flicker-free” to a given percentage of the population. For example, the frequency at which the display would appear flicker-free to 95% of the population would be found by determining the CFF based on the display luminance, and then adding 1.65 times the standard deviation at that luminance. Note that these formulas are based on assumptions regarding display luminance and size, and the average viewing distance, which correspond to typical desktop-monitor use. The above formula suggests that, for a CRT-based computer display of 120 cd/m² luminance used at normal viewing distances, the refresh rate should be set to at least 71.5 Hz to appear flicker-free to half the population (this is the mean CFF predicted by the formula), and to not less than 81 Hz to satisfy 95% of viewers. This is very typical for the desktop CRT monitor, and similar calculations have led to 85 Hz becoming a de facto standard refresh rate to satisfy the “flicker-free” requirement of many ergonomic standards. A graph of the result of the above formula for mean CFF vs. luminance is shown in Figure 2-10, along with the standard deviations for inter-individual differences as established by the ISO-9241 standard. (Television, while operating at higher typical luminances, can get away with lower refresh rates since the display typically occupies a much smaller portion of the visual field than is the case with a desktop monitor.)
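The percentile calculation described above can be written out directly. Rather than evaluating the ISO-9241-3 formula, the sketch below takes the two figures quoted for a 120 cd/m² display (a mean CFF of 71.5 Hz, and 81 Hz for 95% of viewers) and backs out the standard deviation they imply; that SD is derived here, not taken from the standard. It uses Python’s `statistics.NormalDist` for the Gaussian quantile instead of the rounded 1.65 multiplier; `flicker_free_rate` is a hypothetical helper.

```python
from statistics import NormalDist

def flicker_free_rate(mean_cff_hz: float, sd_hz: float, fraction: float) -> float:
    """Refresh rate (Hz) that a Gaussian model of CFF predicts will look
    flicker-free to the given fraction of viewers."""
    return NormalDist(mean_cff_hz, sd_hz).inv_cdf(fraction)

MEAN_CFF_120 = 71.5                       # Hz, mean CFF at 120 cd/m^2 (from the text)
SD_120 = (81.0 - MEAN_CFF_120) / 1.65     # Hz, SD implied by the text's 81 Hz figure

for frac in (0.50, 0.90, 0.95):
    print(f"{frac:.0%}: {flicker_free_rate(MEAN_CFF_120, SD_120, frac):.1f} Hz")
```

Evaluated the same way, the 90% criterion in Table 2-1 works out to roughly 79 Hz at this luminance, under the same Gaussian assumption.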

The update rate required for the perception of “smooth” motion is, fortunately, similar to that required for the avoidance of flicker, and in general is even lower. It is affected by many of the same factors, although one important consideration is that viewers on average tend to accept poorer motion rendition more readily than flicker. Acceptable motion can often be realized with an image update rate of only a few new images per second. For example, most “cartoon” animation employs a rate of between 10 and 24 new frames per second. The standard for the theatrical display of motion pictures is 24 frames/s in North America (25 frames/s is the more common rate in Europe). Finally, television systems, which are generally seen as providing very realistic motion, use a rate of 50 or 60 new images per second. This is, of course, very close to the refresh rates (60-85 Hz) generally considered to be “flicker-free” in many display applications.

While the rates required for good motion portrayal and a “flicker-free” image are similar, some interesting problems can arise when these rates are not precisely matched to each other. Examples of situations where this can occur are common in the computer graphics field (where new images may not be generated by the computer hardware at the same rate as that at which the display is being refreshed), and in cases of mixing systems of differing standard rates. An example of the latter is the display of film-sourced material on television; in North America, for instance, films are normally shot at 24 frames/s, while television uses a refresh rate of roughly 60 Hz. To accomplish this, a technique called “3:2 pulldown” is used. One frame of the film is shown for three refreshes of the television display (“fields”), while the next appears for only two (Figure 2-11). This results in the frames of the film being unequal in duration as displayed, which can result in certain motion artifacts (known as “judder”) as seen by the viewer.

Figure 2-11 To show standard motion pictures (shot at 24 frames/s) on US standard television (approx. 60 fields/s), a technique known as “3:2 pulldown” is used. However, the uneven duration of the original frames, as seen now by the viewer, can result in certain objectionable motion artifacts.
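The 3:2 cadence described above is simple to generate. The sketch below maps a sequence of film frames onto video fields at a nominal 60 fields/s and reports how long each frame stays on screen. It ignores interlace field parity and the fact that the real field rate is only approximately 60 Hz; `pulldown_3_2` is a hypothetical helper name.

```python
def pulldown_3_2(frames):
    """Repeat film frames alternately for 3 and 2 video fields."""
    fields = []
    for i, frame in enumerate(frames):
        fields.extend([frame] * (3 if i % 2 == 0 else 2))
    return fields

FIELD_RATE_HZ = 60  # nominal; the text describes the rate as "roughly 60 Hz"

fields = pulldown_3_2("ABCD")
print("".join(fields))  # AAABBCCCDD
# On-screen duration of each film frame, in milliseconds.
print({f: round(1000 * fields.count(f) / FIELD_RATE_HZ, 1) for f in "ABCD"})
# {'A': 50.0, 'B': 33.3, 'C': 50.0, 'D': 33.3}
```

Every 4 film frames become 10 fields (24 frames/s × 2.5 = 60 fields/s), with frames alternately held for 50 ms and about 33 ms. That alternation is the uneven timing behind judder.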

Figure 2-12 Effect of mismatched refresh and update rates. In this example, we assume that the display is being refreshed 60 times per second; however, new images are being created only 20 times per second. This results in frame A being displayed for three refresh cycles, followed by frame B for the next three. The visual effect is simulated in the image at the bottom. Since the eye “expects” smooth movement, the center of the field of view moves slightly along the expected track of the moving object; but since the object in question has not actually moved for two out of every three displayed frames, the appearance is that of a moving object with “ghosts” or “shadows” resulting from the eye motion.

The problems here again have to do with how the eye/brain system responds to moving objects. Again, the motion of the eye is not smooth; it occurs in quick, short saccades, based in large part on where the brain expects the object being tracked to appear. If the object does not appear in the expected position, its image now registers on a different part of the retina. A curious example of this may be seen when the image update rate is related to the display’s refresh rate but is not the same. If, for instance, the display is being refreshed at 60 Hz, but only 20 new images are being provided per second, the object “really” appears in the same location three times before moving to its next position. The visual system, however, since it is expecting “smooth” motion, moves slightly “ahead” during those two intermediate display refreshes. This results in the “stationary” image being seen by slightly different parts of the retina, and the object is seen as multiple copies along the direction of motion (Figure 2-12). In many applications, then, the perception of smooth motion will depend less on the absolute rate at which new images can be generated (at least above a certain minimum rate) than on making sure that this rate is kept constant and is properly matched to the display rate.
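The multiple-image effect of Figure 2-12 can be reproduced with a toy model. The sketch below assumes, as a simplification, that gaze moves continuously along the object’s expected path, and computes how far the displayed object lies from the gaze point at each 60 Hz refresh when new images arrive only 20 times per second. The function name and the 600 pixels/s speed are illustrative assumptions.

```python
def retinal_offsets(speed_px_s, refresh_hz=60, update_hz=20, n_refreshes=6):
    """Offset (pixels) between the displayed object and an idealized, smoothly
    tracking gaze point, sampled at each display refresh."""
    if refresh_hz % update_hz:
        raise ValueError("this sketch assumes refresh_hz is a multiple of update_hz")
    hold = refresh_hz // update_hz  # refreshes per new image (3 for 60/20)
    offsets = []
    for k in range(n_refreshes):
        last_update = (k // hold) * hold                # refresh index of the newest image
        shown = speed_px_s * last_update / refresh_hz   # where the object is drawn
        gaze = speed_px_s * k / refresh_hz              # where the eye expects it
        offsets.append(shown - gaze)
    return offsets

print(retinal_offsets(600))  # [0.0, -10.0, -20.0, 0.0, -10.0, -20.0]
```

Each held image lands at three successive retinal positions 10 pixels apart (speed divided by refresh rate), so the object is seen as three copies; with `update_hz` equal to `refresh_hz`, every offset is zero.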

Display Ergonomics

Our desire, of course, is to produce display systems which are usable by the average viewer, and a large part of this means assuring that undue effort or stress is not required to use them. The field of properly matching machines to the capabilities and preferences of human beings is, of course, ergonomics, and the ergonomics of display systems has been a very important field in the past few decades. Many of the various international regulations and standards affecting display design have to do at least in part with ensuring proper display hardware and displayed image ergonomics.

Unfortunately, these factors were not always considered in the design and use of electronic displays, owing to a poor understanding of the field by early display system designers. This is not really the fault of those designers, as the widespread use of electronic displays was very new and the ergonomic factors themselves had not yet been researched in depth. However, today we have a far better understanding of these effects, and those wishing to implement successful display systems are well advised to be familiar with them. Not only will this lead to a product more acceptable to its intended users, but compliance with the various standards in this area is often mandatory for even being able to sell the product into a given market.

Besides the standards for flicker already mentioned, items commonly covered in ergonomic guidelines or requirements include minimums and maximums for luminance and contrast, minimum capabilities for positioning the display screen (such as horizontal and vertical tilt/swivel requirements and minimum screen height from the work surface), character size and readability, the use of color, positional stability of the image (e.g., freedom from “jitter”), uniformity of brightness and color, and requirements for minimizing reflections or “glare” from the screen surface. A summary of specifications regarding some of the more important of these, from the ISO 9241-3 standard, is given in Table 2-1.

Table 2-1 Summary of ISO-9241-3 Ergonomic Requirements for CRT Displays

Item	ISO-9241-3 Ref.	Requirement
Design viewing distance	6.1	Min. 400 mm (300 mm in some cases)
Design line of sight angle	6.2	Between horizontal and 60° below horizontal
Angle of view	6.3	Legible up to 40° from the normal to the surface of the display
Displayed character height	6.4	Min. 16 minutes of arc; preferably 20-22 minutes of arc
Character stroke width	6.5	1/6 to 1/12 of the character height
Character width/height ratio	6.6	0.5:1 to 1:1 allowed; 0.7:1 to 0.9:1 preferred
Between-word spacing	6.11	One character width (capital “N”)
Between-line spacing	6.12	One pixel
Display luminance	6.15	Min. 35 cd/m²
Luminance contrast	6.16	Min. 0.5 contrast modulation; min. 3:1 contrast ratio
Luminance uniformity	6.20	Not to exceed 1.7:1, as measured from the center to the edge of the display screen
Temporal instability (flicker)	6.23	Flicker-free to at least 90% of the user population
Spatial instability (jitter)	6.24	Max. 0.0002 mm per mm of viewing distance, 0.5-30 Hz

Note: This table is for example only; the complete ISO-9241-3 standard imposes specific measurement requirements and other conditions not detailed here.
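Several of the angular and relative requirements in Table 2-1 are easier to apply once converted to physical units. The sketch below assumes a 500 mm viewing distance (an arbitrary value above the 400 mm minimum) and assumes that contrast modulation means (Lhigh − Llow)/(Lhigh + Llow); under that definition the 0.5 modulation and 3:1 ratio requirements coincide. The helper names are hypothetical.

```python
import math

def char_height_mm(viewing_distance_mm: float, arcmin: float) -> float:
    """Physical height that subtends the given visual angle at this distance."""
    angle_rad = math.radians(arcmin / 60.0)
    return 2.0 * viewing_distance_mm * math.tan(angle_rad / 2.0)

def contrast_modulation(l_high: float, l_low: float) -> float:
    """Assumed definition: (Lhigh - Llow) / (Lhigh + Llow)."""
    return (l_high - l_low) / (l_high + l_low)

def jitter_limit_mm(viewing_distance_mm: float) -> float:
    """Table 2-1 spatial-instability limit: 0.0002 mm per mm of viewing distance."""
    return 0.0002 * viewing_distance_mm

d = 500.0  # mm, assumed viewing distance
print(f"16 arcmin at {d:.0f} mm: {char_height_mm(d, 16):.2f} mm")
print(f"20-22 arcmin: {char_height_mm(d, 20):.2f}-{char_height_mm(d, 22):.2f} mm")
print("3:1 contrast ratio -> modulation", contrast_modulation(3.0, 1.0))
print(f"jitter limit: {jitter_limit_mm(d):.2f} mm")
```

At 500 mm, the 16 arc-minute minimum corresponds to a character about 2.3 mm tall, and the jitter limit to 0.1 mm.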

Posted 2015-09-09 17:17 by 酒醉的Tiger
