In December 2002, Keith Nogueira was a senior majoring in Biomedical/Mechanical Engineering. Upon graduation, he planned to enter the biomedical industry.
A look around almost any public place reveals the growing use of surveillance cameras. This is partially due to technological advances that have introduced new benefits for businesses and law enforcement agencies that implement video surveillance. Current technology allows an operator to view live surveillance footage from a remote location by transmitting the video over the Internet or through dedicated cabling. From the operator’s room, digital analysis of the video lets the operator detect intruders, alert authorities to suspicious or violent activity, and possibly identify criminals. This article discusses the principles that make these capabilities possible, from obtaining video footage to determining whether a potential intruder is present.
Introduction
On September 13, 2002, a department store’s surveillance camera recorded a mother hitting her four-year-old child after placing the child in her SUV; fortunately, the camera also recorded the SUV’s license plate [1]. As television stations broadcast the video footage of this event, viewers across the nation were shocked at what they saw. Within a month of this occurrence, the child was taken away from her family and placed in a foster home, while the story of the now-famous mother, Madelyne Toogood, was told repeatedly on network television [1].
That same week, police in Virginia Beach, Virginia began to use identification technology in conjunction with their surveillance cameras along Atlantic Avenue in order to assist the police in locating fugitive pedestrians [2]. A few weeks after these stories, an article in The Washington Times discussed the Metropolitan Police Department’s plans for installing surveillance cameras that would allow them to observe monuments and major events [3].
In response to such events, news stories and organizations have expressed concern about the loss of privacy that results from the use of these cameras. Aside from their use in law enforcement, surveillance cameras are fairly common in metropolitan areas (Fig. 1 shows cameras in an urban setting). Large cities, such as Buffalo, New York, have cameras monitoring traffic flow so drivers and authorities can react to problems [4]. Local convenience stores and banks commonly use video cameras to prevent crime. A glance at the ceiling of a gambling floor in a Las Vegas casino reveals a huge network of cameras recording movements as tiny as a blackjack player’s gesture for another card.
Surveillance cameras appear to be increasingly common in public places. Part of this is due to technological advances that are providing new benefits for businesses and law enforcement agencies that use video surveillance. Current technology allows an operator to view live surveillance footage from a remote location because it is transmitted over the Internet or through dedicated cabling. From the operator’s room, a digital analysis of the video allows the operator to detect intruders, alert authorities to suspicious or violent activity, or identify criminals. This article discusses the principles involved in making such new capabilities possible, from obtaining the video to discovering a potential intruder.
Obtaining Live Video Images
Video Camera Operation
The basic theory involved in capturing video from a camera is simple. Incoming light passes through the camera lens and is focused on a device called an image sensor. Most video cameras use a charge-coupled device (CCD) as the image sensor [5]. The CCD has a collection of light-sensitive units called photosites, which become electrically charged when exposed to light; the charge of each photosite varies depending on how much light shines upon it [5].
Black and white cameras rely on this sensitivity to light intensity alone. Color cameras additionally have photosites that are sensitive to green, blue, and red light, a sensitivity often achieved via green, blue, or red filters [5]. Black and white cameras record the intensity of the light at each photosite, which is then interpreted by people as black, white, or a shade of gray. Color cameras follow a similar procedure, with the intensities of the three colors combining to appear as one color. Recording all these values at regular intervals generates a series of images that compose the video. The video signal, then, consists of frames of images, each made up of values that correspond to individual photosites. A simplified explanation of this is shown below.
The resolution of the video is determined by counting the vertical and horizontal dots of information that make up each frame. These dots are called pixels. In a digital camera, these pixels are converted to a digital format consisting of 1’s and 0’s, enabling them to be directly recorded to a computer. However, the signal from an analog camera is not converted – rather, it is typically sent via a coaxial cable directly to a video recorder [5].
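To make the idea of pixels and digital conversion concrete, the following minimal Python sketch treats a frame as a grid of light intensities and converts each one to a binary number. The function names, the normalized 0.0 to 1.0 intensity scale, and the choice of 8 bits per pixel are assumptions made for illustration; they are not taken from any particular camera.

```python
def quantize(intensity, bits=8):
    """Map a normalized light intensity in [0.0, 1.0] to an integer pixel value."""
    levels = (1 << bits) - 1                  # largest value, e.g. 255 for 8 bits
    clamped = min(max(intensity, 0.0), 1.0)   # guard against out-of-range readings
    return round(clamped * levels)


def digitize_frame(analog_frame, bits=8):
    """Convert every intensity in a frame (a list of rows) to a digital pixel."""
    return [[quantize(value, bits) for value in row] for row in analog_frame]


def resolution(frame):
    """Return (horizontal pixels, vertical pixels) for a frame."""
    height = len(frame)
    width = len(frame[0]) if frame else 0
    return width, height


# A tiny hypothetical frame: 3 pixels wide and 2 pixels high.
analog_frame = [
    [0.0, 0.5, 1.0],
    [1.0, 0.5, 0.0],
]
digital_frame = digitize_frame(analog_frame)

print(resolution(digital_frame))
# Show the brightest pixel as the string of 1's and 0's a computer would store.
print(format(digital_frame[0][2], "08b"))
```

A color frame could be handled the same way by storing three such values (red, green, and blue) for each pixel.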
Video Transmission
Once generated, a video signal is sent to another location, which may be in the same building or somewhere farther away. The term closed-circuit television (CCTV) is often used in describing surveillance cameras. In this system, all cameras are directly connected to the recording system. Low-level analog signals traveling down long cables are susceptible to hum and radio frequency interference, which degrade image quality. In contrast, it is much simpler to send a digital signal because any digital network can transmit the data, and no equipment specific to CCTV is necessary.
However, video transmission on a digital network involves handling a large amount of data. To lessen this load, several measures can be taken: reducing the image’s resolution, reducing the number of frames sent, or compressing the data. Any combination of these techniques may be used; some form of compression is very common.
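A rough calculation shows why these measures matter. The sketch below estimates the bit rate of an uncompressed video stream and the effect of sending fewer frames and applying compression. The 8 bits per pixel, the frame rates of 30 and 5 frames per second, and the 20:1 compression ratio are illustrative assumptions, not figures from the cited sources.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Bits per second needed to send uncompressed video."""
    return width * height * bits_per_pixel * frames_per_second


def compressed_bit_rate(raw_rate, compression_ratio):
    """Bits per second after compression by the given ratio (e.g. 20 for 20:1)."""
    return raw_rate / compression_ratio


# Assumed example: a 176 x 144 grayscale image with 8 bits per pixel.
full_rate = raw_bit_rate(176, 144, 8, 30)      # 30 frames per second
fewer_frames = raw_bit_rate(176, 144, 8, 5)    # only 5 frames per second
compressed = compressed_bit_rate(fewer_frames, 20)

print(f"30 frames/s, uncompressed: {full_rate:,} bits/s")
print(f" 5 frames/s, uncompressed: {fewer_frames:,} bits/s")
print(f" 5 frames/s, 20:1 compressed: {compressed:,.0f} bits/s")
```

Even at this small resolution, the uncompressed stream runs to several million bits per second, which is why the measures are usually combined.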
For surveillance applications, data compression occurs near the camera and is carried out by encoding the video signal using an algorithm. After the video is transmitted to the receiver, the data signal is decoded for viewing. Several compression techniques incorporate a form of motion estimation to reduce the data needed. Common resolutions range from 176 pixels by 144 lines, used in the H.320 protocol, to a square of 1240 pixels on each side, sometimes used in the MPEG-2 format [6].
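One simple form of motion estimation is block matching: for a small block of the current frame, the encoder searches the previous frame for the best-matching block and sends only the offset (the motion vector) plus any remaining difference. The Python sketch below illustrates the idea under stated assumptions. The block size, search range, and function names are hypothetical, and real encoders use far more elaborate versions of this search.

```python
def block_difference(previous, current, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of the current frame at
    (bx, by) and the block of the previous frame shifted by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[by + y][bx + x]
                         - previous[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(previous, current, bx, by, size=2, search=1):
    """Try every shift within the search range and keep the closest match."""
    height, width = len(previous), len(previous[0])
    best = (0, 0)
    best_cost = block_difference(previous, current, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip shifts that would fall outside the previous frame.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = block_difference(previous, current, bx, by, dx, dy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# A bright 2 x 2 object moves one pixel to the right between frames.
previous_frame = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
current_frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 0, 0, 0],
]
vector, cost = best_motion_vector(previous_frame, current_frame, bx=2, by=1)
print(vector, cost)
```

Here the block is found one pixel to the left in the previous frame with no remaining difference, so the encoder would only need to send the vector rather than the pixels themselves.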
Analyzing the Video
Suspicious Activity
The purpose of a surveillance camera system is to ensure the safety of individuals and property. For this reason, a computer program that detects suspicious incidents would be highly desirable. With such a program, an operator would be more likely to notice violent activity even when there is a lot of background movement. Moreover, suspicious activity could be found by analyzing the postures of individuals, thereby furthering the overall goal of locating intruders and criminals.
J.A. Freer studied the automatic recognition of suspicious activity. First, he defined suspicious activity, considering a crouching position the most suspicious, while a standing person drew an intermediate level of suspicion, and a walking person was least suspicious [7]. Before such analysis can be performed, though, a person must first be detected in the video images and isolated from the rest of the frame.
Is there an intruder?
To detect an intruder, Freer began by recording a background with no intruders present [7]. While the camera transmitted the video to the computer, each frame was analyzed. The first step involved the reduction of random noise and distortion in the video. To do this, each pixel was modified to reflect the intensity of the pixels surrounding it [7], resulting in an averaging out of extreme intensities. Once this new image was generated, it was compared to the original image of the plain background with no intruder present.
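The noise-reduction step can be sketched as a simple averaging filter. The Python example below replaces each pixel with the average of itself and its immediate neighbors. The 3 x 3 neighborhood and the plain mean are assumptions chosen for illustration; the exact filter used in [7] is not specified here.

```python
def smooth(frame):
    """Replace each pixel with the integer average of its 3 x 3 neighborhood.
    Pixels on the edge of the frame use only the neighbors that exist."""
    height, width = len(frame), len(frame[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            count = 0
            for ny in range(max(0, y - 1), min(height, y + 2)):
                for nx in range(max(0, x - 1), min(width, x + 2)):
                    total += frame[ny][nx]
                    count += 1
            result[y][x] = total // count
    return result


# A uniform gray frame with a single noisy pixel in the middle.
noisy_frame = [
    [10, 10, 10],
    [10, 100, 10],
    [10, 10, 10],
]
smoothed = smooth(noisy_frame)
print(smoothed[1][1])
```

The extreme value in the center is pulled much closer to its surroundings, which is the averaging-out effect described above.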
In the comparison, each pixel of the new image was made black if it was about the same shade of gray as the corresponding background pixel [7]. All the pixels that changed significantly were colored white to create contrast. This process results in the white silhouette of any intruder standing out against a black background.
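This comparison amounts to thresholding the difference between two images. A minimal Python sketch follows, in which the threshold of 30 gray levels and the function name are illustrative assumptions rather than values from [7].

```python
BLACK, WHITE = 0, 255


def silhouette(frame, background, threshold=30):
    """Return a black-and-white image: white where the frame differs from the
    background by more than the threshold, black everywhere else."""
    return [
        [WHITE if abs(pixel - reference) > threshold else BLACK
         for pixel, reference in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]


background = [
    [50, 50, 50, 50],
    [50, 50, 50, 50],
    [50, 50, 50, 50],
]
frame = [
    [52, 49, 50, 51],     # small changes: treated as the same shade of gray
    [50, 200, 210, 50],   # large changes: part of the silhouette
    [48, 190, 205, 50],
]
for row in silhouette(frame, background):
    print(row)
```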
Once this silhouette is created, the computer counts all the white pixels. This number indicates the degree of difference between the video frame and the original background image. A large number of white pixels reflects a significant background change, so an intruder is likely present. If only a few white pixels are counted, it can be assumed that the change is due to random noise or some other disturbance. This way, a bird or another small object moving across the screen is no cause for alarm. If no intruder is detected, the background image is replaced with the current frame of the video, which is compared against the next video frame as the program repeats the cycle [7].
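The counting and background-update cycle can be sketched as a short loop. In the Python example below, the gray-level threshold, the minimum count of three white pixels, and the function names are assumptions picked so that the tiny example frames behave sensibly; a real system would tune them to the camera and scene.

```python
def count_changed(frame, background, diff_threshold):
    """Count the pixels that would be white in the silhouette image."""
    return sum(
        1
        for frame_row, background_row in zip(frame, background)
        for pixel, reference in zip(frame_row, background_row)
        if abs(pixel - reference) > diff_threshold
    )


def monitor(frames, background, diff_threshold=30, min_white=3):
    """Return the indices of frames in which an intruder is likely present."""
    alarms = []
    for index, frame in enumerate(frames):
        white = count_changed(frame, background, diff_threshold)
        if white >= min_white:
            alarms.append(index)      # significant change: likely intruder
        else:
            background = frame        # no intruder: frame becomes new background
    return alarms


empty_scene = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
small_object = [[0, 0, 0], [0, 0, 0], [0, 0, 200]]      # e.g. a bird
intruder = [[200, 200, 0], [200, 200, 0], [0, 0, 0]]

print(monitor([small_object, intruder], empty_scene))
```

Only the second frame raises an alarm; the single changed pixel in the first frame is treated as a minor disturbance and absorbed into the background.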
Posture Determination
If an intruder is present, several methods exist for identifying the individual’s posture. Freer makes a box just large enough to enclose the silhouette [7]; this box then helps determine the intruder’s location in the original image. This specific part of the original image is checked against a database of postures. A match in the database indicates the posture assumed by the intruder.
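Finding the enclosing box and cutting the matching region out of the original image can be sketched as follows. This Python example covers only the box and the crop; the comparison against a posture database is not shown, and the function names are hypothetical.

```python
def bounding_box(mask):
    """Return (left, top, right, bottom) of the smallest box that encloses all
    white (non-zero) pixels, or None if the mask has no white pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    columns = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return min(columns), min(rows), max(columns), max(rows)


def crop(image, box):
    """Cut the boxed region out of the original image."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


silhouette_mask = [
    [0, 0, 0, 0, 0],
    [0, 255, 255, 0, 0],
    [0, 0, 255, 255, 0],
    [0, 0, 0, 0, 0],
]
original_image = [
    [11, 12, 13, 14, 15],
    [21, 22, 23, 24, 25],
    [31, 32, 33, 34, 35],
    [41, 42, 43, 44, 45],
]
box = bounding_box(silhouette_mask)
print(box)
print(crop(original_image, box))
```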
Yi Li uses a slightly different method. Rather than using the original image, the silhouette image helps determine posture. First, the computer creates a model of the human body by using fourteen rectangles with rounded corners, each corresponding to a section of the body [8]. The rectangles are connected to one another but are not all the same size: a rectangle corresponding to the torso is larger than ones corresponding to the feet. These computer-generated rectangles are then lined up against the larger parts of the silhouette. After the large rectangles line up, the smaller ones are matched to the silhouette image. When the computer finishes lining up all rectangles, the set as a whole is compared to a database to determine the posture of the individual [8].
In addition, Yi Li’s method attempts to predict where the person will move next and tries to match its rectangle model to the new silhouette from the next video frame [8]. The process of shape matching continues as long as the intruder is on-screen. The shapes that match correspond to a database of poses, which can then be classified into levels of suspicious activity.
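The final classification step can be pictured as a simple lookup from a recognized pose to a level of suspicion. The Python sketch below follows the ordering attributed to Freer earlier in this article (crouching most suspicious, standing intermediate, walking least). The numeric levels, the treatment of unknown poses, and the function names are assumptions made for illustration.

```python
# Higher numbers mean more suspicious; the specific values are assumed.
SUSPICION_LEVEL = {
    "walking": 1,
    "standing": 2,
    "crouching": 3,
}


def suspicion(posture):
    """Return the suspicion level of a posture, or 0 if it is not recognized."""
    return SUSPICION_LEVEL.get(posture, 0)


def most_suspicious(postures):
    """Return the most suspicious posture observed in a sequence of frames."""
    return max(postures, key=suspicion)


observed = ["walking", "standing", "crouching", "standing"]
print(most_suspicious(observed))
```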
Identification of Criminals
In surveillance systems, identifying a person can also be very beneficial. To do this, facial features must be extracted from the video images and compared to a database of images with names [9]. Although there are several methods of doing this, they are all beyond the scope of this article.
Conclusion
In surveillance, an automated method for determining the presence of an intruder, suspicious activity, or the identity of a criminal is highly desirable. All these methods rely on the information-processing capabilities of computers. As technology improves, it can be expected that better surveillance techniques will be developed and implemented; consequently, we may live in a society with increased video surveillance in public places, and the questions of privacy invasion will become more pressing. One positive aspect of this, though, is the realization of surveillance systems that automatically detect intruders and criminal behavior, made possible by researchers and engineers.
References
- [1] J. Black. “When Cameras are too Candid.” Business Week Online. Sept. 26, 2002. [Oct. 9, 2002].
- [2] W. Glanz. “Mugging for the Cops.” The Washington Times. Sept. 13, 2002. [Oct. 7, 2002].
- [3] B. DeBose. “Panel Defers Vote on Cameras.” The Washington Times. Sept. 25, 2002. [Oct. 7, 2002].
- [4] S. Linstedt. “Keeping an Eye on Traffic; on Highways Across the Buffalo Area, High-tech Cameras are Alerting Monitors to Problems So the Information Can Be Passed on to Authorities and Drivers.” The Buffalo News. Dec. 3, 2001.
- [5] T. Harris. “How Buying a Camcorder Works.” Howstuffworks, 2002. [Oct. 7, 2002].
- [6] S. Bradbury. “A Paper on Communications Protocols and Compression Techniques for Digital CCTV Applications.” IEEE Seminar on CCTV and Road Surveillance. May 1999.
- [7] J.A. Freer, B.J. Beggs, H.L. Fernandez-Canque, et al. “Automatic Recognition of Suspicious Activity for Camera Based Security Systems.” European Convention on Security and Detection, 1995.
- [8] L. Yi, M. Songde, L. Hanqing. “Human Posture Recognition Using Multi-Scale Morphological Method and Kalman Motion Estimation.” Proc. Fourteenth International Conference on Pattern Recognition, 1998.
- [9] S. Cruz-Llanas, J. Ortega-Garcia, E. Martinez-Torrico, et al. “Comparison of feature extraction techniques in automatic face recognition systems for security applications.” Proc. IEEE 34th Annual 2000 International Carnahan Conference, 2000.