Biometrics - Face Recognition


Face recognition has recently attracted increasing attention and is beginning to be applied in a variety of domains, predominantly for security. We have instead developed a face recognition system for video indexing, with the joint purposes of labeling the faces that appear in the video and identifying speakers. When the speaker's face is shown, the face recognition system can also supplement acoustic speaker identification, allowing the speakers to be indexed and the correct speaker-dependent model to be selected for speech transcription.

The first problem to be solved before attempting face recognition is to find the face in the image. Face finding makes face recognition translation-, scale- and rotation-independent, and can provide good initial constraints on the locations of facial features. The first stage of the process is color segmentation, which simply determines whether the proportion of skin-tone pixels in a region is greater than some threshold. Candidate regions are then given scores based on a Fisher Linear Discriminant (FLD), trained by comparing a large number of face and non-face patches. Candidates are also scored on Distance From Face Space (DFFS), a measure of how closely they resemble the large number of face patches used in training. All candidate regions exceeding a combined threshold are considered to be faces, after applying constraints such as that no two faces may overlap.
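The candidate scoring and selection logic can be sketched roughly as follows. This is only an illustrative reconstruction: the skin-tone bounds, the way the FLD and DFFS scores are combined, and the helper functions fld_score and dffs_score are assumptions, not the system's actual parameters.

```python
# Illustrative sketch of the face-finding stage: colour test, combined
# FLD/DFFS score, threshold, and a no-overlap constraint. All thresholds
# and the skin-tone chromaticity bounds are assumed values.
import numpy as np

SKIN_TONE_THRESHOLD = 0.4       # assumed minimum fraction of skin-tone pixels
COMBINED_SCORE_THRESHOLD = 0.0  # assumed threshold on the combined score

def skin_tone_fraction(patch_rgb):
    """Fraction of pixels whose chromaticity falls in an assumed skin-tone range."""
    rgb = patch_rgb.astype(float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    total = r + g + b + 1e-6
    rn, gn = r / total, g / total                 # normalised chromaticities
    skin = (rn > 0.35) & (rn < 0.55) & (gn > 0.25) & (gn < 0.40)  # illustrative bounds
    return float(skin.mean())

def overlaps(box_a, box_b):
    """True if two (x, y, w, h) boxes intersect."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return not (ax + aw <= bx or bx + bw <= ax or
                ay + ah <= by or by + bh <= ay)

def find_faces(candidates, fld_score, dffs_score):
    """Score candidate (box, rgb_patch) pairs and keep non-overlapping faces."""
    scored = []
    for box, patch in candidates:
        if skin_tone_fraction(patch) < SKIN_TONE_THRESHOLD:
            continue                               # fails the colour-segmentation test
        score = fld_score(patch) - dffs_score(patch)  # assumed form of the combined score
        if score > COMBINED_SCORE_THRESHOLD:
            scored.append((score, box))
    faces = []
    for score, box in sorted(scored, reverse=True):   # greedy: best score first
        if not any(overlaps(box, kept) for kept in faces):
            faces.append(box)                      # enforce the no-overlap constraint
    return faces
```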

 

Next, instead of searching for all the facial features directly in the face image, a few "high-level" features (eyes, nose, mouth) are first located, and then 26 "low-level" features (parts of the eyes, nose, mouth, eyebrows etc.) are located relative to the high-level feature locations. The approximate locations of the high-level features are known from statistics of mean and variance (relative to the nose position) gathered on a training database. The discriminant/DFFS templates are used to score each potential matching image patch for a given feature. Typically an area covering around two standard deviations of the location statistics is searched, and within this search region the location with the highest score is taken to be the location of the feature.
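A rough sketch of the search for a single low-level feature is given below, assuming per-feature offset statistics (mean and standard deviation relative to the nose) and a template scorer score_patch obtained from training; the names, patch size and exact window shape are illustrative rather than taken from the system.

```python
# Sketch of locating one feature: exhaustively score an approximately
# 2-standard-deviation window around its expected position and keep the best.
import numpy as np

def locate_feature(face_img, nose_xy, mean_offset, std_offset, score_patch,
                   n_std=2, patch_size=9):
    """Return the (x, y) position with the highest discriminant/DFFS score
    inside the search window, together with that score."""
    cx = int(nose_xy[0] + mean_offset[0])          # expected feature position
    cy = int(nose_xy[1] + mean_offset[1])
    rx = int(np.ceil(n_std * std_offset[0]))       # window half-widths
    ry = int(np.ceil(n_std * std_offset[1]))
    half = patch_size // 2
    best_score, best_xy = -np.inf, (cx, cy)
    for y in range(cy - ry, cy + ry + 1):
        for x in range(cx - rx, cx + rx + 1):
            patch = face_img[y - half:y + half + 1, x - half:x + half + 1]
            if patch.shape != (patch_size, patch_size):
                continue                           # skip positions falling off the image
            s = score_patch(patch)
            if s > best_score:
                best_score, best_xy = s, (x, y)
    return best_xy, best_score
```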

All this is a prelude to the actual face recognition algorithm. For this work, a constellation of local patches has been used as the representation. We chose this local template approach, in contrast to global identity templates such as those used in Eigenface systems, because of its greater robustness to facial image changes caused by effects such as lighting, expression, or changes in facial appearance (glasses, beard, haircut etc.). In this case a simple Gabor jet model, similar to that used by Wiskott and von der Malsburg, is used to describe the patches of the face corresponding to the 29 facial features found above. Each patch is represented by a feature vector of 40 complex elements, the responses of Gabor filters at 5 different scales and 8 different orientations, centered at the estimated feature location.
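The jet computation might look roughly like the following sketch. The kernel parameterization (half-octave scale spacing, envelope width, kernel size) is assumed for illustration; only the 5-scale, 8-orientation structure giving 40 complex coefficients comes from the description above.

```python
# Sketch of a 40-component Gabor jet at a feature location.
# The wavelength/sigma choices below are illustrative assumptions.
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # coordinate along the wave
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * xr / wavelength)
    return envelope * carrier

def gabor_jet(image, x, y, n_scales=5, n_orientations=8, size=31):
    """Return the 40 complex filter responses at (x, y).
    Assumes (x, y) lies at least size // 2 pixels from the image border."""
    half = size // 2
    patch = image[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    jet = []
    for s in range(n_scales):
        wavelength = 4.0 * (2 ** (s / 2.0))              # assumed half-octave spacing
        sigma = 0.56 * wavelength                        # assumed envelope width
        for o in range(n_orientations):
            theta = o * np.pi / n_orientations
            kernel = gabor_kernel(size, wavelength, theta, sigma)
            jet.append(np.sum(patch * kernel))           # response at the patch centre
    return np.array(jet)                                 # shape (40,), complex
```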

Recognition can now be carried out frame-by-frame using a training set constructed from the jet coefficient statistics. For each face found in a sequence, its likelihood given each of the training set models is calculated, assuming the coefficients are Gaussian-distributed. Over a sequence of frames the likelihoods are summed and compared at the end of the sequence, with the maximum-likelihood training model taken as the answer. For speed of computation, diagonal covariance matrices are used.
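A minimal sketch of this sequence-level decision is shown below, assuming each enrolled person's model stores per-coefficient means and diagonal variances of the jet features (here treated as real-valued vectors, e.g. jet magnitudes); for numerical stability the sketch accumulates log-likelihoods rather than raw likelihoods. The model layout and names are illustrative.

```python
# Sketch of maximum-likelihood identification over a sequence of frames,
# with diagonal-covariance Gaussian models per enrolled person.
import numpy as np

def frame_log_likelihood(features, mean, var):
    """Gaussian log-likelihood of one frame's feature vector under one model
    (diagonal covariance, so the dimensions are treated independently)."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (features - mean) ** 2 / var)

def identify(sequence_features, models):
    """Accumulate per-frame log-likelihoods over the sequence and return the
    name of the best-scoring model. `models` maps name -> (mean, var) arrays."""
    totals = {}
    for name, (mean, var) in models.items():
        totals[name] = sum(frame_log_likelihood(f, mean, var)
                           for f in sequence_features)
    return max(totals, key=totals.get)
```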

Selected publications:


Face and Feature Finding for a Face Recognition System. A.W. Senior. In Proceedings of Audio- and Video-based Biometric Person Authentication '99, pp. 154-159, Washington D.C., USA, March 22-24, 1999.

This paper deals with the problem of finding facial features in images, a problem which arises in face recognition and in a number of other applications, especially in human-computer interaction, that derive information from human faces. The paper describes a system for finding faces in images and for finding facial features given the estimated face location. The techniques, based on Fisher's linear discriminant and distance from face space, are presented, and results are given on faces from the FERET database. The paper further describes how feature collocation statistics can be used to verify feature locations and to estimate the locations of missing features.

Contact: Sharath Pankanti