

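Over a period of ten weeks, visible and long-wave infrared (LWIR) images of 240 distinct subjects were acquired under controlled conditions. In each weekly session, each subject was imaged under two different illumination conditions (FERET and mugshot) and with two different expressions (neutral and other). Visible images are in color at 1200 × 1600 resolution. Thermal images have 320 × 240 resolution and 12-bit depth.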




Experiments were run with two algorithms in each of the two modalities: principal component analysis (PCA) and the (blinded for review) algorithm.





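(1)    Under time-lapse conditions, thermal face recognition with the PCA algorithm performs significantly worse than recognition on the corresponding visible images.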

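(2)    With the (blinded for review) algorithm, overall recognition performance in both modalities improves markedly over PCA. More importantly, the performance curves of the two modalities cross each other multiple times while remaining within each other's error bars. It follows that the performance difference between the two modalities with this algorithm is not statistically significant.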



Thermal Face Recognition Over Time


1 Introduction

Face recognition with thermal infrared imagery has recently enjoyed renewed interest. While the volume of literature on the subject is notably smaller than that related to visible face recognition, there is nonetheless a steady stream of research [1, 2, 3, 4, 5, 6]. These papers have established that thermal imagery of human faces constitutes a valid biometric signature, though mostly relying on databases limited both in size and variability, due to the expense and complexity of extensive data collection. Early results were based on gallery and probe sets collected indoors during a single session. In that respect, they resemble the fa/fb tests in the FERET program [7].


More recently, a study involving imagery collected indoors in a laboratory setting over multiple weeks was presented in [4, 8]. In that study, the authors note that when using a PCA-based recognition system, visible face recognition of time-lapse images yields better results than its thermal counterpart. They go on to conjecture, based on their visual analysis of the thermal imagery, that large variations of the thermal emission patterns of the face over time were responsible for the degraded performance. The current paper seeks to reproduce and extend some of the results in [4, 8]. In particular, we show that while those results are reproducible, it may be premature to attribute the performance difference to a modality-specific phenomenon. The results below demonstrate that a statistically significant performance difference between modalities can be measured when recognition is performed using PCA. However, when a more sophisticated algorithm is used, no such difference is measurable. This indicates that the authors of [4, 8] may have observed a measurement effect, and that the “inherent” value of visible and thermal imagery for time-lapse face recognition under controlled conditions is equivalent.


2 Data Collection and Normalization

The data used in this study was generously provided by the authors of [4, 8]. A complete description of the data collection procedure can be found in the references, and we include a brief summary here. Visible and longwave IR (LWIR) images of 240 distinct subjects were acquired under controlled conditions, over a period of ten weeks. During each weekly session, each subject was imaged under two different illumination conditions (FERET and mugshot), and with two different expressions (“neutral” and “other”). Visible images were acquired in color at 1200 × 1600 resolution. Thermal images were acquired at 320 × 240 resolution and 12-bit depth.


Eye coordinates for all images, both visible and thermal, were manually located by the authors of [4, 8]. These coordinates were used to affinely register the images to a standard geometry with fixed eye locations and an image size of 99 × 132 pixels. All necessary interpolation was performed bilinearly. The visible and thermal cameras were not boresighted during data collection; therefore, eye coordinates on corresponding images may not match exactly, as they had to be manually located in each modality separately. After alignment, all images were masked to remove all but the inner face, excluding ears and hair. Images used for the PCA experiments were further histogram-equalized, in order to match the processing in [4, 8]. Since the other algorithm does its own internal image processing, no equalization was performed on its input images before recognition.
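As a rough illustration of the registration step, the sketch below maps two manually located eye points to fixed target positions and resamples the image bilinearly into a 99 × 132 frame. Two points determine only a similarity transform (rotation, uniform scale, translation), so that is what is computed here; the text says only that the registration was affine. The target eye positions and all function names are assumptions made for the example, not values from the paper.

```python
def similarity_from_eyes(src_left, src_right, dst_left, dst_right):
    """Return (a, b, tx, ty) such that a source point (x, y) maps to
    x' = a*x - b*y + tx,  y' = b*x + a*y + ty."""
    sx, sy = src_right[0] - src_left[0], src_right[1] - src_left[1]
    dx, dy = dst_right[0] - dst_left[0], dst_right[1] - dst_left[1]
    denom = sx * sx + sy * sy
    if denom == 0:
        raise ValueError("eye coordinates coincide")
    # Complex division (dx + i*dy) / (sx + i*sy) gives rotation and scale.
    a = (dx * sx + dy * sy) / denom
    b = (dy * sx - dx * sy) / denom
    tx = dst_left[0] - (a * src_left[0] - b * src_left[1])
    ty = dst_left[1] - (b * src_left[0] + a * src_left[1])
    return a, b, tx, ty


def bilinear(img, x, y):
    """Bilinearly sample img (a list of rows) at real-valued (x, y),
    clamping coordinates to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def register(img, src_left, src_right,
             dst_left=(30.0, 45.0), dst_right=(68.0, 45.0),  # assumed targets
             out_w=99, out_h=132):
    """Warp img so that its eye points land on the fixed target positions."""
    a, b, tx, ty = similarity_from_eyes(src_left, src_right, dst_left, dst_right)
    n = a * a + b * b
    out = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            # Invert the transform: find the source pixel for output (u, v).
            px, py = u - tx, v - ty
            x = (a * px + b * py) / n
            y = (-b * px + a * py) / n
            row.append(bilinear(img, x, y))
        out.append(row)
    return out
```

Because each modality has its own manually located eye points, this kind of registration would be run separately on the visible and the thermal image of a pair.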





3 Thermal Infrared Phenomenology

While the nature of face imagery in the visible domain is well-studied, particularly with respect to illumination dependence [9], its thermal counterpart has received less attention. In [4], the authors show some variability in thermal emission patterns during time-lapse experiments, and properly blame it for decreased recognition performance. Figure 1 shows comparable variability within data collected with our own LWIR sensor. Each column shows images acquired in different sessions. It is clear that thermal emission patterns around the eyes, nose and mouth differ noticeably between sessions. Such variations can be induced by changing environmental conditions. For example, when a subject is exposed to cold or wind, capillary vessels at the surface of the skin contract, reducing the effective blood flow and thereby the surface temperature of the face. Conversely, when a subject moves from a cold outdoor environment to a warm indoor one, the reverse process occurs: capillaries dilate, suddenly flushing the skin with warm blood in the body’s effort to regain normal temperature. We have no knowledge of the environmental conditions during the data collection by the authors of [4], although we presume that they were fairly constant throughout all sessions.


 Additional fluctuations in thermal appearance are unrelated to ambient conditions, but are rather related to the subject’s metabolism. Vigorous physical activity, consumption of food, alcohol or caffeine may all affect the thermal appearance of a subject’s face. Also, high temporal frequency thermal variation is associated with breathing. The nose or mouth will appear cooler as the subject is inhaling and warmer as he or she exhales, since exhaled air is at core body temperature, which is several degrees warmer than skin temperature.


Much like recognition from visible imagery is affected by illumination, recognition with thermal imagery is affected by a number of exogenous and endogenous factors. And while the appearance of some features may change, their underlying shape remains the same and continues to hold useful information for recognition. Thus, much like in the case of visible imagery, different algorithms are more or less sensitive to image variations. Proper compensation for those variations is a critical step of any successful face (or generally object) recognition algorithm, regardless of modality. Clearly, the better algorithms for thermal face recognition will perform equivalent compensation on the infrared imagery prior to comparing probe and gallery samples.



Figure 1: Variation in facial thermal emission from two subjects in different sessions. Left column is the enrollment image and right column is the test image.


4 Algorithms Tested

We performed experiments with two different algorithms in each of the two modalities: PCA with Mahalanobis angle distance and the (blinded for review) algorithm. The first is a standard algorithm with performance evaluations widely available in the literature, including [2], in which the authors present a comprehensive analysis of its performance on visible and thermal infrared imagery in a same-session recognition scenario. The second one is a commercial algorithm made available for testing in binary form.
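For readers unfamiliar with the Mahalanobis angle distance, the sketch below shows one common formulation: each PCA coefficient is divided by the standard deviation along its principal axis (the square root of the corresponding eigenvalue), and the distance is the negative cosine of the angle between the two scaled vectors. This is an assumed definition for illustration; the text does not spell out the exact variant used, and the function name is hypothetical.

```python
import math


def mahalanobis_angle(u, v, eigenvalues):
    """Negative cosine between two PCA coefficient vectors after scaling
    each coefficient by 1/sqrt(eigenvalue). Smaller means more similar;
    the minimum value is -1 for vectors pointing in the same direction."""
    mu = [a / math.sqrt(lam) for a, lam in zip(u, eigenvalues)]
    mv = [a / math.sqrt(lam) for a, lam in zip(v, eigenvalues)]
    dot = sum(a * b for a, b in zip(mu, mv))
    norm = math.sqrt(sum(a * a for a in mu)) * math.sqrt(sum(b * b for b in mv))
    if norm == 0:
        raise ValueError("zero-length coefficient vector")
    return -dot / norm
```

The scaling gives low-variance components the same weight as high-variance ones, which is the main difference from a plain cosine or Euclidean comparison in PCA space.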



The training set for both algorithms was completely disjoint from the gallery and probe images provided by the authors of [4], in time, space and subjects. That is, the training set was collected at an earlier time, in a different location, and used a disjoint set of subjects. This ensures that the results reported below are indicative of real-world performance. We should also note that the training set was different from that used in [4], since their complete training set was not available to us. We chose to use a larger set of images collected over the last several years with our own visible and thermal cameras. This further increases the realism of the results, since one cannot usually expect to have training imagery from the same camera as the testing imagery. As a result of these divergences from [4], our PCA results are somewhat different. However, the qualitative nature of the results, as seen below, agrees strongly with those of [4].


5 Experimental Results and Discussion

In order to evaluate recognition performance with time-lapse data, we performed the following experiments. The first-week frontal illumination images of each subject with neutral expression were used as the gallery. Thus the gallery contains a single image of each subject. For all weeks, the probe set contains neutral expression images of each subject, with mugshot lighting. The number of subjects in each week ranges from 44 to 68, while the number of overlapping subjects with respect to the first week ranges from 31 to 56. We computed top-rank recognition rates for each of the weekly probe sets with both modalities and algorithms. The results are shown in Figures 2 and 3. Note that the first data point in each graph corresponds to same-session recognition performance.
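The evaluation protocol described above can be sketched as follows, assuming each image has already been reduced to a feature that a distance function can compare. The gallery holds one entry per subject, and a probe counts as correct when its nearest gallery entry belongs to the same subject. Skipping probes whose subject is absent from the gallery is an assumption based on the mention of overlapping subjects; the function and argument names are hypothetical.

```python
def top_rank_rate(gallery, probes, distance):
    """gallery: dict mapping subject id -> feature (one image per subject).
    probes: list of (subject id, feature) pairs for one weekly session.
    distance: function(gallery_feature, probe_feature) -> float, smaller is closer.
    Returns the fraction of scored probes whose nearest gallery entry
    has the correct subject id."""
    scored = [(sid, feat) for sid, feat in probes if sid in gallery]
    if not scored:
        raise ValueError("no probe subject appears in the gallery")
    correct = 0
    for sid, feat in scored:
        best = min(gallery, key=lambda g: distance(gallery[g], feat))
        if best == sid:
            correct += 1
    return correct / len(scored)
```

Calling this once per weekly probe set, per modality and per algorithm, yields one point on each curve of the kind plotted in Figures 2 and 3.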



Figure 2: Top-rank recognition results for visible, LWIR and fusion as a function of weeks elapsed between enrollment and testing, using PCA. Note that the x-coordinate of each curve is slightly offset in order to better present the error bars.



Figure 3: Top-rank recognition results for visible, LWIR and fusion as a function of weeks elapsed between enrollment and testing, using the (blinded for review) algorithm. Note that the x-coordinate of each curve is slightly offset in order to better present the error bars.


Focusing for a moment on the performance curves, we notice that there is no clear trend for either visible or thermal modalities across weeks two through ten. That is, we do not see a clearly decreasing performance trend for either modality. This appears to indicate that whatever time-lapse effects are responsible for performance degradation versus same-session results are roughly constant over the ten-week trial period. Other studies have shown that over a period of years face recognition performance degrades linearly with time [10]. Our observation here is simply that the slope of the degradation line is small enough to be nearly flat over a ten-week period (except for the same-session result, of course). Following that observation, we assume that weekly recognition performances for both algorithms and modalities are drawn independently and distributed according to a (locally) constant distribution, which we may assume to be Gaussian. Using this assumption, we estimate the standard deviation of that distribution, and plot error bars at two standard deviations.
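A minimal sketch of the error-bar computation just described, under the stated assumption that the weekly time-lapse rates are independent draws from one Gaussian. Whether the sample or the population standard deviation was used is not stated; the sample form is assumed here, and both function names are hypothetical.

```python
import statistics


def two_sigma_error_bar(weekly_rates, k=2.0):
    """weekly_rates: top-rank rates for the time-lapse weeks of one
    modality/algorithm pair (the same-session week is excluded).
    Returns (mean rate, error-bar half-width of k standard deviations)."""
    if len(weekly_rates) < 2:
        raise ValueError("need at least two weekly rates")
    mean = statistics.mean(weekly_rates)
    half_width = k * statistics.stdev(weekly_rates)  # sample std (assumed)
    return mean, half_width


def within_each_others_bars(rate_a, rate_b, half_a, half_b):
    """True when each rate lies inside the other's error bar."""
    return abs(rate_a - rate_b) <= min(half_a, half_b)
```

Under this reading, two weekly points are treated as not significantly different when `within_each_others_bars` returns True for them.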


Figure 2 shows the week-by-week recognition rates using PCA-based recognition. We see that, consistent with the results in [4, 8], thermal performance is lower than visible performance. In fact, for at least six out of nine time-lapse weeks that difference is statistically significant. Table 1 shows mean recognition rates over weeks two through nine for each algorithm and modality. As shown in the last column, we see that mean visible performance is higher than the mean thermal performance by 2.17 standard deviations. This clearly indicates that thermal face recognition with PCA under a time-lapse scenario is significantly less reliable than its visible counterpart.



Turning to Figure 3, we see the results of running the same experiments with the (blinded for review) algorithm. Firstly, we note that overall recognition performance is markedly improved in both modalities. More importantly, we see that weekly performance curves for both modalities cross each other multiple times, while remaining within each other’s error bars. This indicates that the performance difference between modalities using this algorithm is not statistically significant. In fact, looking at Table 1, we see that the difference between mean performances for the modalities is only 0.21 standard deviations, hardly a significant result. We should also note that the mean visible time-lapse performance with this algorithm is 88.65%, compared to approximately 86.5% for the FaceIt algorithm, as reported in [4]. This shows that the (blinded for review) algorithm is competitive with the commercial state of the art on this data set, and therefore provides a fair means of evaluating thermal recognition performance, as using a poor visible algorithm for comparison would make thermal recognition appear better.


Figures 2 and 3, as well as Table 1, also show the result of fusing both imaging modalities for recognition. Following [2] and [4], we simply add the scores from each modality to create a combined score. Recognition is performed by a nearest neighbor classifier with respect to the combined score. As many previous studies have shown [1, 2, 4], fusion greatly increases performance.
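The sum-rule fusion described above can be sketched as follows for a single probe. It assumes that the scores are distances (smaller is better) and that both modalities were scored against the same gallery subjects; the text does not say whether scores were normalized before adding, so none is applied here. The function name is hypothetical.

```python
def fuse_and_identify(visible_scores, thermal_scores):
    """Each argument maps gallery subject id -> distance of the probe to
    that subject in one modality. Returns (best subject id, combined scores),
    where the combined score is the plain sum of the two distances."""
    if set(visible_scores) != set(thermal_scores):
        raise ValueError("both modalities must score the same gallery subjects")
    combined = {g: visible_scores[g] + thermal_scores[g] for g in visible_scores}
    best = min(combined, key=combined.get)  # nearest neighbor on the fused score
    return best, combined
```

If the two matchers produce scores on very different scales, a plain sum lets one modality dominate, so some rescaling step may be needed in practice.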



Table 1: Mean top-match recognition performance for time-lapse experiments with both algorithms.


6 Conclusions

The main conclusion of this paper is that one must be cautious when evaluating the value of an imaging modality for a specific recognition task. Ideally, this question should be framed as that of estimating the Bayes optimal error for a classification problem. Inevitably, that estimate is based on an empirical measure of performance which is inextricably tied to a particular classifier. While such an estimate can provide us with a valuable upper bound on the Bayes error, it cannot separate classifier effects from data-specific behavior. In this case, we show that while the results in [4] are reproducible, they do not imply that time-lapse face recognition with thermal infrared imagery is inferior to that performed with visible imagery. We have shown by example that, at least on this data set, the Bayes errors for each modality are comparable. A more detailed analysis will surely require a much larger pool of subjects.
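The upper-bound remark can be stated compactly. In the notation below (introduced here for illustration, not taken from the paper), x is an image, c its true identity, f any classifier, and E* the Bayes error:

```latex
E^{*} \;=\; 1 - \mathbb{E}_{x}\!\left[\max_{c} P(c \mid x)\right],
\qquad
E^{*} \;\le\; E(f) \;=\; \Pr\!\left[f(x) \ne c\right]
\quad \text{for every classifier } f .
```

A measured error rate estimates E(f) for one particular f, so, up to sampling error, it bounds E* from above but says nothing about how far below that bound E* lies.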




Based on the preceding analysis, and recent results by the authors on time-lapse recognition with a larger, more challenging and more diverse data set [11], we firmly believe that the use of thermal imagery of faces for biometric authentication is not only viable, but in certain circumstances even preferable to the use of visible images. Without a doubt, the use of fused visible and thermal imagery provides a level of performance not attainable by either alone.


Posted on 2008-08-22 20:00  leivo