TY - JOUR
T1 - Multi-view 3D face reconstruction with deep recurrent neural networks
AU - Dou, Pengfei
AU - Kakadiaris, Ioannis A.
N1 - Funding Information:
This material is based upon the work supported by the U.S. Department of Homeland Security under Grant Award Number 2015-ST-061-BSH001. This grant is awarded to the Borders, Trade, and Immigration (BTI) Institute: A DHS Center of Excellence led by the University of Houston, and includes support for the project “Image and Video Person Identification in an Operational Environment: Phase I” awarded to the University of Houston. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Department of Homeland Security.
Publisher Copyright:
© 2018
PY - 2018/12
Y1 - 2018/12
N2 - Image-based 3D face reconstruction has great potential in different areas, such as facial recognition, facial analysis, and facial animation. Due to variations in image quality, single-image-based 3D face reconstruction might not be sufficient to accurately reconstruct a 3D face. To overcome this limitation, multi-view 3D face reconstruction uses multiple images of the same subject and aggregates complementary information for better accuracy. Though appealing, it poses multiple challenges in practice, the most significant of which is the difficulty of establishing coherent and accurate correspondence across a set of images, especially when these images are captured under unconstrained, in-the-wild conditions. This work proposes a method, Deep Recurrent 3D FAce Reconstruction (DRFAR), to solve the task of multi-view 3D face reconstruction using a subspace representation of the 3D facial shape and a deep recurrent neural network that consists of both a deep convolutional neural network (DCNN) and a recurrent neural network (RNN). The DCNN disentangles the facial identity and facial expression components for each image independently, while the RNN fuses identity-related features from the DCNN and aggregates the identity-specific contextual information, or identity signal, from the whole set of images to estimate the facial identity parameter, which is robust to variations in image quality and consistent over the whole set of images. Experimental results indicate significant improvement over the state of the art in both the accuracy and the consistency of 3D face reconstruction. Moreover, face recognition results on IJB-A with the UR2D face recognition pipeline indicate that, compared to single-view 3D face reconstruction, the proposed multi-view 3D face reconstruction algorithm improves the Rank-1 identification rate of UR2D by two percentage points.
AB - Image-based 3D face reconstruction has great potential in different areas, such as facial recognition, facial analysis, and facial animation. Due to variations in image quality, single-image-based 3D face reconstruction might not be sufficient to accurately reconstruct a 3D face. To overcome this limitation, multi-view 3D face reconstruction uses multiple images of the same subject and aggregates complementary information for better accuracy. Though appealing, it poses multiple challenges in practice, the most significant of which is the difficulty of establishing coherent and accurate correspondence across a set of images, especially when these images are captured under unconstrained, in-the-wild conditions. This work proposes a method, Deep Recurrent 3D FAce Reconstruction (DRFAR), to solve the task of multi-view 3D face reconstruction using a subspace representation of the 3D facial shape and a deep recurrent neural network that consists of both a deep convolutional neural network (DCNN) and a recurrent neural network (RNN). The DCNN disentangles the facial identity and facial expression components for each image independently, while the RNN fuses identity-related features from the DCNN and aggregates the identity-specific contextual information, or identity signal, from the whole set of images to estimate the facial identity parameter, which is robust to variations in image quality and consistent over the whole set of images. Experimental results indicate significant improvement over the state of the art in both the accuracy and the consistency of 3D face reconstruction. Moreover, face recognition results on IJB-A with the UR2D face recognition pipeline indicate that, compared to single-view 3D face reconstruction, the proposed multi-view 3D face reconstruction algorithm improves the Rank-1 identification rate of UR2D by two percentage points.
KW - 3D face reconstruction
KW - Face recognition
KW - Long short-term memory
KW - Recurrent neural network
UR - http://www.scopus.com/inward/record.url?scp=85055215226&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055215226&partnerID=8YFLogxK
U2 - 10.1016/j.imavis.2018.09.004
DO - 10.1016/j.imavis.2018.09.004
M3 - Article
AN - SCOPUS:85055215226
VL - 80
SP - 80
EP - 91
JO - Image and Vision Computing
JF - Image and Vision Computing
SN - 0262-8856
ER -