On the Fusion of RGB and Depth Information for Hand Pose Estimation

Evangelos Kazakos, Christophoros Nikou, Ioannis A. Kakadiaris

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations


Recent advances in deep learning have spurred progress in 3D hand pose estimation, as convolutional network (ConvNet)-based methods have outperformed random forests. However, state-of-the-art ConvNet-based methods use only depth images of the hand, without leveraging color and texture information from the RGB domain. In this paper, we investigate whether ConvNets can learn richer and more discriminative embeddings by combining RGB and depth information. To answer this question, we propose fusing RGB and depth information in a double-stream architecture. More specifically, RGB and depth images are fed into two separate networks that extract features, which are subsequently fused at an intermediate layer of the ConvNet; we implement input-level, feature-level, and score-level fusion. The double-stream scheme is coupled with a deep ConvNet, in contrast to the shallow networks mostly proposed in the literature. Experimental results show that, while network depth is crucial for hand pose estimation, the double-stream networks perform very similarly to the network trained only on depth images. This may suggest that purely supervised training of double-stream architectures is insufficient for hand pose estimation with RGB-D fusion.
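The three fusion levels named in the abstract differ only in where the RGB and depth information is combined. The sketch below illustrates that difference under loudly stated assumptions: each "stream" is a hypothetical stand-in (flatten, linear projection, ReLU) rather than the deep ConvNet used in the paper, the weights are random and untrained, the sizes (8x8 inputs, 16-dim features, 14 joints) are arbitrary, and score-level fusion is assumed to average the two per-stream pose predictions. The names `stream`, `head`, `make_params`, and `predict` are illustrative only and are not taken from the authors' code.

```python
import numpy as np

# Toy sizes chosen for illustration only (assumptions, not the paper's setup):
# 8x8 inputs, 16-dim features per stream, J = 14 hand joints in 3D.
H, W, FEAT, J = 8, 8, 16, 14


def stream(x, w):
    """Stand-in for one ConvNet stream: flatten the image, project, apply ReLU."""
    return np.maximum(x.reshape(-1) @ w, 0.0)


def head(f, w):
    """Stand-in regression head: map a feature vector to J x 3 joint coordinates."""
    return (f @ w).reshape(J, 3)


def make_params(seed=0):
    """Random, untrained weights for every stream and head (hypothetical)."""
    rng = np.random.default_rng(seed)
    return {
        "rgbd": rng.standard_normal((H * W * 4, FEAT)),   # input-level stream
        "rgb": rng.standard_normal((H * W * 3, FEAT)),    # RGB stream
        "depth": rng.standard_normal((H * W * 1, FEAT)),  # depth stream
        "head_single": rng.standard_normal((FEAT, J * 3)),
        "head_fused": rng.standard_normal((2 * FEAT, J * 3)),
        "head_rgb": rng.standard_normal((FEAT, J * 3)),
        "head_depth": rng.standard_normal((FEAT, J * 3)),
    }


def predict(rgb, depth, p, mode):
    """Estimate a (J, 3) hand pose from an (H, W, 3) RGB and (H, W, 1) depth image."""
    if mode == "input":
        # Input-level fusion: stack the modalities into one 4-channel image
        # and run a single stream.
        x = np.concatenate([rgb, depth], axis=-1)
        return head(stream(x, p["rgbd"]), p["head_single"])
    if mode == "feature":
        # Feature-level fusion: two streams, concatenate their intermediate
        # features, then one shared regression head.
        f = np.concatenate([stream(rgb, p["rgb"]), stream(depth, p["depth"])])
        return head(f, p["head_fused"])
    if mode == "score":
        # Score-level fusion: each stream predicts a full pose; the two
        # predictions are averaged (one simple choice, assumed here).
        pose_rgb = head(stream(rgb, p["rgb"]), p["head_rgb"])
        pose_depth = head(stream(depth, p["depth"]), p["head_depth"])
        return 0.5 * (pose_rgb + pose_depth)
    raise ValueError(f"unknown fusion mode: {mode!r}")
```

For example, `predict(rgb, depth, make_params(), "feature")` returns a 14 x 3 array of joint coordinates. In a real double-stream network each stream would be a trained ConvNet and the fusion point would be chosen among its layers; the sketch only makes visible which tensors are combined at each fusion level.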

Original language: English (US)
Title of host publication: 2018 IEEE International Conference on Image Processing, ICIP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 5
ISBN (Electronic): 9781479970612
State: Published - Aug 29 2018
Event: 25th IEEE International Conference on Image Processing, ICIP 2018 - Athens, Greece
Duration: Oct 7 2018 to Oct 10 2018

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880


Conference: 25th IEEE International Conference on Image Processing, ICIP 2018

Keywords

  • Deep learning
  • Double-stream networks
  • Fusion
  • Hand pose estimation
  • RGB-D

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Signal Processing

