A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 8, Issue 9, Sep. 2021

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
  • CiteScore: 17.6, Top 3% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: C. Zhu, J. Y. Yang, Z. P. Shao, and C. P. Liu, "Vision Based Hand Gesture Recognition Using 3D Shape Context," IEEE/CAA J. Autom. Sinica, vol. 8, no. 9, pp. 1600-1613, Sep. 2021. doi: 10.1109/JAS.2019.1911534

Vision Based Hand Gesture Recognition Using 3D Shape Context

doi: 10.1109/JAS.2019.1911534
Funds: This work was supported by the National Natural Science Foundation of China (61773272, 61976191), the Six Talent Peaks Project of Jiangsu Province, China (XYDXX-053), and Suzhou Research Project of Technical Innovation, Jiangsu, China (SYG201711).
Abstract: Hand gesture recognition is a popular topic in computer vision and makes human-computer interaction more flexible and convenient. The representation of hand gestures is critical for recognition. In this paper, we propose a new method to measure the similarity between hand gestures and exploit it for hand gesture recognition. The depth maps of hand gestures captured via Kinect sensors are used in our method, where the 3D hand shapes can be segmented from the cluttered backgrounds. To extract the pattern of salient 3D shape features, we propose a new descriptor, 3D Shape Context, for 3D hand gesture representation. The 3D Shape Context information of each 3D point is obtained at multiple scales, because both local shape context and global shape distribution are necessary for recognition. The description of all the 3D points constructs the hand gesture representation, and hand gesture recognition is performed via the dynamic time warping algorithm. Extensive experiments are conducted on multiple benchmark datasets. The experimental results verify that the proposed method is robust to noise, articulated variations, and rigid transformations. Our method outperforms state-of-the-art methods in comparisons of accuracy and efficiency.
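To make the descriptor concrete, the following is a minimal sketch of one plausible reading of a per-point 3D Shape Context: a normalized 3D histogram of neighboring points over log-spaced radial shells and azimuth/elevation sectors, concatenated across several neighborhood radii for the multi-scale part. The bin counts (n_r, n_az, n_el), the scale set, and the normalization below are illustrative assumptions rather than the paper's exact parameters, and the point cloud is assumed to be pre-scaled so the hand fits inside a unit sphere.

```python
import numpy as np

def shape_context_3d(points, ref_idx, n_r=5, n_az=8, n_el=4, r_max=1.0):
    """Histogram of point positions relative to points[ref_idx].

    Bins: log-spaced radial shells x azimuth sectors x elevation bands
    (assumed binning; the paper's exact scheme may differ).
    """
    rel = np.delete(points, ref_idx, axis=0) - points[ref_idx]
    r = np.linalg.norm(rel, axis=1)
    keep = (r > 1e-9) & (r <= r_max)                # points inside this scale
    rel, r = rel[keep], r[keep]

    az = np.arctan2(rel[:, 1], rel[:, 0])           # azimuth in [-pi, pi]
    el = np.arcsin(np.clip(rel[:, 2] / r, -1, 1))   # elevation in [-pi/2, pi/2]

    # Log-spaced radial bin edges, as in the classic 2D shape context
    r_edges = np.logspace(np.log10(r_max / 2 ** (n_r - 1)), np.log10(r_max), n_r)
    r_bin = np.searchsorted(r_edges, r)
    az_bin = np.minimum(((az + np.pi) / (2 * np.pi) * n_az).astype(int), n_az - 1)
    el_bin = np.minimum(((el + np.pi / 2) / np.pi * n_el).astype(int), n_el - 1)

    hist = np.zeros((n_r, n_az, n_el))
    np.add.at(hist, (r_bin, az_bin, el_bin), 1.0)   # scatter-add counts
    return hist.ravel() / max(hist.sum(), 1.0)      # normalized descriptor

def multiscale_descriptor(points, ref_idx, scales=(0.25, 0.5, 1.0)):
    """Concatenate shape contexts over several radii: small radii capture
    local shape context, the largest captures global shape distribution."""
    return np.concatenate([shape_context_3d(points, ref_idx, r_max=s)
                           for s in scales])
```

Describing all points of a segmented hand cloud, e.g. `np.stack([multiscale_descriptor(cloud, i) for i in range(len(cloud))])`, then yields the hand gesture representation that is matched during recognition.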

     




    Highlights

    • A new shape descriptor, 3D Shape Context (3D-SC), is proposed to represent 3D hand gestures
    • Both local shape features and global shape distribution are included at multiple scales
    • This method outperforms state-of-the-art methods in both accuracy and efficiency
    • The proposed method is efficient enough for real-time applications
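As the abstract notes, recognition is performed with the dynamic time warping (DTW) algorithm over the per-point descriptors. The sketch below is the textbook DTW recurrence plus a nearest-neighbor decision rule; how the paper orders a hand's 3D points into a sequence, and which local cost it uses, are not stated on this page, so the Euclidean cost and the classify helper are assumptions for illustration.

```python
import numpy as np

def dtw_distance(seq_a, seq_b, dist=lambda x, y: np.linalg.norm(x - y)):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)     # accumulated-cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(seq_a[i - 1], seq_b[j - 1])
            # extend the cheapest of the three admissible warping steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify(query, gallery):
    """Label of the gallery sequence nearest to `query` under DTW.

    gallery: list of (label, descriptor_sequence) pairs.
    """
    return min(gallery, key=lambda item: dtw_distance(query, item[1]))[0]
```

The quadratic recurrence is fine for moderate sequence lengths; for the real-time use highlighted above, a banded variant (e.g. a Sakoe-Chiba window) is the usual way to cut the cost.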
