Sign language recognition

Abstract

☝ Subunit modeling for sign language recognition

This research was mainly carried out by Mr. Ariga, who completed his Master's degree in 2009. The vocabulary of a sign language is mainly represented by manual signals. Methods that extract the features of these hand movements from images or sensors and recognise and classify them with pattern recognition techniques have been widely studied. In this study, a recognition model tailored to the nature of manual signals was investigated, taking into account that they are composed mainly of hand movements, hand positions, and hand shapes. It was shown that extending the hidden Markov model (HMM) to handle these features separately, while exploiting the subunits that recur across word representations, can improve the recognition performance for sign words.
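The study's exact formulation is not reproduced here, but the core idea, scoring the movement, position, and shape features with separate HMM streams and combining their weighted log-likelihoods per word model, can be sketched as follows. This is a minimal NumPy illustration: the word names, stream weights, and all model parameters are placeholders, not values from the study.

    import numpy as np

    def log_gauss(x, mu, var):
        # Log-density of a diagonal-covariance Gaussian, evaluated per HMM state.
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=-1)

    def forward_loglik(obs, log_pi, log_A, mu, var):
        # Log-domain forward algorithm for one feature stream.
        # obs: (T, D); log_pi: (S,); log_A: (S, S); mu, var: (S, D).
        alpha = log_pi + log_gauss(obs[0], mu, var)
        for x in obs[1:]:
            alpha = log_gauss(x, mu, var) + np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
        return np.logaddexp.reduce(alpha)

    def word_score(streams, word_model, weights):
        # Weighted sum of per-stream log-likelihoods (movement, position, shape).
        return sum(w * forward_loglik(obs, *hmm)
                   for obs, hmm, w in zip(streams, word_model, weights))

    # Toy example: score one observation against two hypothetical word models.
    rng = np.random.default_rng(0)
    S, D, T = 3, 2, 10                        # states, feature dimension, frames

    def random_hmm():
        A = rng.random((S, S)); A /= A.sum(1, keepdims=True)
        return (np.log(np.full(S, 1 / S)), np.log(A),
                rng.normal(size=(S, D)), np.full((S, D), 0.5))

    streams = [rng.normal(size=(T, D)) for _ in range(3)]  # movement/position/shape
    models = {w: [random_hmm() for _ in range(3)] for w in ("hello", "thanks")}
    weights = (0.4, 0.3, 0.3)                 # placeholder stream weights
    best = max(models, key=lambda w: word_score(streams, models[w], weights))
    print("recognised word:", best)

Keeping the streams separate is what allows word models to share subunits: a hand shape that recurs in many words can, in principle, reuse the same shape-stream HMM.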

☝ Depth sensor-based sign language recognition system

This research was mainly carried out by Ms. Hatano, who completed her Master's degree in 2015. Sign language, a visual language, expresses a variety of lexical and grammatical information through three-dimensional body movements. With the advent of depth sensors such as the Kinect, depth information can now be acquired quickly and accurately without complex image processing. In this study, a real-time sign language word recognition technique was proposed using Kinect version 2, a typical depth sensor. The results of this research were also used to develop a continuous sign language recognition system running on a small kiosk terminal, in cooperation with a private company under a project supported by the Ministry of Economy, Trade and Industry.
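As a rough illustration of how skeleton data from a depth sensor can be turned into recognition features, consider the sketch below. The joint indices follow the Kinect v2 25-joint layout, but the normalisation itself is an assumption for illustration, not the method used in the study.

    import numpy as np

    # Joint indices in the Kinect v2 25-joint skeleton layout.
    SPINE_MID, SHOULDER_L, HAND_L, SHOULDER_R, HAND_R = 1, 4, 7, 8, 11

    def normalize_frame(joints):
        # joints: (25, 3) array of 3D joint positions from the depth sensor.
        # Express both hands relative to the spine and scale by shoulder width,
        # so the features are invariant to where the signer stands.
        origin = joints[SPINE_MID]
        scale = np.linalg.norm(joints[SHOULDER_R] - joints[SHOULDER_L])
        return np.concatenate([(joints[HAND_L] - origin) / scale,
                               (joints[HAND_R] - origin) / scale])

    def sequence_features(frames):
        # Stack per-frame features and append frame-to-frame deltas (motion).
        feats = np.stack([normalize_frame(f) for f in frames])
        deltas = np.diff(feats, axis=0, prepend=feats[:1])
        return np.hstack([feats, deltas])

    frames = np.random.default_rng(1).normal(size=(30, 25, 3))  # 30 dummy frames
    print(sequence_features(frames).shape)                      # (30, 12)

A feature sequence of this kind can then be matched against word models in real time, since the sensor delivers skeletons at frame rate with no image processing on our side.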

☝ Fingerspelling recognition

This research was carried out by Ms. Hosoe, who completed her Master's degree in 2017, and Ms. Nam, who completed hers in 2019. In Japanese Sign Language, vocabulary is mainly represented by hand movements and shapes, but proper nouns such as names of people or places often have no specific sign. In such cases, Japanese Sign Language uses distinct finger shapes (fingerspelling) corresponding to each hiragana character in Japanese. Other sign languages likewise use fingerspelling to represent the phonograms of the major spoken languages around them.

As part of research into automatic sign language recognition, fingerspelling recognition from video has been widely studied. Although each letter has a specific shape, fingerspelled letters vary from person to person in the shape of the fingers, the way they are presented, and the direction from which they are captured. Our aim was to improve fingerspelling recognition performance by using a 3D hand model to generate data that reproduces these shape variations and viewpoint differences, without having to collect large amounts of real data. This work was carried out in collaboration with Prof. Bogdan Kwolek of AGH University of Science and Technology, Poland.
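The following sketch illustrates the overall idea under stated assumptions: sample synthetic camera viewpoints from which a 3D hand model would be rendered, then fine-tune an ImageNet-pretrained CNN on the rendered images (deep transfer learning). The class count, angle ranges, and network choice are illustrative placeholders, not the settings used in our experiments.

    import numpy as np
    import torch, torch.nn as nn
    from torchvision import models

    def sample_viewpoints(n, elev=(-30, 30), azim=(-60, 60)):
        # Hypothetical helper: camera angles (degrees) at which a 3D hand
        # model would be rendered to synthesise viewpoint variation.
        rng = np.random.default_rng(0)
        return np.stack([rng.uniform(*elev, n), rng.uniform(*azim, n)], axis=1)

    NUM_CLASSES = 46  # placeholder: roughly one class per basic kana handshape

    # Fine-tune an ImageNet-pretrained backbone on the rendered images.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():          # freeze the pretrained backbone
        p.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new classifier head

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    def train_step(images, labels):
        # images: a batch rendered from the 3D hand model at varied viewpoints.
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()

    # Example: one step on a dummy batch standing in for rendered images.
    dummy = torch.randn(4, 3, 224, 224)
    labels = torch.randint(0, NUM_CLASSES, (4,))
    print(train_step(dummy, labels))

Because the synthetic renders cover viewpoints and shape variations that are expensive to collect from real signers, the fine-tuned network can generalise better than one trained on the limited real data alone.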

☝ Sign language recognition using ego-centric video

This research was mainly carried out by Mr. Miura, who graduated in 2022. The research is unique in that it reads sign language from video captured from the signer's own perspective (first-person video). Previous video-based automatic sign language recognition has mostly used footage of a signer filmed face-on. However, because sign language is a visual language that makes use of the full three-dimensional space, information from the signer's own point of view is also essential. For example, understanding a pointing sign requires knowing what lies beyond the pointing gesture, and conventional face-on footage lacks this information about the person, object, or direction being indicated, which prevents full interpretation.

In this study, we used an omnidirectional (360°) camera placed at the signer's point of view to capture first-person video while simultaneously tracking the signer's own body movements, and examined whether the body-movement information obtained in this way is useful for automatic sign language recognition.
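A minimal sketch of the kind of model this pipeline relies on, assuming a simple multilayer perceptron that lifts 2D keypoints detected in the fisheye frame up to a 3D body pose; the joint count and layer sizes are assumptions for illustration, not the values from our papers.

    import torch, torch.nn as nn

    class LiftUp(nn.Module):
        # Lift 2D keypoints from a fisheye (360-degree) frame to 3D joint
        # positions, in the spirit of ego-pose lift-up models.
        def __init__(self, n_joints=15, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_joints * 2, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_joints * 3),
            )

        def forward(self, kp2d):              # kp2d: (batch, n_joints, 2)
            b = kp2d.shape[0]
            return self.net(kp2d.reshape(b, -1)).reshape(b, -1, 3)

    pose3d = LiftUp()(torch.zeros(1, 15, 2))  # -> (1, 15, 3)
    print(pose3d.shape)

The recovered 3D pose sequence would then feed a downstream sequence classifier for sign word recognition; our robustness study examined how such lift-up models behave under fisheye camera perturbations.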

Publications

  • Mika Hatano, Shinji Sako, and Tadashi Kitamura, "Contour-based Hand Pose Recognition for Sign Language Recognition", Proc. of 6th Workshop on Speech and Language Processing for Assistive Technologies, Sep. 2015. [PDF]
  • Bogdan Kwolek and Shinji Sako, "Learning Siamese Features for Finger Spelling Recognition", Advanced Concepts for Intelligent Vision Systems, LNCS, Vol. 10617, pp. 225–236, Sep. 2017. [DOI]
  • Nam Tu Nguyen, Shinji Sako, and Bogdan Kwolek, "Deep CNN-based Recognition of JSL Finger Spelling", International Conference on Hybrid Artificial Intelligent Systems (HAIS), LNCS, Vol. 11734, pp. 602–613, Sep. 2019. [DOI]
  • Nguyen Tu Nam, Shinji Sako, and Bogdan Kwolek, "Fingerspelling recognition using synthetic images and deep transfer learning", The 13th International Conference on Machine Vision (ICMV 2020), Vol. 11605, pp. 528–535, Nov. 2020. [DOI]
  • Teppei Miura and Shinji Sako, "SynSLaG: Synthetic Sign Language Generator", The 23rd International ACM SIGACCESS Conference on Computers and Accessibility, pp. 1–4, Oct. 2021. [DOI]
  • Teppei Miura and Shinji Sako, "3D Ego-Pose Lift-Up Robustness Study for Fisheye Camera Perturbations", 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Vol. 4, pp. 600–606, Feb. 2023. [DOI]