

Speech generation from hand motions based on space mapping

(April 2007 - March 2012)
Individuals with speech disabilities, particularly people with dysarthria, often use a text-to-speech (TTS) synthesizer for spoken communication. Because users must type sound symbols and the synthesizer reads them out in a monotonous style, current synthesizers make real-time operation and lively communication difficult. As a result, dysarthric users often fail to control the flow of conversation.
In this study, we propose a novel speech generation framework that uses hand gestures as input. People normally generate speech through transitions of tongue gestures; we developed a special glove that instead generates speech sounds from transitions of hand gestures. To build it, GMM-based voice conversion (mapping) techniques are applied to estimate a mapping function between a space of hand gestures and a space of speech sounds.
Experiments showed that the special glove can generate good Japanese vowel transitions and nasals with voluntary control of duration and articulation.
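The GMM-based mapping described above can be illustrated with a minimal sketch. It assumes a joint GMM over (source, target) pairs has already been trained and shows only the conversion step, which takes the expected target value given a source value. To stay short, both spaces are reduced to one dimension: `x` stands for a single glove sensor reading and `y` for a single acoustic parameter. The function names and all parameter values in `COMPONENTS` are hypothetical and are not taken from the actual system.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_map(x, components):
    """Map a source value x to the expected target value E[y | x].

    Each component describes a joint Gaussian over (x, y) through its
    mixture weight, its means, the source variance, and the
    source-target covariance.
    """
    # Posterior probability of each mixture component given the source value.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)
    posteriors = [lik / total for lik in likelihoods]

    # Posterior-weighted sum of the per-component linear regressions.
    return sum(
        p * (c["mean_y"] + c["cov_xy"] / c["var_x"] * (x - c["mean_x"]))
        for p, c in zip(posteriors, components)
    )


# Hypothetical, hand-picked parameters (illustration only, not trained values).
COMPONENTS = [
    {"weight": 0.5, "mean_x": 0.0, "var_x": 1.0, "mean_y": 300.0, "cov_xy": 20.0},
    {"weight": 0.5, "mean_x": 10.0, "var_x": 1.0, "mean_y": 800.0, "cov_xy": 20.0},
]

if __name__ == "__main__":
    for reading in (0.0, 5.0, 10.0):
        print(reading, gmm_map(reading, COMPONENTS))
```

Because the output is a posterior-weighted blend of local linear regressions, a smooth change in the input produces a smooth change in the output, which is the property that matters when gesture transitions are meant to produce sound transitions.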

Figure 1: The five Japanese vowels: a, i, u, e, o.


Improved F0 modeling and generation in voice conversion

(May - August 2011)
Voice conversion is a technique that modifies a source speaker's speech so that it is perceived as if a target speaker had spoken it. In voice conversion studies, approaches based on a Gaussian Mixture Model (GMM) of the probability density of joint vectors of a source and a target speaker are widely used to estimate the transformation. Conventionally, this framework is used to convert the spectral part, while the pitch part is converted differently: because F0 contours are continuous in voiced regions but discrete in unvoiced regions, they cannot be treated with a single GMM. Many researchers instead linearly predict the target speaker's F0 using the mean and variance of the source speaker's speech data. In our proposed method, Voicing Strength is introduced so that the two different F0 states are treated with the same GMM. Experiments demonstrated the effectiveness of the proposed method.
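The conventional linear F0 prediction mentioned above can be sketched as follows. This is the baseline, not the proposed Voicing Strength method. The sketch makes three assumptions that the text does not state: the transformation is applied to log F0, unvoiced frames are marked by an F0 value of 0, and the target speaker's statistics are available alongside the source speaker's. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def log_f0_stats(f0_values):
    """Mean and standard deviation of log F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    return mean(logs), pstdev(logs)


def convert_f0(f0_contour, src_stats, tgt_stats):
    """Linearly map a source F0 contour onto the target speaker's range.

    Voiced frames are normalized with the source statistics and rescaled
    with the target statistics in the log domain. Unvoiced frames
    (F0 <= 0) are passed through as 0.0.
    """
    src_mean, src_std = src_stats
    tgt_mean, tgt_std = tgt_stats
    converted = []
    for f0 in f0_contour:
        if f0 <= 0:
            converted.append(0.0)  # unvoiced: no F0 value to convert
        else:
            z = (math.log(f0) - src_mean) / src_std
            converted.append(math.exp(tgt_mean + tgt_std * z))
    return converted


if __name__ == "__main__":
    # Hypothetical statistics, chosen only for illustration.
    source = (math.log(100.0), 0.2)
    target = (math.log(200.0), 0.2)
    print(convert_f0([100.0, 0.0, 120.0], source, target))
```

The explicit branch for unvoiced frames shows why this baseline sits outside the GMM framework: voiced and unvoiced frames need separate handling.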


Objective Evaluation Method of Interventional Manipulation Skills

(August 2012 - October 2013)
Endovascular catheter intervention requires special instruments and manual dexterity when a patient's anatomy is accessed via a small incision. A large number of catheters exist, and new steerable catheters are under development. To evaluate the interventional manipulation skills of residents, an objective measurement method is desired. Various methods have been developed for this purpose, but most of them require human raters or rely only on the time taken to complete certain tasks. We proposed an objective approach to evaluating interventional manipulation skills.
Our evaluation method for catheter manipulation skills focuses on the difference between experienced and novice residents. First, videos of catheter manipulation tasks are recorded, and a feature vector is extracted from each frame of a video. These feature vectors are used to train one Gaussian Mixture Model (GMM) for experienced subjects and another for novice subjects. With these models, each frame of a new catheter manipulation video is recognized as "experienced" or "novice" based on the maximum likelihood criterion. The subjective evaluation demonstrated that our proposed method can predict whether a resident is experienced or novice with more than 80% accuracy.
Figure 2: Automatic catheter tip tracking.
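The frame-wise maximum likelihood decision described above can be sketched as follows. The sketch assumes the two GMMs have already been trained and that they use diagonal covariances. The two-dimensional feature vector, every parameter value in `EXPERIENCED_GMM` and `NOVICE_GMM`, and the majority vote used to label a whole video are illustrative assumptions, not details of the actual study.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_log_likelihood(frame, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    `gmm` is a list of (weight, means, variances) tuples, one per component.
    """
    component_logs = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for x, m, v in zip(frame, means, variances):
            log_p += log_gaussian(x, m, v)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(lp - peak) for lp in component_logs))


def classify_frame(frame, experienced_gmm, novice_gmm):
    """Label a frame with the class whose GMM gives the higher likelihood."""
    if gmm_log_likelihood(frame, experienced_gmm) >= gmm_log_likelihood(frame, novice_gmm):
        return "experienced"
    return "novice"


def classify_video(frames, experienced_gmm, novice_gmm):
    """Majority vote over frame labels (assumed rule; ties go to "experienced")."""
    labels = [classify_frame(f, experienced_gmm, novice_gmm) for f in frames]
    return max(("experienced", "novice"), key=labels.count)


# Hypothetical models over a made-up 2-D feature vector (illustration only).
EXPERIENCED_GMM = [
    (0.6, (1.0, 0.2), (0.25, 0.04)),
    (0.4, (2.0, 0.4), (0.25, 0.04)),
]
NOVICE_GMM = [
    (0.5, (4.0, 1.5), (1.0, 0.25)),
    (0.5, (6.0, 2.0), (1.0, 0.25)),
]

if __name__ == "__main__":
    video = [(1.0, 0.2), (1.5, 0.3), (5.0, 1.8)]
    print([classify_frame(f, EXPERIENCED_GMM, NOVICE_GMM) for f in video])
    print(classify_video(video, EXPERIENCED_GMM, NOVICE_GMM))
```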


The relationship between Auditory Vocabulary Size, Recognition Rate and Listening Comprehension Skills

(April 2009 - present)
To evaluate students' English listening comprehension skills, an online application based on Network-based auditory English lexical processing (NAELP) has been under development. I have been participating in this project as a developer.