Revolutionizing Silent Speech Interface Using Crystalline-silicon-based Strain Gauges and Deep Learning Algorithms
Scientists from Korea develop a device that can help with non-verbal communication
Research published online in Nature Communications in October 2022
Speech impediments affect nearly 360 million people worldwide. Recently, a group of researchers from Korea developed a novel silent speech interface that combines a strain sensor with a 3D convolutional neural network, classifying words with an accuracy of 87.53%.
Image courtesy: Shutterstock
According to WHO estimates, more than 5% of the world’s population has hearing and speech impairments, and many approaches to verbal communication without vocalization have been researched. Silent speech recognition is one such method: it tracks facial movements through visual monitoring, which offers high spatial resolution. However, visual silent speech recognition works only in static, well-lit environments.
Human-machine interfaces based on biosignals perform much better in dynamic environments. In particular, surface electromyography (sEMG), which measures the electrical activity of facial muscles, is non-invasive and relatively simple. However, its scalability is limited by signal quality issues.
In recent years, facial strain mapping using epidermal sensors has emerged as a wearable silent speech interface (SSI). A group of scientists from Korea has now developed a novel SSI using a strain sensor based on single-crystalline silicon. The sensor is paired with a 3D convolutional neural network to overcome the challenges faced by existing SSIs. This research, led by Professor Ki Jun Yu from Yonsei University in Korea, was published in Nature Communications and made available online on October 3, 2022.
The proposed epidermal strain sensor was fabricated with an ultrathin mesh and a serpentine structure, without the need for any additional elastomeric layers. This design provides enhanced air and sweat permeability, making the device comfortable to wear. The fabricated device was less than 8 µm thick. Two perpendicularly mounted strain gauges with a small cell dimension of 0.1 mm² could reliably collect biaxial strain information.
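The biaxial readout principle can be illustrated with a short sketch. This is not the authors' code; the baseline resistance and gauge factor below are made-up numbers, used only to show how a resistance change maps to strain (ΔR/R₀ = GF · ε) and how two perpendicular gauges yield one strain component per axis.

```python
# Hypothetical strain-gauge readout sketch (illustrative values only).
# A strain gauge converts mechanical strain into a resistance change
# through its gauge factor (GF): delta_R / R0 = GF * strain.

def strain_from_resistance(r_measured: float, r0: float, gauge_factor: float) -> float:
    """Recover strain from a measured gauge resistance.

    r_measured   : resistance under deformation (ohms)
    r0           : unstrained baseline resistance (ohms)
    gauge_factor : dimensionless sensitivity of the gauge
    """
    return (r_measured - r0) / (r0 * gauge_factor)

def biaxial_strain(rx: float, ry: float, r0: float, gauge_factor: float):
    """Two perpendicular gauges give one strain component per axis."""
    return (strain_from_resistance(rx, r0, gauge_factor),
            strain_from_resistance(ry, r0, gauge_factor))

# Example with assumed numbers: 1000-ohm baseline, gauge factor 100
# (single-crystalline silicon gauges have high gauge factors).
ex, ey = biaxial_strain(1010.0, 995.0, 1000.0, 100.0)
print(ex, ey)  # 0.0001 tensile strain along x, -5e-05 (compressive) along y
```

A high gauge factor is one reason silicon is attractive here: the same small skin deformation produces a much larger, easier-to-measure resistance change than in a metal-foil gauge.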
To obtain data for training the proposed 3D convolutional network, the scientists affixed four strain sensors around the subject’s mouth and collected a large dataset of 100 words from two subjects. The proposed SSI classified the words with an accuracy of 87.53%, higher than that of existing SSI systems. In addition, finite element analysis (FEA) simulations and automatic stretching tests showed that the placement of the four sensors is suitable for measuring the 2D movement of the skin.
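The idea of applying a 3D convolution to multi-sensor strain recordings can be sketched as follows. The shapes and architecture here are assumptions for illustration, not the paper's network: four biaxial sensors sampled over time form a small spatiotemporal volume, which a 3D kernel scans jointly across time and sensor position.

```python
# Illustrative sketch only (assumed shapes, not the authors' architecture).
import numpy as np

def conv3d_valid(volume: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive valid-mode 3D convolution (cross-correlation) over a
    (time, rows, cols) strain volume, producing one feature map."""
    T, H, W = volume.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(volume[i:i+t, j:j+h, k:k+w] * kernel)
    return out

# Hypothetical recording: 50 time frames; 2 strain axes x 4 sensors
# arranged as a 2x4 spatial grid -> volume of shape (50, 2, 4).
rng = np.random.default_rng(0)
volume = rng.standard_normal((50, 2, 4))
kernel = rng.standard_normal((5, 2, 2))   # spans 5 frames and a 2x2 patch
features = conv3d_valid(volume, kernel)
print(features.shape)  # (46, 1, 3)
```

The key design point is that the kernel's time dimension lets the network learn word-specific *motion* patterns, not just static strain snapshots; a full classifier would stack several such layers and end in a 100-way output, one class per word.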
To benchmark the proposed system, the team also built an SSI using sEMG electrodes. That system achieved a much lower accuracy of 42.60%, underscoring the advantage of the proposed device.
"We have devised a system that uses a sensor with dimensions hundreds of times smaller than that of EMG-based SSI, making the proposed system highly scalable while maintaining high accuracy. The greatest significance of our study is that it suggests a potential for extended speech recognition through highly integrated facial mapping," concludes Prof. Yu.
This system can improve the quality of life for those who struggle to communicate verbally.
Professor Jong-Hyun Ahn
Professor Seong Chan Jun
Professor Donghyun Kim