Sumaya Ahmed Salih, Isaac Wiafe, Jamal-Deen Abdulai, Elikem Doe Atsakpo, Gifty Ayoka, Richard Cave, Akon Obu Ekpezu, Catherine Holloway, Katrin Tomanek, Fiifi Baffoe Payin Winful
Automatic Speech Recognition (ASR) technology has transformed both human-human and human-computer communication. It facilitates understanding through real-time speech captioning [1], [2] and supports hands-free computing (e.g., email dictation, online information retrieval, and automatic language translation). ASR is also used to control smart home activities, such as changing television channels, regulating heating, ventilation, and air conditioning, and adjusting lighting. Despite this usefulness, most of these technologies do not cater to speech diversity and are often optimized for 'standard' or typical speech. Consequently, they fail to benefit individuals with impaired speech, such as those with dysarthria, stammering, or cleft palate, who often experience reduced ASR accuracy. Prior studies have demonstrated the potential benefits of speech recognition technologies in English for distinct forms of impaired speech [2], [3], [4].
While this benefits English speakers, it is imperative to extend similar technologies to low-resource languages (LRLs). LRL communities have limited access to assistive technologies and speech and language therapy (SLT) services [5], [6], [7]. Hence, the availability of ASR technologies in LRLs will facilitate effective communication for people with speech impairments, especially in sub-Saharan Africa, where speech therapy resources are insufficient [7]. This study is part of a larger initiative that seeks to collect, validate, and create a large corpus of impaired speech in LRLs. It reports the findings of a pilot study on the Akan language of Ghana, discussing the methods, challenges, and lessons learned from the collection, validation, and testing of the dataset used to adapt ASR models.