Speech recognition systems exhibit racial disparities, study finds

Researchers at Stanford University and Georgetown University in the United States have found that automated speech recognition (ASR) systems exhibit racial disparities. In their study, five ASR systems developed by Amazon, Apple, Google, IBM, and Microsoft transcribed structured interviews with 42 white speakers and 73 black speakers. The results showed that ‘all five ASR systems exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers compared with 0.19 for white speakers’.

The researchers recommended that the speech recognition community invest resources in ensuring that both the systems and the data used to train them are broadly inclusive. ‘Such an effort, we believe, should entail not only better collection of data on African American Vernacular English (AAVE) speech but also better collection of data on other nonstandard varieties of English, whose speakers may similarly be burdened by poor ASR performance—including those with regional and nonnative-English accents.’
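For context on the headline figures, WER is the word-level edit distance between a system's transcript and a human reference transcript, divided by the number of words in the reference; a WER of 0.35 therefore means roughly one word in three is deleted, inserted, or substituted. The sketch below illustrates the standard calculation in Python; it is not code from the study, and the example sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution plus two insertions over a nine-word reference: WER ≈ 0.33
print(word_error_rate("she had your dark suit in greasy wash water",
                      "she had a dark suit in greasy wash water all year"))
```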