Researchers present method to reduce bias in AI data set
Researchers at Princeton University’s engineering school have developed a new approach to obtaining fairer data sets containing images of people, and thus to addressing challenges related to bias in artificial intelligence (AI). They propose improvements to ImageNet, a database that contains more than 14 million images (objects, landscapes, and people) and has been used extensively as training data for machine learning algorithms that classify images or recognise elements within them. The researchers started by identifying non-visual concepts and offensive categories, such as racial and sexual characterisations, among ImageNet’s person categories, and removing them from the database. They also concluded that ImageNet’s content reflects considerable bias: people annotated as dark-skinned, as female, or as adults over 40 were underrepresented across most categories of images. Based on these findings, the researchers designed a web-interface tool that allows users to obtain a set of images that is demographically balanced (by age, gender expression, or skin colour) in a way the user specifies. For example, the full collection of images in the category ‘programmer’ may be about 90% male and 10% female, whereas in the USA about 20% of computer programmers are female; a researcher could use the new tool to retrieve a set of programmer images that is 80% male and 20% female. The overall goal of the tool is to facilitate the development of AI algorithms that are fairer in classifying people’s faces and activities in images.
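The rebalancing in the ‘programmer’ example can be illustrated with a minimal Python sketch. It is not the Princeton tool itself; the function name `balanced_subset`, the `(image_id, group)` annotation format, and the choice of simple random subsampling are all assumptions made for illustration. The sketch returns the largest random subset of an annotated image pool whose group shares match the proportions a user specifies.

```python
import random
from collections import defaultdict


def balanced_subset(items, targets, seed=0):
    """Return the largest random subset of `items` whose group shares match `targets`.

    items:   list of (image_id, group) pairs (hypothetical annotation format)
    targets: dict mapping group -> desired fraction; fractions must sum to 1
    """
    if abs(sum(targets.values()) - 1.0) > 1e-9:
        raise ValueError("target fractions must sum to 1")

    # Bucket image ids by their annotated group.
    by_group = defaultdict(list)
    for image_id, group in items:
        by_group[group].append(image_id)

    # The group that is scarcest relative to its target caps the subset size.
    total = min(
        len(by_group[group]) / share
        for group, share in targets.items()
        if share > 0
    )

    rng = random.Random(seed)
    subset = []
    for group, share in targets.items():
        # Small epsilon guards against floating-point values just below an integer.
        count = int(total * share + 1e-9)
        chosen = rng.sample(by_group[group], count)
        subset.extend((image_id, group) for image_id in chosen)
    return subset


# Illustrative pool: 900 images annotated 'male', 100 annotated 'female'.
pool = [(f"m{i}", "male") for i in range(900)]
pool += [(f"f{i}", "female") for i in range(100)]

subset = balanced_subset(pool, {"male": 0.8, "female": 0.2})
print(len(subset))  # 500: all 100 'female' images plus 400 sampled 'male' images
```

With a 90%/10% pool and an 80%/20% target, the underrepresented group limits the result: all 100 ‘female’ images are kept and only 400 of the 900 ‘male’ images are sampled, so the balanced set is half the size of the original pool.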