Google’s AI tool for spotting online abuse can be deceived, researchers show
A group of researchers at the University of Washington’s Network Security Lab has shown that the artificial intelligence (AI) tool developed by Google’s Jigsaw to spot online harassment and abuse can be deceived by ‘slightly perturbing the abusive phrases’. The tool, called Perspective, is designed to help moderate online conversations by spotting abusive and harassing comments. As explained by Ars Technica, it works by assigning a ‘toxicity score’ to each comment, which can then be used to aid moderation or to reject comments outright. The researchers have shown that the tool can be tricked into giving low toxicity scores to comments it would otherwise flag, simply by misspelling words or inserting punctuation into them.
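To make the idea concrete, here is a minimal sketch of the two kinds of perturbation described above. It does not call Perspective and is not the researchers' method: `toy_score` is a hypothetical stand-in that simply counts exact matches against a small word list, and the function names and the word list are assumptions made for illustration. It shows only why exact word matching is fragile, not how the real model behaves.

```python
# Hypothetical word list standing in for abusive terms (an assumption).
FLAGGED = {"idiot", "stupid"}


def toy_score(comment: str) -> float:
    """Toy stand-in for a toxicity score: share of words on the flagged list."""
    words = [w.strip(".,!?") for w in comment.lower().split()]
    if not words:
        return 0.0
    return sum(w in FLAGGED for w in words) / len(words)


def insert_punctuation(word: str) -> str:
    """Insert a dot after the first character, e.g. 'idiot' -> 'i.diot'."""
    return word[0] + "." + word[1:]


def misspell(word: str) -> str:
    """Duplicate the second character, e.g. 'idiot' -> 'iddiot'."""
    return word[:2] + word[1] + word[2:]


def perturb(comment: str, transform) -> str:
    """Apply `transform` to every flagged word, leaving other words untouched."""
    out = []
    for token in comment.split():
        if token.lower().strip(".,!?") in FLAGGED:
            out.append(transform(token))
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    original = "you are an idiot"
    for transform in (insert_punctuation, misspell):
        changed = perturb(original, transform)
        print(changed, toy_score(original), toy_score(changed))
```

A human reader still understands the perturbed comment, but the toy scorer no longer recognises the flagged word, so its score drops to zero. A learned model is more complex than this word list, yet the article reports that a similar weakness was found in practice.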