As of January 2023, there were 167 million social media users in Indonesia, a country with more than 600 local languages spread across more than 13,000 islands. Given this variety of languages and dialects, social media platforms face major challenges in moderating illegal and harmful content written in local languages.
A survey of 1,500 social media users from 38 provinces in Indonesia by PR2Media (2023) showed that the type of illegal content most frequently encountered by respondents was hate speech (67%), followed by disinformation (66%), digital scams (60%), defamation (43%), and pornography (40%).
Against this backdrop, this project aims to identify the words and phrases in local languages that social media users in three provinces (West Java, Central Kalimantan, and Papua) commonly use to convey hate speech based on ethnicity and religion. It is hoped that the results will help social media platforms deal more effectively with hate speech in those languages.