Nee J, Smith GM, Sheares A, Rustagi I. Linguistic justice as a framework for designing, developing, and managing natural language processing tools. Big Data & Society. January 2022. doi:10.1177/20539517221090930
The increasing ubiquity of natural language processing (NLP) tools that learn from and use human language is undeniable. NLP-powered tools now produce records of court hearings, inform job interview analyses, respond to our verbal requests through smartphones, and more. However, NLP tools don’t serve all people equally: they often perform better for certain speakers and advance linguistic profiling. So, a critical question remains: how can these technologies equitably serve all members of society, regardless of their language background?
The concept of linguistic justice can be used to frame NLP tool development in a way that centres the needs of all users, rather than prioritising speakers of privileged languages like “Standard” English. Linguistic justice is achieved when all individuals are granted equitable access to social, political, and economic life, regardless of their linguistic repertoire. Linguistic justice, then, requires that NLP tools serve diverse speakers and signers equitably.
Our commentary examines in detail two issues with current NLP tool development. First, if NLP tools learn from datasets that lack sufficient data from speakers of minoritised language varieties, those tools may underperform for those users. Second, NLP systems can use language to infer information about the identities of users, a process known as linguistic profiling. Even when protected information (e.g., race, gender) is not directly provided to an NLP system, the system may still infer a user’s identity from features of their language use. Inferred characteristics may then be used to mediate access to goods, services, and opportunities, resulting in unlawful discrimination.
We present nine specific actions that researchers, developers, and business leaders can take to design, develop, and manage NLP systems that advance linguistic justice. These include, for example, working with diverse language communities in participatory and empowering ways, ensuring language data is labelled by people familiar with the particular language variety, and examining and altering power structures so that the needs and perspectives of those at the margins are prioritised.
Instead of being comfortable with the status quo, this work requires imagining and working towards a world where users of all language varieties are able to equitably access social, economic, and political life. It requires rethinking how we collect data and what data we value in NLP development. Our nine actions provide a path towards that world, one in which NLP systems can advance linguistic justice and, thereby, social justice.