LINGUISTICALLY-INCLUSIVE NATURAL LANGUAGE PROCESSING | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/230998

DC Field	Value
dc.title	LINGUISTICALLY-INCLUSIVE NATURAL LANGUAGE PROCESSING
dc.contributor.author	TAN MIN RONG SAMSON
dc.date.accessioned	2022-09-08T18:00:25Z
dc.date.available	2022-09-08T18:00:25Z
dc.date.issued	2022-03-03
dc.identifier.citation	TAN MIN RONG SAMSON (2022-03-03). LINGUISTICALLY-INCLUSIVE NATURAL LANGUAGE PROCESSING. ScholarBank@NUS Repository.
dc.identifier.uri	https://scholarbank.nus.edu.sg/handle/10635/230998
dc.description.abstract	Language is a largely social construct, shaped by each community's lived experiences, culture, and language repertoire. However, current natural language processing (NLP) systems fail to account for sociolinguistic variation: common NLP practices implicitly assume that all speakers of a language speak a single, "standard" version. This is damaging to minority language varieties, perpetuating the perception that they are "ungrammatical" and "incorrect". Failing to address this gap predisposes NLP systems to discriminate against minority language communities. This can take the form of disproportionately poor performance or the encoding of harmful stereotypes. Hence, this thesis focuses on the issues surrounding sociolinguistic generalization, defined as an NLP system's ability to generalize beyond the language variety it was trained on. In some situations, this can be viewed as the ability to be robust to sociolinguistic variation. Using adversarial attacks, we reveal the linguistic biases of existing NLP models and design methods to mitigate them. We conclude by generalizing the prior adversarial attacks into a framework for testing NLP system reliability in the presence of language variation. Language technology is often hailed as an avenue for improving technological accessibility. This thesis strives for a world in which NLP works not only for the privileged, but for everyone.
dc.language.iso	en
dc.subject	adversarial, robustness, natural language processing, machine learning, sociolinguistic variation, reliability
dc.type	Thesis
dc.contributor.department	COMPUTER SCIENCE
dc.contributor.supervisor	Min-Yen Kan
dc.description.degree	Ph.D
dc.description.degreeconferred	DOCTOR OF PHILOSOPHY (SOC)
dc.identifier.orcid	0000-0003-1019-8228
Appears in Collections:	Ph.D Theses (Open)

Files in This Item:

File	Description	Size	Format	Access Settings	Version
samsontmr.pdf		15.58 MB	Adobe PDF	OPEN	None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.