Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/230998
DC Field                        Value
dc.title                        LINGUISTICALLY-INCLUSIVE NATURAL LANGUAGE PROCESSING
dc.contributor.author           TAN MIN RONG SAMSON
dc.date.accessioned             2022-09-08T18:00:25Z
dc.date.available               2022-09-08T18:00:25Z
dc.date.issued                  2022-03-03
dc.identifier.citation          TAN MIN RONG SAMSON (2022-03-03). LINGUISTICALLY-INCLUSIVE NATURAL LANGUAGE PROCESSING. ScholarBank@NUS Repository.
dc.identifier.uri               https://scholarbank.nus.edu.sg/handle/10635/230998
dc.description.abstract         Language is largely a social construct, shaped by each community's lived experiences, culture, and language repertoire. However, current natural language processing (NLP) systems fail to account for sociolinguistic variation: common NLP practices implicitly assume that all speakers of a language speak a single, "standard" version. This is damaging to minority language varieties, perpetuating the perception that they are "ungrammatical" and "incorrect". Failing to address this gap predisposes NLP systems to discriminate against minority language communities, whether through disproportionately poor performance or through the encoding of harmful stereotypes. Hence, this thesis focuses on the issues surrounding sociolinguistic generalization, defined as an NLP system's ability to generalize beyond the language variety it was trained on. In some situations, this can be viewed as robustness to sociolinguistic variation. Using adversarial attacks, we reveal the linguistic biases of existing NLP models and design methods to mitigate them. We conclude by generalizing these adversarial attacks into a framework for testing the reliability of NLP systems in the presence of language variation. Language technology is often hailed as an avenue for improving technological accessibility. This thesis strives for a world in which NLP works not only for the privileged, but for everyone.
dc.language.iso                 en
dc.subject                      adversarial, robustness, natural language processing, machine learning, sociolinguistic variation, reliability
dc.type                         Thesis
dc.contributor.department       COMPUTER SCIENCE
dc.contributor.supervisor       Min-Yen Kan
dc.description.degree           Ph.D
dc.description.degreeconferred  DOCTOR OF PHILOSOPHY (SOC)
dc.identifier.orcid             0000-0003-1019-8228
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File            Size       Format       Access Settings
samsontmr.pdf   15.58 MB   Adobe PDF    Open

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.