Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/230998
DC Field | Value | |
---|---|---|
dc.title | LINGUISTICALLY-INCLUSIVE NATURAL LANGUAGE PROCESSING | |
dc.contributor.author | TAN MIN RONG SAMSON | |
dc.date.accessioned | 2022-09-08T18:00:25Z | |
dc.date.available | 2022-09-08T18:00:25Z | |
dc.date.issued | 2022-03-03 | |
dc.identifier.citation | TAN MIN RONG SAMSON (2022-03-03). LINGUISTICALLY-INCLUSIVE NATURAL LANGUAGE PROCESSING. ScholarBank@NUS Repository. | |
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/230998 | |
dc.description.abstract | Language is a largely social construct, shaped by each community's lived experiences, culture, and language repertoire. However, current natural language processing (NLP) systems fail to account for sociolinguistic variation: common NLP practices implicitly assume that all speakers of a language speak a single, "standard" version. This is damaging to minority language varieties, perpetuating the perception that they are "ungrammatical" and "incorrect". Failing to address this gap predisposes NLP systems to discriminate against minority language communities. This can take the form of disproportionately poor performance or the encoding of harmful stereotypes. Hence, this thesis focuses on the issues surrounding sociolinguistic generalization, defined as an NLP system's ability to generalize beyond the language variety it was trained on. In some situations, this can be viewed as the ability to be robust to sociolinguistic variation. Using adversarial attacks, we reveal the linguistic biases of existing NLP models and design methods to mitigate them. We conclude by generalizing the prior adversarial attacks into a framework for testing NLP system reliability in the presence of language variation. Language technology is often hailed as an avenue for improving technological accessibility. This thesis strives for a world in which NLP works not only for the privileged, but for everyone. | |
dc.language.iso | en | |
dc.subject | adversarial, robustness, natural language processing, machine learning, sociolinguistic variation, reliability | |
dc.type | Thesis | |
dc.contributor.department | COMPUTER SCIENCE | |
dc.contributor.supervisor | Min-Yen Kan | |
dc.description.degree | Ph.D | |
dc.description.degreeconferred | DOCTOR OF PHILOSOPHY (SOC) | |
dc.identifier.orcid | 0000-0003-1019-8228 | |
Appears in Collections: | Ph.D Theses (Open) |
Files in This Item:
File | Description | Size | Format | Access Settings | Version | |
---|---|---|---|---|---|---|
samsontmr.pdf | | 15.58 MB | Adobe PDF | OPEN | None | View/Download |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.