Please use this identifier to cite or link to this item:
https://doi.org/10.25540/4VMK-AYPV
Title: | Localizing Fake Segments in Speech |
Creators: | SIM MONG CHENG, TERENCE BOWEN ZHANG |
NUS Contact: | Terence, Mong Cheng Sim |
Subject: | Computer Science |
DOI: | doi:10.25540/4VMK-AYPV |
Description: | The Partial Synthetic Detection (Psynd) dataset is a multi-speaker English corpus of 2294 utterances, approximately 13 hours of English speech at a 24 kHz sampling rate. It is derived from LibriTTS, a read English speech corpus (all real voices) designed for TTS research. The data samples are real utterances injected with voice-cloned synthetic speech. The fake parts are generated by a state-of-the-art multi-speaker text-to-speech method and have high similarity to the target speakers, characterized by Global Style Token (GST) and X-Vector. |
Related Publications: | Localizing Fake Segments in Speech |
Citation: | When using this data, please cite the original publication and also the dataset. |
License: | CC0 1.0 Universal http://creativecommons.org/publicdomain/zero/1.0/ |
Appears in Collections: | Other Dataset |
Files in This Item:
File | Description | Size | Format | Access Settings | |
---|---|---|---|---|---|
Psynd.zip | Dataset file | 1.77 GB | ZIP | CLOSED | Request a copy |
Psynd_readme.txt | Readme file | 1.65 kB | Text | OPEN | View/Download |
This item is licensed under a Creative Commons License