# Psynd Dataset

The Partial Synthetic Detection (Psynd) dataset is a multi-speaker English corpus of 2294 utterances, approximately 13 hours of speech at a 24 kHz sampling rate. It is derived from LibriTTS, a read English speech corpus (all real voices) designed for TTS research. The data samples are real utterances injected with voice-cloned synthetic speech. The fake parts are generated by a state-of-the-art multi-speaker text-to-speech method and have high similarity to the target speakers, as characterized by Global Style Token (GST) and x-vector embeddings.

The subsets of Psynd are as follows (total speaker counts are the sum of the female and male columns):

| Subset | Utterances | Female Speakers | Male Speakers | Total Speakers |
| :----: | :--------: | :-------------: | :-----------: | :------------: |
| Train | 1963 | 430 | 461 | 891 |
| Validation | 94 | 20 | 20 | 40 |
| Test | 79 | 19 | 19 | 38 |
| Special | 107 | 19 | 18 | 37 |
| Degraded | 158 | 19 | 19 | 38 |

The synthetic segments are generated with [ESPnet2](https://github.com/espnet/espnet). The audio in the Degraded subset is degraded to landline and cellular quality using the [acoustic simulator](https://github.com/idiap/acoustic-simulator).

The localization information for each subset is stored in a separate Excel file. The first column is the file name, and the following columns are the label transition points in seconds (the times at which the label switches between real and fake).

Please cite the following article if you use the dataset:

Bowen Zhang and Terence Sim, "Localizing Fake Segments in Speech," in Proceedings of the International Conference on Pattern Recognition (ICPR), Montreal, Canada, 2022.
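The annotation format above (a file name followed by transition times in seconds) can be expanded into labeled segments. The sketch below is illustrative, not part of the dataset tooling: it assumes the label alternates at each listed transition time, and the choice of starting label (`first_label`) is an assumption that should be checked against the paper. The `pd.read_excel` usage in the comment is likewise a hypothetical way to load one of the Excel files.

```python
def transitions_to_segments(points, total_dur, first_label="real"):
    """Convert transition times (seconds) into (start, end, label) segments.

    Assumes the label alternates at every transition point; whether an
    utterance starts 'real' or 'fake' is an assumption to verify against
    the paper, not something this sketch can know.
    """
    bounds = [0.0] + sorted(points) + [total_dur]
    labels = ("real", "fake") if first_label == "real" else ("fake", "real")
    return [(bounds[i], bounds[i + 1], labels[i % 2])
            for i in range(len(bounds) - 1)]

# Hypothetical loading of one subset's annotation file with pandas:
# import pandas as pd
# df = pd.read_excel("test.xlsx", header=None)
# for _, row in df.iterrows():
#     fname, *points = row.dropna().tolist()
#     segments = transitions_to_segments([float(p) for p in points],
#                                        total_dur=utterance_length(fname))
```

For example, an utterance of 5.0 s with transitions at 2.0 s and 3.5 s yields a real segment, a fake segment, and a final real segment under the starting-label assumption.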