The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance

Lin Zhang; Xin Wang; Erica Cooper; Nicholas Evans; Junichi Yamagishi

doi:10.1109/TASLP.2022.3233236

Journal article. IEEE/ACM Transactions on Audio, Speech and Language Processing. Year: 2023


Lin Zhang

Role: Author
PersonId: 1211516
ORCID: 0000-0001-7826-2850

National Institute of Informatics

Xin Wang

Role: Author
PersonId: 1211517
ORCID: 0000-0001-8246-0606

National Institute of Informatics

Erica Cooper

Role: Author
PersonId: 1211518
ORCID: 0000-0002-2978-2793

National Institute of Informatics

Nicholas Evans

Role: Author
PersonId: 1211519
ORCID: 0000-0002-8459-1041

Eurecom [Sophia Antipolis]

Junichi Yamagishi

Role: Author
PersonId: 1211520
ORCID: 0000-0003-2752-3955

National Institute of Informatics

Abstract

Automatic speaker verification is susceptible to various manipulations and spoofing attacks, such as text-to-speech synthesis, voice conversion, replay, tampering, and adversarial attacks. We consider a new spoofing scenario called “Partial Spoof” (PS), in which synthesized or transformed speech segments are embedded into a bona fide utterance. While existing countermeasures (CMs) can detect fully spoofed utterances, they need to be adapted or extended to the PS scenario. We propose various improvements to construct a significantly more accurate CM that can detect and locate short generated spoofed speech segments at finer temporal resolutions. First, we introduce newly developed self-supervised pre-trained models as enhanced feature extractors. Second, we extend our PartialSpoof database by adding segment labels for various temporal resolutions. Since the short spoofed speech segments to be embedded by attackers are of variable length, six different temporal resolutions are considered, ranging from as short as 20 ms to as long as 640 ms. Third, we propose a new CM that enables the simultaneous use of the segment-level labels at different temporal resolutions as well as utterance-level labels to execute utterance- and segment-level detection at the same time. We also show that the proposed CM is capable of detecting spoofing at the utterance level with low error rates in the PS scenario as well as in a related logical access (LA) scenario. The equal error rates of utterance-level detection on the PartialSpoof database and the ASVspoof 2019 LA database were 0.77% and 0.90%, respectively.
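To make the idea of segment labels at several temporal resolutions concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the six resolutions double from 20 ms to 640 ms (20, 40, 80, 160, 320, 640 ms) and that a coarser segment counts as spoofed if any 20 ms frame inside it is spoofed; neither assumption is stated in the abstract. The function names and the 0/1 label encoding are hypothetical.

```python
from typing import Dict, List

# Assumption: doubling from 20 ms gives exactly six resolutions ending at 640 ms.
RESOLUTIONS_MS = [20, 40, 80, 160, 320, 640]

# Hypothetical label encoding used only for this sketch.
BONA_FIDE, SPOOF = 0, 1


def coarsen_labels(frame_labels: List[int], factor: int) -> List[int]:
    """Merge every `factor` consecutive fine-grained labels into one label.

    Assumed rule: a coarse segment is SPOOF if any frame inside it is SPOOF.
    The last segment may cover fewer than `factor` frames.
    """
    coarse = []
    for start in range(0, len(frame_labels), factor):
        chunk = frame_labels[start:start + factor]
        coarse.append(SPOOF if SPOOF in chunk else BONA_FIDE)
    return coarse


def multi_resolution_labels(frame_labels_20ms: List[int]) -> Dict[int, List[int]]:
    """Derive labels at every resolution from the finest (20 ms) labels."""
    base = RESOLUTIONS_MS[0]
    return {
        res: coarsen_labels(frame_labels_20ms, res // base)
        for res in RESOLUTIONS_MS
    }


def utterance_label(frame_labels_20ms: List[int]) -> int:
    """An utterance is SPOOF if it contains at least one spoofed frame."""
    return SPOOF if SPOOF in frame_labels_20ms else BONA_FIDE


if __name__ == "__main__":
    # A 640 ms utterance (32 frames of 20 ms) with a short spoofed
    # region covering frames 10 to 12.
    frames = [BONA_FIDE] * 32
    for i in (10, 11, 12):
        frames[i] = SPOOF

    for res, labels in multi_resolution_labels(frames).items():
        print(f"{res:>3} ms: {labels}")
    print("utterance:", utterance_label(frames))
```

Under these assumptions, a short spoofed region stays sharply localized at 20 ms but marks an entire segment as spoofed at 640 ms, which illustrates why labels at several resolutions, together with an utterance-level label, carry complementary information.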

Keywords

Anti-spoofing; deepfake; PartialSpoof; self-supervised learning; spoof localization; countermeasure

Domains

Computer Science [cs]; Engineering Sciences [physics]


https://hal.science/hal-03922020

Submitted on: Wednesday, January 4, 2023, 11:03:49

Last modified on: Thursday, January 26, 2023, 14:16:11

Dates and versions

hal-03922020, version 1 (04-01-2023)

Identifiers

HAL Id: hal-03922020, version 1
DOI: 10.1109/TASLP.2022.3233236

Cite

Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi. The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance. IEEE/ACM Transactions on Audio, Speech and Language Processing, In press, pp.1-13. ⟨10.1109/TASLP.2022.3233236⟩. ⟨hal-03922020⟩


Collections

EURECOM ANR
