Hider-Finder-Combiner: An Adversarial Architecture for General Speech Signal Modification
Abstract
We introduce a prototype system for modifying an arbitrary parameter of a speech signal. Unlike signal processing approaches, which require a dedicated method for each parameter, our system can, in principle, modify any control parameter with which the signal can be annotated. The system comprises three neural networks. The 'hider' removes all information related to the control parameter and outputs a hidden embedding. The 'finder' is an adversary used to train the 'hider'; it attempts to detect the value of the control parameter from the hidden embedding. The 'combiner' recombines the hidden embedding with a desired new value of the control parameter. The system's input and output are mel-spectrograms, and we employ a neural vocoder to generate the output speech waveform. As a proof of concept, we use F0 as the control parameter. We evaluated the system in terms of control parameter accuracy and naturalness against a high-quality signal processing method of F0 modification that also works in the spectrogram domain. We also show that, with modifications only to the training data, the system is capable of modifying the first and second vocal tract formants, showing progress towards universal signal modification.
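The data flow between the three networks can be illustrated with a minimal sketch. This is not the authors' implementation: the untrained linear maps stand in for the neural networks, a single vector stands in for a mel-spectrogram frame, and the sizes `N_MEL` and `N_HID`, the function names, the `adv_weight` parameter, and the exact form of the adversarial loss are all assumptions made for illustration.

```python
import random

N_MEL, N_HID = 8, 4  # hypothetical frame and embedding sizes


def init(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Apply a weight matrix to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


rng = random.Random(0)
hider_w = init(N_HID, N_MEL, rng)         # frame -> hidden embedding
finder_w = init(1, N_HID, rng)            # hidden embedding -> control estimate
combiner_w = init(N_MEL, N_HID + 1, rng)  # embedding + control -> frame


def hide(frame):
    return linear(hider_w, frame)


def find(embedding):
    return linear(finder_w, embedding)[0]


def combine(embedding, control):
    return linear(combiner_w, embedding + [control])


def losses(frame, control, adv_weight=1.0):
    """Assumed objectives: the finder minimises its prediction error, while the
    hider and combiner minimise reconstruction error minus the finder's error."""
    emb = hide(frame)
    finder_loss = (find(emb) - control) ** 2
    recon_loss = mse(combine(emb, control), frame)
    hider_combiner_loss = recon_loss - adv_weight * finder_loss
    return finder_loss, hider_combiner_loss


def modify(frame, new_control):
    """Inference: hide the original control value, then recombine with a new one."""
    return combine(hide(frame), new_control)


frame = [rng.random() for _ in range(N_MEL)]
out = modify(frame, new_control=0.5)
finder_loss, hider_combiner_loss = losses(frame, control=0.3)
```

In an actual system the three maps would be trained by alternating gradient updates on the two objectives, and the modified spectrogram would then be passed to the neural vocoder; neither step is shown here.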
Origin: Files produced by the author(s)