Conference paper, Year: 2014

Automatic Speech Segmentation of French: Corpus Adaptation.

Brigitte Bigi

Abstract

Whereas phonetic models were commonly formulated some years ago on the basis of rather limited data, linguists are now increasingly expected to take into account large quantities of empirical data, often including several hours of recorded speech. Analysing the phonetic entities of speech nearly always requires aligning the speech recording with a phonetic transcription of that speech. This task is extremely labour-intensive, so transcribing and aligning several hours of speech by hand is generally not feasible. Several toolkits are currently available to automate the task, including HTK, Sphinx and Julius. Numerous studies have been carried out on prepared speech, for example broadcast news. Conversational speech, however, is a more informal activity, produced without any preparation. As a consequence, numerous phenomena appear, such as hesitations, repetitions, feedback and backchannels. Other phonetic phenomena such as non-standard elisions, reductions, truncated words and, more generally, non-standard pronunciations are also very frequent. We therefore propose to evaluate the impact of speech style on the speech segmentation task by comparing controlled speech with spontaneous speech. The segmentation is performed on French with SPPAS, a tool that produces automatic annotations, including utterance, word, syllable and phoneme segmentations, from a recorded speech sound and its transcription. SPPAS is open-source software released under the GNU General Public License. SPPAS is also designed to be used directly by linguists.
No file deposited

Dates and versions

hal-01500720, version 1 (03-04-2017)

Identifiers

  • HAL Id: hal-01500720, version 1

Cite

Brigitte Bigi. Automatic Speech Segmentation of French: Corpus Adaptation. Second Asia Pacific Corpus Linguistics Conference, Mar 2014, Hong Kong. p. 32. ⟨hal-01500720⟩