From ArchWiki

Speech rate and sample rate

It is exceedingly difficult to find a native way to make festival alter the pitch and rate of speech; its own documentation makes the absurd assertion that learning Scheme is no harder than learning the Emacs that the developers apparently assume we are all using. This unsourced, unexplained Japanese how-to actually shows some specific variables to set, but I have not been able to confirm that they work (the code demonstrated on the site does nothing for HTS voices, at least).

Anyway, on to my point. One of the recommended ways to set the rate of speech is to adjust the sample rate in Audio_Command. This is usually suggested for aplay; with paplay I had some trouble. To begin with, it only works in raw mode (--raw); beyond that, the speed of speech is multiplied by the number of channels, and the audio quality is divided by the number of channels. I suggest using a single channel, such as 'center' (on surround) or 'mono', for festival output.
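
To make the arithmetic concrete, here is a rough shell sketch of the relationship described above. It assumes the voice produces mono samples at 16000 Hz, which is my assumption and should be checked against the voice actually in use. The function name speech_speed_percent is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: estimate playback speed as a percentage of normal.
# NATIVE_RATE is an assumed value for the voice's own sample rate;
# festival does not report it here, so adjust it for your voice.
NATIVE_RATE=16000

speech_speed_percent() {
    rate=$1      # value passed to paplay --rate
    channels=$2  # value passed to paplay --channels
    # Mono data read as N-channel frames is consumed N times as fast,
    # so the channel count multiplies the effective speed.
    echo $(( rate * channels * 100 / NATIVE_RATE ))
}

speech_speed_percent 17000 1   # one channel at 17000 Hz
speech_speed_percent 17000 2   # same rate, but two channels
```

Under that assumption, a single channel at 17000 Hz comes out slightly faster than normal, while two channels at the same rate roughly double the speed, which matches the trouble I had with paplay.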

Example ~/.festivalrc:

(set! voice_default voice_nitech_us_clb_arctic_hts)
(Parameter.set 'Audio_Required_Format 'raw)
(Parameter.set 'Audio_Method 'Audio_Command)
(Parameter.set 'Audio_Command "paplay --volume=$((65536*65/100)) --raw --rate=17000 --channels=1 --channel-map=center $FILE --client-name=Festival --stream-name=Speech")
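
The $((65536*65/100)) in the command is shell arithmetic: paplay treats 65536 as 100% volume, so the expression works out to 65%. Below is a small sketch for trying other percentages from a terminal before putting them in the config. The function name paplay_volume is only a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: convert a percentage to paplay's --volume scale,
# where 65536 corresponds to 100%.
paplay_volume() {
    echo $(( 65536 * $1 / 100 ))
}

paplay_volume 65   # the percentage used in Audio_Command above
```

The integer division truncates, which makes no audible difference at this scale.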