A number of enhancements have been made to the basic system
(and the underlying speech tools) since the last release. The
major points are listed below.
- Architecture restructuring
To allow more complex relations (particularly trees) and
to improve efficiency, the underlying Utterance architecture
has been completely replaced. What were stream items and streams
have been replaced with items and relations. In most cases this
has simplified things, though the names of most utterance access
functions have changed.
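
As a rough illustration of the idea of items and relations, the sketch
below models an utterance as a set of named relations, where one
relation is a flat list and another is a tree. It is not Festival's
API: the names `Item`, `Utterance`, `append`, `add_child` and `leaves`
are hypothetical, and the sharing of one feature set between relations
is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    """A node in one relation (hypothetical; not a Festival class)."""
    features: Dict[str, object]
    children: List["Item"] = field(default_factory=list)

    def add_child(self, features):
        # Attach a daughter item, which is what makes a relation a tree.
        child = Item(features)
        self.children.append(child)
        return child


@dataclass
class Utterance:
    """Named relations, each holding an ordered list of top-level items."""
    relations: Dict[str, List[Item]] = field(default_factory=dict)

    def append(self, relation, features):
        item = Item(features)
        self.relations.setdefault(relation, []).append(item)
        return item


def leaves(item):
    """Return the leaf items below `item`, left to right."""
    if not item.children:
        return [item]
    found = []
    for child in item.children:
        found.extend(leaves(child))
    return found


utt = Utterance()
hello = {"name": "hello"}
world = {"name": "world"}

# A flat relation: the words in order.
utt.append("Word", hello)
utt.append("Word", world)

# A tree relation over the same feature sets (assumption of this sketch).
root = utt.append("Syntax", {"name": "S"})
root.add_child(hello)
root.add_child(world)

word_names = [item.features["name"] for item in utt.relations["Word"]]
leaf_names = [item.features["name"] for item in leaves(root)]
```

Here the same words are reachable either through the flat relation or
as the leaves of the tree, which is the kind of structure a single
linear stream cannot express.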
- New diphone synthesizer
A new diphone synthesizer and related signal processing functions
have been included, along with all the analysis software for building
diphone databases. The new system sounds better and is more
cleanly written, allowing new signal processing algorithms to be
slotted in and non-diphone unit selection to use the same
signal processing code.
- Trainable letter to sound rules
A system for training letter to sound rules from example lexical
entries has been included. The old letter to sound rule
system remains, as its rules are easier for humans to write, but
the trained rule sets are substantially better than any of
the hand-written rule sets for English (the system has also been used
successfully for German and French).
- New signal processing library
A substantially new implementation of the basic signal processing library
is included, providing many of the basic functions as well as a framework
within which we can easily add more techniques.
- Tilt Intonation modelling
Full support is now included for the Tilt intonation models,
both training and use.
- Stochastic Context Free Grammar parsing
A probabilistic syntactic parser has been added with a grammar trained
from the Penn Treebank Wall Street Journal corpus. It produces adequate
bracketing for arbitrary text. The festival script
`scfg_parse_text' offers a simple interface to it. This parser
is not yet used in the default tts mode.
- Improvements in training tools
The speech tools have been substantially improved to include most of the
functions required to train all the models in the system. Wagon, the
CART builder, has substantial improvements making it more useful and
more likely to extract reasonable models. Code to build utterance
structures for databases and then extract features for training models
is also included with the system.
- Weighted finite state transducers
A basic library of weighted finite state transducer functions has been
added. It includes basic compilers from regular expressions, regular
grammars, (pseudo-)context free grammars and Kay/Kaplan style context
dependent re-write rules (as used in two-level morphology). It also
includes standard WFST manipulation functions such as minimization,
determinization, composition, intersection etc. A morphological analyser
for English is included using this technology (though it isn't used in
the default tts modes).
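
To make one of the manipulation functions named above concrete, here
is a minimal sketch of composition for epsilon-free weighted
transducers, with weights added along matching arcs (the tropical
semiring convention). It is the textbook construction, not the speech
tools implementation: the dictionary layout and the function name
`compose` are assumptions made for this example.

```python
from collections import deque


def compose(a, b):
    """Compose two epsilon-free weighted transducers.

    Each transducer is a dict with keys 'start', 'finals' (a set) and
    'arcs', mapping a state to a list of (input, output, weight, next).
    This layout is hypothetical and chosen only for the sketch.
    """
    start = (a["start"], b["start"])
    arcs = {}
    finals = set()
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        qa, qb = state
        # A paired state is final only if both halves are final.
        if qa in a["finals"] and qb in b["finals"]:
            finals.add(state)
        out_arcs = arcs.setdefault(state, [])
        for sym_in, mid_a, w_a, next_a in a["arcs"].get(qa, []):
            for mid_b, sym_out, w_b, next_b in b["arcs"].get(qb, []):
                # Output of the first machine must match input of the second.
                if mid_a != mid_b:
                    continue
                target = (next_a, next_b)
                out_arcs.append((sym_in, sym_out, w_a + w_b, target))
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return {"start": start, "finals": finals, "arcs": arcs}


# First machine rewrites "a" to "b" at cost 1.0.
first = {"start": 0, "finals": {1}, "arcs": {0: [("a", "b", 1.0, 1)]}}
# Second machine rewrites "b" to "c" at cost 0.5.
second = {"start": 0, "finals": {1}, "arcs": {0: [("b", "c", 0.5, 1)]}}

composed = compose(first, second)
```

The composed machine maps "a" directly to "c" with the two costs
summed. Only state pairs reachable from the paired start state are
built, which keeps the result small.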
- Example unit selection algorithm
An example unit selection algorithm using the cluster method described
in black97c is included. Although it is by no means stable, it
shows the processes involved in building a unit selection synthesizer.
It is included to allow others to continue this avenue of
research and as an example of how to fit new synthesis algorithms
into the overall system.
- Time and space efficiency
Some time has been spent partitioning the speech tools library so that
it has only the necessary dependencies between files, which substantially
reduces the size of the binaries. The text processing front end of
Festival has more than doubled in speed due to the new architecture and
some restructuring of code. The lexicon system now includes a cache
that reduces look up time substantially. However, although the
run time system is smaller and faster than before, it does include more
functionality and hence the distribution is larger.
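
The effect of a lexicon cache can be sketched as follows. This shows
only the general technique of remembering earlier look ups; how the
cache in the lexicon system is actually organized is not described
here, and `CachedLexicon`, `pronounce` and `toy_lookup` are
hypothetical names with made-up entries.

```python
class CachedLexicon:
    """Wrap a slow look up function with an in-memory cache (illustrative)."""

    def __init__(self, lookup):
        self._lookup = lookup
        self._cache = {}
        self.misses = 0  # number of times the slow look up was called

    def pronounce(self, word):
        key = word.lower()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._lookup(key)
        return self._cache[key]


def toy_lookup(word):
    # Stand-in for an expensive search of a large lexicon (invented data).
    table = {
        "hello": ["hh", "ax", "l", "ow"],
        "world": ["w", "er", "l", "d"],
    }
    return table.get(word, [])


lexicon = CachedLexicon(toy_lookup)
for token in ["hello", "world", "Hello", "hello"]:
    lexicon.pronounce(token)
```

Four requests reach the slow look up only twice, since repeated words
are answered from the cache. Running text repeats common words often,
which is why a cache of this kind can pay off.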