A number of basic data manipulation tools are supported by Festival. These often make building new modules very easy and are already used in many of the existing modules. They typically offer a Scheme method for entering data, and Scheme and C++ functions for evaluating it.
Regular expressions are a formal method for describing a certain class of mathematical languages. They may be viewed as patterns which match some set of strings. They are common in many software tools, such as the UNIX shell, Perl, awk and Emacs. Unfortunately the exact form of regular expressions often differs slightly between applications, which can make their use a little tricky.
Festival supports regular expressions based mainly on the form used in the GNU libg++ Regex class, though we have our own implementation of it. Our implementation (EST_Regex) is actually based on Henry Spencer's `regex.c' as distributed with BSD 4.4.
Regular expressions are represented as character strings which are interpreted as regular expressions by certain Scheme and C++ functions. Most characters in a regular expression are treated as literals and match only that character, but a number of others have special meaning. Some characters may be escaped with preceding backslashes to change them from operators to literals (or sometimes from literals to operators).
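.
    Matches any single character.
$
    Matches the end of the string.
^
    Matches the beginning of the string.
X*
    Matches zero or more occurrences of X, where X may be a character, a range or a parenthesized expression.
X+
    Matches one or more occurrences of X, where X may be a character, a range or a parenthesized expression.
X?
    Matches zero or one occurrence of X, where X may be a character, a range or a parenthesized expression.
[...]
    A range, which matches any one of the characters in the brackets. The range operator - allows a span of characters to be specified, e.g. a-z for all lower case characters. If the first character of the range is ^ then it matches any character except those specified in the range. If you wish - to be in the range you must put that first.
\\(...\\)
    Treats the contents of the parentheses as a single object, allowing operators such as *, + and ? to operate on more than single characters.
X\\|Y
    Matches either X or Y, where X and Y may be single characters, ranges or parenthesized expressions.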
Note that actually only one backslash is needed before a character to escape it, but because these expressions are most often contained within Scheme or C++ strings, the escape mechanism for those strings requires that the backslash itself be escaped. Hence you will most often be required to type two backslashes.
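Some examples may help in understanding the use of regular expressions.
a.b
    Matches any three character string starting with an a and ending with a b.
.*a
    Matches any string ending in an a.
.*a.*
    Matches any string containing an a.
[A-Z].*
    Matches any string starting with a capital letter.
[0-9]+
    Matches any non-empty string of digits.
-?[0-9]+\\(\\.[0-9]+\\)?
    Matches any positive or negative real number. Note the optional preceding minus sign and the optional part containing the point and the digits that follow it. The point itself must be escaped, as a dot on its own matches any character.
[^aeiouAEIOU]+
    Matches any non-empty string which does not contain a vowel.
\\([Ss]at\\(urday\\)?\\)\\|\\([Ss]un\\(day\\)?\\)
    Matches Sat, Saturday, Sun or Sunday, with or without an initial capital.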
The Scheme function string-matches takes a string and a regular expression and returns t if the regular expression matches the string and nil otherwise.
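The escaping convention above can be illustrated outside Festival. The following is a minimal Python sketch, not Festival's implementation: it translates a pattern written in the style described here (with \( \) and \| as operators and bare parentheses as literals) into Python's `re` syntax and then requires the whole string to match. The function names `festival_to_python` and `string_matches` are hypothetical, and the treatment of bare ( ) | { } as literals is an assumption drawn from the description above. Patterns are given in their single-backslash form, since Python raw strings do not need the doubling that Scheme strings do.

```python
import re

def festival_to_python(pattern):
    """Translate a Festival-style pattern (single-backslash form) to Python re syntax."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            if nxt in "()|":
                out.append(nxt)             # \( \) \| act as operators
            else:
                out.append(re.escape(nxt))  # any other escaped character is a literal
            i += 2
        elif c == "[":
            # Copy a bracket range verbatim, allowing a leading ^ and a leading ].
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            while j < n and pattern[j] != "]":
                j += 1
            out.append(pattern[i:j + 1])
            i = j + 1
        elif c in "()|{}":
            out.append("\\" + c)            # assumed to be literals when unescaped
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)

def string_matches(string, pattern):
    """Return True only if the pattern matches the entire string."""
    return re.fullmatch(festival_to_python(pattern), string) is not None

REAL_NUMBER = r"-?[0-9]+\(\.[0-9]+\)?"
print(string_matches("-3.14", REAL_NUMBER))
print(string_matches("rhythm", "[^aeiouAEIOU]+"))
```

The sketch does not cover every corner of either syntax (for example, backslashes inside bracket ranges), so it should be read as an illustration of the operator/literal swap rather than a complete converter.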
One of the basic tools available with Festival is a system for building and using Classification and Regression Trees (breiman84). This standard statistical method can be used to predict both categorical and continuous data from a set of feature vectors.
The tree itself contains yes/no questions about features and ultimately provides either a probability distribution, when predicting categorical values (classification tree), or a mean and standard deviation, when predicting continuous values (regression tree). Well defined techniques can be used to construct an optimal tree from a set of training data. The program `wagon', developed in conjunction with Festival and distributed with the speech tools, provides a basic but increasingly powerful method for constructing trees.
A tree need not be automatically constructed. CART trees have the advantage over some other automatic training methods, such as neural networks and linear regression, in that their output is more readable and often understandable by humans. Importantly, this makes it possible to modify them. CART trees may also be fully hand constructed. This is used, for example, in generating some duration models for languages for which we do not yet have full databases to train from.
A CART tree has the following syntax
CART ::= QUESTION-NODE || ANSWER-NODE
QUESTION-NODE ::= ( QUESTION YES-NODE NO-NODE )
YES-NODE ::= CART
NO-NODE ::= CART
QUESTION ::= ( FEATURE in LIST )
QUESTION ::= ( FEATURE is STRVALUE )
QUESTION ::= ( FEATURE = NUMVALUE )
QUESTION ::= ( FEATURE > NUMVALUE )
QUESTION ::= ( FEATURE < NUMVALUE )
QUESTION ::= ( FEATURE matches REGEX )
ANSWER-NODE ::= CLASS-ANSWER || REGRESS-ANSWER
CLASS-ANSWER ::= ( (VALUE0 PROB) (VALUE1 PROB) ... MOST-PROB-VALUE )
REGRESS-ANSWER ::= ( ( STANDARD-DEVIATION MEAN ) )
Note that answer nodes are distinguished by their car not being atomic.
A tree is interpreted with respect to a Stream_Item. The FEATURE in a tree is a standard feature (see section 14.6 Features).
The following example tree is used in one of the Spanish voices to predict variations from average durations.
(set! spanish_dur_tree
 '
  ((R:SylStructure.parent.R:Syllable.p.syl_break > 1 ) ;; clause initial
   ((R:SylStructure.parent.stress is 1)
    ((1.5))
    ((1.2)))
   ((R:SylStructure.parent.syl_break > 1)   ;; clause final
    ((R:SylStructure.parent.stress is 1)
     ((2.0))
     ((1.5)))
    ((R:SylStructure.parent.stress is 1)
     ((1.2))
     ((1.0))))))
It is applied to the segment stream to give a factor to multiply the average by.
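To make the traversal concrete, here is a minimal Python sketch of how such a tree could be walked. It is an illustration only, not Festival's interpreter. The tree is the Spanish duration tree above transcribed into nested lists, the short feature names (`prev_syl_break`, `stress`, `syl_break`) are hypothetical stand-ins for the full feature paths, and feature values come from a plain dictionary rather than from an utterance. The sketch treats any node without yes and no branches as an answer node, which is an assumption about the node layout shown in the example.

```python
import re

def ask(question, features):
    """Evaluate one ( FEATURE OPERATOR VALUE ) question against a feature dictionary."""
    name, op, target = question
    value = features[name]
    if op == "in":
        return value in target
    if op == "is":
        return str(value) == str(target)      # string comparison
    if op == "=":
        return float(value) == float(target)  # numeric comparison
    if op == ">":
        return float(value) > float(target)
    if op == "<":
        return float(value) < float(target)
    if op == "matches":
        # Uses Python regex syntax here, not Festival's.
        return re.fullmatch(target, str(value)) is not None
    raise ValueError("unknown operator: " + op)

def predict(node, features):
    """Follow yes/no branches until an answer node is reached, then return its answer."""
    while len(node) == 3:                     # ( QUESTION YES-NODE NO-NODE )
        question, yes_node, no_node = node
        node = yes_node if ask(question, features) else no_node
    return node[0]

SPANISH_DUR_TREE = [
    ["prev_syl_break", ">", 1],               # clause initial
    [["stress", "is", 1], [[1.5]], [[1.2]]],
    [["syl_break", ">", 1],                   # clause final
     [["stress", "is", 1], [[2.0]], [[1.5]]],
     [["stress", "is", 1], [[1.2]], [[1.0]]]],
]

# A stressed, clause-final syllable.
factor = predict(SPANISH_DUR_TREE, {"prev_syl_break": 0, "stress": 1, "syl_break": 4})[0]
print(factor)
```

Because the tree is just nested data, a hand edit such as changing one factor or adding a question is a local change to one sublist, which is the readability advantage described above.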
`wagon' is constantly improving and, with version 1.2 of the speech tools, may now be considered fairly stable for its basic operations. Experimental features are described in the help it gives. See the Speech Tools manual for a more comprehensive discussion of using `wagon'.
However the above format of trees is similar to those produced by many other systems and hence it is reasonable to translate their formats into one which Festival can use.
Bigrams, trigrams, and general ngrams are used in the part of speech tagger and the phrase break predictor. An Ngram C++ Class is defined in the speech tools library and some simple facilities are added within Festival itself.
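Ngrams may be built from files of tokens using the program ngram, which is part of the speech tools. Its options consist of commands and/or file names. The commands are:
--order NUM
    Sets the order of the ngram to NUM, e.g. 2 for a bigram or 3 for a trigram.
build FILE
    Builds an ngram from FILE, of the order given by the --order option. FILE will be tokenized in the standard way. This is often not what is needed, in which case a specific program needs to be written.
load FILE
    Loads an ngram from FILE.
save FILE
    Saves the current ngram to FILE.
test FILE
    Tests the current ngram against FILE.
print_freqs
    Prints the frequencies held in the current ngram.
print_probs
    Prints the probabilities held in the current ngram.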
For example, suppose we have two text files, `trainfile' and `testfile'. We could build a trigram from `trainfile', save it in `tf.ngram' and test it against `testfile' with the following command:
ngram --order 3 build trainfile save tf.ngram test testfile
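As a rough illustration of what building an ngram of a given order involves, the following Python sketch counts every run of N consecutive tokens and derives an unsmoothed conditional probability from the counts. It is not the algorithm used by the ngram program, whose tokenization and probability estimation are not described here. The function names and the toy token sequence are hypothetical.

```python
from collections import Counter

def build_ngram(tokens, order):
    """Count every run of `order` consecutive tokens."""
    return Counter(tuple(tokens[i:i + order])
                   for i in range(len(tokens) - order + 1))

def probability(counts, context, word):
    """Unsmoothed maximum-likelihood estimate of P(word | context)."""
    context = tuple(context)
    total = sum(c for gram, c in counts.items() if gram[:-1] == context)
    return counts[context + (word,)] / total if total else 0.0

tokens = "the cat sat the cat ran the cat sat".split()
trigrams = build_ngram(tokens, 3)
print(trigrams[("the", "cat", "sat")])
print(probability(trigrams, ("the", "cat"), "sat"))
```

A practical model would also need some treatment of unseen sequences, which this sketch simply scores as zero.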
Within Festival, ngrams may be named, loaded from files and used when required. The LISP function load_ngram takes a name and a filename as arguments and loads the Ngram from that file. For an example of its use once loaded, see `src/modules/base/pos.cc' or `src/modules/base/phrasify.cc'.
Another common tool is a Viterbi decoder. This C++ Class is defined in the speech tools library, in `speech_tools/include/EST_viterbi.h' and `speech_tools/stats/EST_viterbi.cc'. A Viterbi decoder requires two functions at declaration time. The first constructs candidates at each stage, while the second combines paths. A number of options are available (which may change).
The prototypical example of use is in the part of speech tagger, which uses standard Ngram models to predict probabilities of tags. See `src/modules/base/pos.cc' for an example.
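The two-function design can be sketched generically. The Python below is a minimal illustration under stated assumptions, not the interface of the speech tools class: `viterbi`, `candidates` and `extend` are hypothetical names, a path is represented as a (log score, tag list) pair, and the tiny lexicon and bigram tables are invented purely for the example. The decoder keeps only the best path ending in each state at every stage, which is the core of the Viterbi technique.

```python
import math

def viterbi(observations, candidates, extend):
    """Return the best state sequence.

    candidates(obs) lists the candidates for one stage (assumed non-empty);
    extend(path, candidate) returns the path lengthened by that candidate.
    """
    best = {None: (0.0, [])}          # end state -> best (log score, states) so far
    for obs in observations:
        new_best = {}
        for cand in candidates(obs):
            for path in best.values():
                score, states = extend(path, cand)
                end = states[-1]
                if end not in new_best or score > new_best[end][0]:
                    new_best[end] = (score, states)
        best = new_best
    return max(best.values(), key=lambda p: p[0])[1]

# Invented toy model: P(tag | word) and bigram P(tag | previous tag).
LEXICON = {"the": {"DET": 1.0}, "can": {"NOUN": 0.3, "VERB": 0.7}}
BIGRAM = {(None, "DET"): 0.8, (None, "NOUN"): 0.1, (None, "VERB"): 0.1,
          ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1}

def candidates(word):
    return [(tag, math.log(p)) for tag, p in LEXICON[word].items()]

def extend(path, candidate):
    score, states = path
    tag, lexical = candidate
    prev = states[-1] if states else None
    transition = math.log(BIGRAM.get((prev, tag), 1e-6))  # small floor for unseen pairs
    return (score + transition + lexical, states + [tag])

tags = viterbi(["the", "can"], candidates, extend)
print(tags)
```

In this toy model "can" is more likely a verb in isolation, but the bigram score after a determiner makes the noun reading the better overall path.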
The linear regression module applies models built by some external package, computing a value from a set of features and their weights. A model consists of a list of terms. The first should be the atom Intercept plus a value. Each following term in the list should consist of a feature (see section 14.6 Features) followed by a weight. An optional third element may be a list of atomic values. If the result of the feature is a member of this list, the feature's value is treated as 1, otherwise as 0. This third argument allows an efficient way to map categorical values into numeric values. For example, the first few parameters of the F0 prediction model in `lib/f2bf0lr.scm' are
(set! f2b_f0_lr_start
'(
   ( Intercept 160.584956 )
   ( Word.Token.EMPH 36.0 )
   ( pp.tobi_accent 10.081770 (H*) )
   ( pp.tobi_accent 3.358613 (!H*) )
   ( pp.tobi_accent 4.144342 (*? X*? H*!H* * L+H* L+!H*) )
   ( pp.tobi_accent -1.111794 (L*) )
   ...
)
Note the feature pp.tobi_accent returns an atom, and is hence tested with the map groups specified as third arguments.
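The evaluation rule described above amounts to an intercept plus a weighted sum, with the optional third element turning a categorical feature into a 0/1 indicator. The following Python sketch shows that arithmetic. It is an illustration rather than Festival's code: the function name `predict_linear` is hypothetical, feature values are supplied in a plain dictionary, and the model uses only some of the terms from the excerpt above, so its outputs are not real F0 predictions.

```python
def predict_linear(model, features):
    """Intercept plus the sum of weight * value over the remaining terms."""
    _, total = model[0]                       # first term: ( Intercept VALUE )
    for term in model[1:]:
        feature, weight = term[0], term[1]
        value = features[feature]
        if len(term) == 3:
            # Optional map list: membership gives 1, anything else gives 0.
            value = 1.0 if value in term[2] else 0.0
        total += weight * float(value)
    return total

# A subset of the terms shown in the excerpt above.
MODEL = [
    ("Intercept", 160.584956),
    ("Word.Token.EMPH", 36.0),
    ("pp.tobi_accent", 10.081770, ["H*"]),
    ("pp.tobi_accent", 3.358613, ["!H*"]),
    ("pp.tobi_accent", -1.111794, ["L*"]),
]

print(predict_linear(MODEL, {"Word.Token.EMPH": 0, "pp.tobi_accent": "H*"}))
```

Listing the same feature several times with different map lists, as the excerpt does, gives each accent group its own weight without any separate encoding step.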
Models may be built from feature data (in the same format as `wagon') using the `ols' program distributed with the speech tools library.