Speech Recognition Grammar Specification Version 1.0
March 2004
Principles and Recommendations
Summary
This document defines the syntax for grammar representation. The grammars are intended for use by speech recognizers and other grammar processors, so that developers can specify the words and patterns of words that a speech recognizer should listen for.
The syntax of the grammar format is presented in two forms: an Augmented BNF (ABNF) Form and an XML Form. The specification ensures that the two representations are semantically mappable, so that a grammar can be transformed automatically from one form to the other.
- Augmented BNF syntax (ABNF): This is a plain-text (non-XML) representation that is similar to traditional BNF grammar and to many existing BNF-like representations commonly used in the field of speech recognition, including the JSpeech Grammar Format [JSGF], from which this specification is derived. Augmented BNF should not be confused with Extended BNF, which is used in DTDs for XML and SGML.
- XML: This syntax uses XML elements to represent the grammar constructs and adapts designs from the PipeBeach grammar, TalkML [TALKML] and a research XML variant of the JSpeech Grammar Format [JSGF].
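To illustrate how the two forms can be mapped onto each other, here is a minimal sketch in Python that renders a small XML Form grammar in the ABNF Form. The grammar content (the rules command, action and object) and the helper names expansion and to_abnf are invented for this illustration. The sketch handles only rule, ruleref (local references), one-of, item and plain tokens; it ignores repeats, weights, tags and every other construct, so it is not a conforming converter.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2001/06/grammar}"                    # grammar namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"       # xml:lang attribute

# Hypothetical example grammar in the XML Form.
XML_FORM = """<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="command" mode="voice">
  <rule id="command" scope="public">
    <ruleref uri="#action"/> the <ruleref uri="#object"/>
  </rule>
  <rule id="action">
    <one-of><item>open</item><item>close</item></one-of>
  </rule>
  <rule id="object">
    <one-of><item>window</item><item>door</item></one-of>
  </rule>
</grammar>"""


def expansion(elem):
    """Render the content of a <rule> or <item> as an ABNF sequence."""
    parts = []
    if elem.text and elem.text.strip():
        parts.append(" ".join(elem.text.split()))
    for child in elem:
        if child.tag == NS + "ruleref":
            # Assumption: only local references of the form "#name".
            parts.append("$" + child.get("uri").lstrip("#"))
        elif child.tag == NS + "one-of":
            alts = [expansion(item) for item in child.findall(NS + "item")]
            parts.append("(" + " | ".join(alts) + ")")
        elif child.tag == NS + "item":
            parts.append("(" + expansion(child) + ")")
        else:
            raise ValueError("unsupported element: " + child.tag)
        # Tokens that follow the child element inside the same parent.
        if child.tail and child.tail.strip():
            parts.append(" ".join(child.tail.split()))
    return " ".join(parts)


def to_abnf(xml_text):
    """Convert the supported subset of the XML Form to ABNF Form text."""
    root = ET.fromstring(xml_text)
    lines = [
        "#ABNF 1.0 UTF-8;",
        "language " + root.get(XML_LANG) + ";",
        "mode " + root.get("mode") + ";",
        "root $" + root.get("root") + ";",
    ]
    for rule in root.findall(NS + "rule"):
        scope = "public " if rule.get("scope") == "public" else ""
        lines.append(scope + "$" + rule.get("id") + " = " + expansion(rule) + ";")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_abnf(XML_FORM))
```

Each XML element in the subset has a direct ABNF counterpart (ruleref becomes a $name reference, one-of becomes a parenthesized list of alternatives separated by |), which is what makes a mechanical transformation possible.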
Both the ABNF Form and XML Form have the expressive power of a Context-Free Grammar (CFG). A grammar processor that does not support recursive grammars has the expressive power of a Finite State Machine (FSM) or regular expression language. For definitions of CFG, FSM, regular expressions and other formal computational language theory see, for example, [HU79]. This form of language expression is sufficient for the vast majority of speech recognition applications.
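The distinction between recursive and non-recursive grammars can be made concrete with a small check on the rule reference graph. The following Python sketch is an illustration under stated assumptions, not part of the specification: the rule bodies are hypothetical ABNF-style strings, rule names are assumed to match the regular expression \w+, and the function names rule_references and is_recursive are invented here. A grammar is reported as recursive when some rule can reach itself through a chain of rule references.

```python
import re

# Hypothetical rule sets, keyed by rule name, with ABNF-style bodies.
FLAT = {
    "command": "$action the $object",
    "action": "open | close",
    "object": "window | door",
}
NESTED = {
    "list": "$entry | $entry and $list",   # $list refers to itself
    "entry": "apples | pears",
}


def rule_references(rules):
    """Map each rule name to the set of rule names its body references."""
    return {name: set(re.findall(r"\$(\w+)", body)) for name, body in rules.items()}


def is_recursive(rules):
    """Return True if any rule reaches itself via rule references."""
    refs = rule_references(rules)
    visiting, done = set(), set()

    def visit(name):
        if name in done:
            return False          # already explored, no cycle through it
        if name in visiting:
            return True           # back edge: the rule reached itself
        visiting.add(name)
        found = any(visit(ref) for ref in refs.get(name, ()))
        visiting.discard(name)
        done.add(name)
        return found

    return any(visit(name) for name in refs)


if __name__ == "__main__":
    print(is_recursive(FLAT))
    print(is_recursive(NESTED))
```

A grammar for which this check returns False can be expanded rule by rule into a single expression without rule references, which is why a processor restricted to such grammars has the power of an FSM or regular expression language. The check looks only at the reference graph; it does not decide whether the language of a recursive grammar happens to be regular.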
This W3C standard is known as the Speech Recognition Grammar Specification and is modelled on the JSpeech Grammar Format specification [JSGF], which is owned by Sun Microsystems, Inc., California, U.S.A.