Concepts
For every product we distinguish between:
- analyzer/generator runtime program: delivered as a shared library with data files
- data compiler: a stand-alone application
Word Analysis and Generation
The shared libraries offer word analysis and/or word generation functions. Depending on the product selected, word analysis covers the identification of legal strings, morphosyntactic word form analysis and delivery of the citation form, or the analysis of derivation and word formation. Word generation covers the generation of an entire inflection paradigm, as well as of word formations, starting from a word's base form, i.e. its citation form.
Technology
All WMTrans products are based on finite-state technology, i.e. simple finite-state automata or transducers. These are widely regarded as the most efficient implementation of word form analysis and word form generation, with respect to both performance and memory usage.
Finite-State Automata
A deterministic finite-state automaton is a mathematical model of a machine that accepts a particular set of words over a given alphabet. It is represented as a combination of states and arcs, as shown in the following diagram. The machine in the diagram has an alphabet of two symbols (0 and 1) and accepts all words in which the symbol 1 appears an odd number of times.
The next diagram shows an informal abstraction of the concept: the automaton is represented as a black box that performs finite-state control. The input is read sequentially from a tape, and when the end of the tape is reached, the finite-state control decides whether the input is accepted (yes) or not (no). Such a mechanism is called a finite-state acceptor.
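The automaton described above can be sketched in a few lines of Python. This is a minimal, table-driven illustration, not WMTrans code; the state names `even` and `odd` and the function `accepts` are our own choices, since the labels in the diagram are not reproduced here. The loop plays the role of the tape reader, and the final membership test is the yes/no decision of the finite-state control.

```python
from typing import Dict, Set, Tuple

# Illustrative DFA over the alphabet {0, 1} (not WMTrans code).
# State names are assumptions: "even" is the start state, "odd" the only accepting state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}
START = "even"
ACCEPTING: Set[str] = {"odd"}


def accepts(word: str) -> bool:
    """Read the input tape left to right and answer yes (True) or no (False)."""
    state = START
    for symbol in word:
        key = (state, symbol)
        if key not in TRANSITIONS:
            # Symbol outside the alphabet {0, 1}: reject the input.
            return False
        state = TRANSITIONS[key]
    # End of the tape: the finite-state control makes its decision.
    return state in ACCEPTING


if __name__ == "__main__":
    for w in ["1", "0110", "111", ""]:
        print(repr(w), accepts(w))
```

For example, `accepts("111")` returns `True` (three 1s), while `accepts("0110")` returns `False` (two 1s).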
Although finite-state automata are usually thought of as processing strings of letters of an alphabet, the input can conceptually consist of elements from any finite set.
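To illustrate that the alphabet need not consist of characters, the following sketch uses a hypothetical set of word-class tags as input symbols. It is an assumption-based illustration, not part of any WMTrans interface: the `DFA` class and the `Tag` symbols are invented for this example. The automaton `noun_phrase` accepts an optional determiner, followed by any number of adjectives and exactly one noun.

```python
from collections.abc import Hashable
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, FrozenSet, Iterable, Tuple


class Tag(Enum):
    """Hypothetical word-class symbols used as the input alphabet."""
    DET = auto()
    ADJ = auto()
    NOUN = auto()


@dataclass(frozen=True)
class DFA:
    """Deterministic acceptor over an arbitrary finite set of hashable symbols."""
    start: Hashable
    accepting: FrozenSet[Hashable]
    delta: Dict[Tuple[Hashable, Hashable], Hashable]

    def accepts(self, symbols: Iterable[Hashable]) -> bool:
        state = self.start
        for sym in symbols:
            nxt = self.delta.get((state, sym))
            if nxt is None:  # no transition defined for this symbol: reject
                return False
            state = nxt
        return state in self.accepting


# Accepts: optional determiner, any number of adjectives, then exactly one noun.
noun_phrase = DFA(
    start="q0",
    accepting=frozenset({"q2"}),
    delta={
        ("q0", Tag.DET): "q1",
        ("q0", Tag.ADJ): "q1",
        ("q0", Tag.NOUN): "q2",
        ("q1", Tag.ADJ): "q1",
        ("q1", Tag.NOUN): "q2",
    },
)
```

Here `noun_phrase.accepts([Tag.DET, Tag.ADJ, Tag.NOUN])` returns `True`, while `[Tag.DET, Tag.DET, Tag.NOUN]` is rejected because no transition is defined for a second determiner.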
Finite-State Transducers
Finite-state automata are useful when the task is to reliably recognize a sequence and to deliver yes or no as a result. They produce no other output. Finite-state machines that accept input sequences and translate them into output sequences are called sequential machines or transducers. Each transition of such a machine moves from one defined state to another, emitting an output symbol as it consumes an input symbol. The black-box representation used for the acceptor also applies to transducers, with the addition of an output tape.
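A transducer differs from the acceptor only in that each transition also writes a symbol to an output tape. The sketch below is a minimal Mealy-style transducer in Python, written for illustration only (the names `PARITY` and `transduce` are hypothetical). It reuses the parity automaton described earlier in this section and writes `E` or `O` for each input symbol, depending on whether the number of 1s read so far is even or odd.

```python
from typing import Dict, List, Tuple

# Transition table: (state, input symbol) -> (next state, output symbol).
Transitions = Dict[Tuple[str, str], Tuple[str, str]]

PARITY: Transitions = {
    ("even", "0"): ("even", "E"),
    ("even", "1"): ("odd", "O"),
    ("odd", "0"): ("odd", "O"),
    ("odd", "1"): ("even", "E"),
}


def transduce(delta: Transitions, start: str, tape: str) -> str:
    """Read the input tape and write exactly one output symbol per input symbol."""
    state = start
    out: List[str] = []
    for symbol in tape:
        if (state, symbol) not in delta:
            raise ValueError(f"no transition from {state!r} on {symbol!r}")
        state, emitted = delta[(state, symbol)]
        out.append(emitted)  # write to the output tape
    return "".join(out)
```

For instance, `transduce(PARITY, "even", "1101")` returns `"OEEO"`.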
Applications of such machines include control programs for elevators, traffic lights, and other devices that monitor and react to a limited set of stimuli. One important application lies in computational linguistics: for reasons of performance and memory usage, word form analysis and word form generation have proven to be best implemented with finite-state transducers.
WMTrans products are based on this technology.
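The following Python sketch shows, on a deliberately tiny scale, how a single transducer can serve both for analysis and for generation. It is a hypothetical illustration, not the WMTrans data format or API: the lexicon (`cat`, `dog`), the tags `+N`, `+Sg`, `+Pl`, and the functions `build_toy_lexicon` and `run_fst` are invented for this example. Each arc pairs a lexical symbol with a surface symbol; reading the surface side yields an analysis, and reading the lexical side yields a generated word form. The search is a plain depth-first search and assumes the transducer has no epsilon cycles.

```python
from typing import List, Sequence, Set, Tuple

# Arc = (source state, lexical symbol, surface symbol, target state); "" means epsilon.
Arc = Tuple[int, str, str, int]


def build_toy_lexicon(stems: Sequence[str]) -> Tuple[List[Arc], Set[int]]:
    """Hypothetical noun lexicon: stem+N+Sg <-> stem, stem+N+Pl <-> stem + "s"."""
    arcs: List[Arc] = []
    finals: Set[int] = set()
    next_state = 1  # state 0 is the start state
    for stem in stems:
        prev = 0
        for ch in stem:  # identity arcs copy the stem letters
            arcs.append((prev, ch, ch, next_state))
            prev, next_state = next_state, next_state + 1
        noun, sg, pl = next_state, next_state + 1, next_state + 2
        arcs.append((prev, "+N", "", noun))
        arcs.append((noun, "+Sg", "", sg))   # singular: no surface ending
        arcs.append((noun, "+Pl", "s", pl))  # plural: surface ending "s"
        finals.update({sg, pl})
        next_state = pl + 1
    return arcs, finals


def run_fst(arcs: List[Arc], finals: Set[int], tokens: Sequence[str], analyze: bool) -> List[str]:
    """Return all outputs for the input tokens.

    analyze=True reads the surface side and writes the lexical side (analysis);
    analyze=False reads the lexical side and writes the surface side (generation).
    """
    results: Set[str] = set()
    # Depth-first search over (state, input position, output written so far).
    stack: List[Tuple[int, int, str]] = [(0, 0, "")]
    while stack:
        state, pos, out = stack.pop()
        if pos == len(tokens) and state in finals:
            results.add(out)
        for src, lex, surf, dst in arcs:
            if src != state:
                continue
            inp, outp = (surf, lex) if analyze else (lex, surf)
            if inp == "":  # epsilon on the input side: consume nothing
                stack.append((dst, pos, out + outp))
            elif pos < len(tokens) and tokens[pos] == inp:
                stack.append((dst, pos + 1, out + outp))
    return sorted(results)
```

With `arcs, finals = build_toy_lexicon(["cat", "dog"])`, the call `run_fst(arcs, finals, list("cats"), analyze=True)` returns `["cat+N+Pl"]`, and `run_fst(arcs, finals, ["c", "a", "t", "+N", "+Pl"], analyze=False)` returns `["cats"]`. Real systems typically compile such transducers into compact, indexed representations instead of scanning a flat arc list.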