Architecture

Framework

All WMTrans products are built on a common object-oriented framework, implemented in Java and in C++. The C++ implementations are delivered as shared libraries (analyzers/generators) and stand-alone applications (document compilers). The Java implementations are delivered as classes in a jar file. The Java framework is also used for custom-made solutions implemented by Canoo. Because every product builds on the same framework, every product rests on the same proven architecture. On demand, Canoo can develop custom-made transducers for specific requirements at competitive prices.

Available APIs

The pure Java implementation can be integrated into your Java program through a class API.
The C++ products (shared library) have two separate APIs. This means that the library can be integrated into C/C++ programs as well as into Java programs. For a more detailed explanation, see the section on APIs.

Framework Instances

Each instance of the framework represents a particular finite-state machine, i.e. a combination of states and arcs, as shown in the following diagram:
Figure 1: Finite State Automaton

The machine has an alphabet consisting of two symbols (0 and 1) and accepts exactly those words in which the symbol 1 appears an odd number of times.
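
The behaviour of such an automaton can be sketched in a few lines of Java. This is only an illustration, not WMTrans code: the class name OddOnesAutomaton, the method accepts and the state names EVEN and ODD are hypothetical, and the sketch assumes the machine has two states, with ODD as the only end state.

```java
// Illustrative sketch only; all names are hypothetical and not part of the WMTrans API.
final class OddOnesAutomaton {

    // Assumed states: EVEN is the start state, ODD is the only end (accepting) state.
    private enum State { EVEN, ODD }

    // Returns true if the word over the alphabet {0, 1} contains an odd number of 1s.
    static boolean accepts(String word) {
        State state = State.EVEN;
        for (char symbol : word.toCharArray()) {
            switch (symbol) {
                case '0':
                    // An arc labelled 0 leaves the state unchanged.
                    break;
                case '1':
                    // An arc labelled 1 switches between the two states.
                    state = (state == State.EVEN) ? State.ODD : State.EVEN;
                    break;
                default:
                    throw new IllegalArgumentException("Symbol not in alphabet: " + symbol);
            }
        }
        // The word is accepted only if the machine stops in the end state.
        return state == State.ODD;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"", "1", "0110", "10101"}) {
            System.out.println("\"" + word + "\" accepted: " + accepts(word));
        }
    }
}
```

Here the end state matters: the word is accepted or rejected depending on the state in which the machine stops after reading the last symbol.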

WMTrans products mostly use finite-state transducers as instances of the framework. A transducer is a finite-state machine able to accept input sequences and to translate these into output sequences. Such machines have state transitions that generate an output symbol while accepting an input symbol, as shown in the following diagram:
Figure 2: Finite State Transducer
In this case, the machine flags occurrences of the pattern AAB: it prints a 1 on the output tape only when the substring AAB has just been read from the input stream. In each arc label, the character "/" separates the input symbol from the output symbol.

Please note that in this particular example, end states are not required, but they are necessary for word form analysis and generation.
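
A transducer of this kind can be sketched as a pair of transition tables, one giving the next state and one giving the output symbol. The sketch below is not WMTrans code. It assumes an input alphabet of just A and B, three states numbered 0 to 2, and an output of 0 on every arc that does not complete the pattern; the class name AabTransducer and the method transduce are hypothetical.

```java
// Illustrative sketch only; all names are hypothetical and not part of the WMTrans API.
final class AabTransducer {

    // Column 0 is the input symbol A, column 1 is the input symbol B.
    // State 0: no progress, state 1: "A" seen, state 2: "AA" seen.
    private static final int[][] NEXT_STATE = {
        {1, 0},
        {2, 0},
        {2, 0},
    };

    // Output symbol written while taking the arc (assumed to be 0 unless AAB is completed).
    private static final char[][] OUTPUT = {
        {'0', '0'},
        {'0', '0'},
        {'0', '1'},
    };

    // Translates an input sequence into an output sequence of the same length.
    static String transduce(String input) {
        StringBuilder output = new StringBuilder(input.length());
        int state = 0;
        for (char symbol : input.toCharArray()) {
            int column;
            if (symbol == 'A') {
                column = 0;
            } else if (symbol == 'B') {
                column = 1;
            } else {
                throw new IllegalArgumentException("Symbol not in alphabet: " + symbol);
            }
            // Each arc reads one input symbol and writes one output symbol.
            output.append(OUTPUT[state][column]);
            state = NEXT_STATE[state][column];
        }
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(transduce("BAABAAB"));
    }
}
```

Under these assumptions, transduce("BAABAAB") returns 0001001: a 1 appears at each position where an occurrence of AAB ends. No end state is consulted, because the result is the output tape itself rather than an accept/reject decision.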

Transducers are used for different types of analyzers and generators, e.g.:

  • to accept word forms as input (e.g. schläfst) and, depending on the transducer type, deliver the corresponding citation form(s) (e.g. schlafen);
  • to generate the paradigm from a citation form (e.g. from the citation form schlafen, the paradigm includes forms such as schlafe, schläfst, schläft, etc.);
  • to generate/analyze derivations and word formations (e.g. einschlafen, Schlafgelegenheit).
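
The two directions, analysis and generation, can be illustrated with a toy relation between word forms and citation forms. The sketch below is a plain lookup table standing in for a real transducer; it is not WMTrans code and says nothing about how the products are implemented. The class name ToyParadigm and the methods analyze and generate are hypothetical, and the table contains only the forms mentioned in the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an analyzer/generator pair; all names are hypothetical, not the WMTrans API.
final class ToyParadigm {

    // Each row pairs an inflected word form with its citation form.
    private static final String[][] RELATION = {
        {"schlafe", "schlafen"},
        {"schläfst", "schlafen"},
        {"schläft", "schlafen"},
    };

    // Analysis direction: word form in, citation form(s) out.
    static List<String> analyze(String wordForm) {
        List<String> citationForms = new ArrayList<>();
        for (String[] pair : RELATION) {
            if (pair[0].equals(wordForm) && !citationForms.contains(pair[1])) {
                citationForms.add(pair[1]);
            }
        }
        return citationForms;
    }

    // Generation direction: citation form in, listed forms of the paradigm out.
    static List<String> generate(String citationForm) {
        List<String> forms = new ArrayList<>();
        for (String[] pair : RELATION) {
            if (pair[1].equals(citationForm)) {
                forms.add(pair[0]);
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("schläfst"));
        System.out.println(generate("schlafen"));
    }
}
```

A real transducer encodes the same kind of relation in its states and arcs instead of listing every pair, which is what allows it to cover complete paradigms and productive word formation.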

Demos based on this functionality can be tested at www.canoo.net.