WMTrans Language Processing Tools Available

German Word Analysis and Generation for more than Two Million Words


Basel, Switzerland, November 25, 2002. Canoo Engineering AG today announced the release of its Word Manager Transducer (WMTrans) product range. The German morphological analysis software, developed by Canoo and available on the Web at http://www.canoo.com/wmtrans, provides intelligent text processing for information retrieval and language processing applications. Typical use cases include intelligent search, text indexing, text mining, language learning, hyperlink generation, spell checking, grammar checking, and machine translation.

WMTrans is based on Canoo's German Morphological Dictionary, which contains more than 200,000 entries and generates over two million fully categorized word forms, including information on word formation, all types of inflectional irregularities, and spelling variants.

The WMTrans product range includes the following software components:


WMTrans Lemmatizer

The Lemmatizer analyzes German words and finds their base form and category. An analysis of ging, for example, returns the infinitive gehen, the base form under which the verb is listed in a dictionary.

query   -> ging
result  -> gehen (Cat V)

WMTrans Unknown Word Lemmatizer

In German, complex new words can be formed easily, either by compounding or by adding prefixes and suffixes. Examples of German compounds are words like Umsatzwarnungen, skandalgeschüttelten, or abbausicheres. Although compounding is widespread, many individual compounds have a low frequency and are not listed in dictionaries. The Unknown Word Lemmatizer recognizes non-lexicalized words such as compounds by applying word formation rules. This is a powerful advantage in a generative language such as German.

The Unknown Word Lemmatizer includes the Lemmatizer and therefore knows both the entire dictionary and the word formation rules. Typical usage is as follows: a first call to the Lemmatizer determines whether a word form is included in the Morphological Dictionary. If the Lemmatizer does not find the word, it is passed on to the Unknown Word Lemmatizer for further processing. The Unknown Word Lemmatizer analyzes the word's structure and associates one or more word formation rules with the corresponding base forms in the lexicon. The output is the base form of the word and its category. As a result, a word such as Umsatzwarnungen is analyzed successfully, even though the base form Umsatzwarnung is not listed in the dictionary.

query   -> umsatzwarnungen
result  -> umsatzwarnung (Cat N)
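
The two-stage usage described above (dictionary lookup first, word formation rules as a fallback) can be sketched in a few lines of Python. This is only an illustration of the control flow, not the WMTrans API: the names LEXICON, lemmatize, lemmatize_unknown and analyze_word are hypothetical, the lexicon holds a handful of invented entries, and a single "modifier + head" compounding rule stands in for the full rule set.

```python
# Toy data invented for this sketch; not Canoo's dictionary or API.
LEXICON = {
    "ging": ("gehen", "V"),
    "umsatz": ("umsatz", "N"),
    "warnung": ("warnung", "N"),
    "warnungen": ("warnung", "N"),
}


def lemmatize(word):
    """Stage 1: plain dictionary lookup. Returns (base form, category) or None."""
    return LEXICON.get(word.lower())


def lemmatize_unknown(word):
    """Stage 2: one hypothetical compounding rule, modifier + head.

    The head (rightmost part) carries inflection and category, so the base
    form is the unchanged modifier followed by the head's base form.
    """
    word = word.lower()
    for split in range(1, len(word)):
        modifier, head = word[:split], word[split:]
        head_entry = LEXICON.get(head)
        if modifier in LEXICON and head_entry is not None:
            head_base, category = head_entry
            return (modifier + head_base, category)
    return None


def analyze_word(word):
    """Try the dictionary first, then fall back to the word formation rule."""
    return lemmatize(word) or lemmatize_unknown(word)


if __name__ == "__main__":
    for query in ("ging", "Umsatzwarnungen"):
        print("query   ->", query)
        print("result  ->", analyze_word(query))
```

In this sketch, Umsatzwarnungen is not a key in the toy lexicon, so the first stage fails and the second stage splits it into umsatz and warnungen before assembling the base form.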

WMTrans Inflection Analyzer

The Inflection Analyzer determines the base form and category of a word and provides additional grammatical and orthographical information.

WMTrans Recognizer

The Recognizer determines whether a character string is a valid German word.
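
As a rough illustration of word recognition with a finite-state acceptor, the following Python sketch stores a word list as a trie, where each node is a state and each character a transition. The functions build_acceptor and recognize and the three-word list are assumptions made for this example; they say nothing about how the WMTrans Recognizer is actually implemented.

```python
def build_acceptor(words):
    """Build a trie: each nested dict is a state, each character a transition."""
    root = {}
    for word in words:
        state = root
        for ch in word:
            state = state.setdefault(ch, {})
        state[""] = True  # the empty-string key marks an accepting state
    return root


def recognize(acceptor, text):
    """Follow one transition per character; accept only in a marked state."""
    state = acceptor
    for ch in text:
        state = state.get(ch)
        if state is None:
            return False  # no transition for this character
    return "" in state


if __name__ == "__main__":
    acceptor = build_acceptor(["gehen", "geht", "ging"])
    for candidate in ("ging", "gin", "gingen"):
        print(candidate, recognize(acceptor, candidate))
```

Lookup cost in this sketch grows with the length of the input string rather than with the size of the word list, which is one reason finite-state representations are attractive for large dictionaries.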

WMTrans Generator

The Generator returns all inflected word forms and spelling variants for a base form.

WMTrans Inflection Analyzer/Generator

The Inflection Analyzer/Generator determines the base form and category of a word and computes all possible inflected forms and spelling variants for a given base form.
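
Analysis and generation can be viewed as two directions of the same mapping between base forms and inflected forms. The Python sketch below illustrates that idea with a toy paradigm table; PARADIGMS, generate_forms and analyze_form are hypothetical names, and the partial form list for gehen is chosen only for illustration and is not taken from Canoo's dictionary.

```python
from collections import defaultdict

# Toy paradigm table invented for this sketch; the form list is deliberately partial.
PARADIGMS = {
    ("gehen", "V"): ["gehe", "gehst", "geht", "gehen", "ging", "gegangen"],
}


def generate_forms(base, category):
    """Generation: base form and category -> list of inflected forms."""
    return list(PARADIGMS.get((base, category), []))


# Analysis is the inverse mapping: inflected form -> set of (base form, category).
ANALYSES = defaultdict(set)
for (base, category), forms in PARADIGMS.items():
    for form in forms:
        ANALYSES[form].add((base, category))


def analyze_form(form):
    """Analysis: inflected form -> sorted list of (base form, category) pairs."""
    return sorted(ANALYSES.get(form.lower(), set()))


if __name__ == "__main__":
    print(analyze_form("ging"))
    print(generate_forms("gehen", "V"))
```

Because both directions are derived from one table, every generated form analyzes back to the base form it came from.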

WMTrans Word Formation Analyzer/Generator

The Word Formation Analyzer/Generator determines the components from which a word has been derived or composed, and finds all compounds and derivations in which a given word occurs.


Benefits of WMTrans Products

Canoo's language tools offer the following unique benefits:

  • Effective use of technology: WMTrans products are finite-state machines, which are highly efficient in both memory consumption and processing speed.

  • Excellent dictionary quality: the dictionary has been hand-compiled by a team of highly qualified linguists using a dedicated authoring environment, which offers superior support during data entry and ensures high data consistency.

  • Complete set of word formation rules: this comprehensive dictionary knowledge is used, for example, by the Unknown Word Lemmatizer to provide accurate analyses of non-lexicalized entries.

Platforms

WMTrans products are available for several platforms:


Platform (API)                    Product

Windows, Linux, Solaris (Java)    WMTrans Lemmatizer
                                  WMTrans Unknown Word Lemmatizer

Linux (Java, C++)                 WMTrans Lemmatizer
                                  WMTrans Inflection Analyzer
                                  WMTrans Recognizer
                                  WMTrans Generator
                                  WMTrans Inflection Analyzer/Generator
                                  WMTrans Word Formation Analyzer/Generator

Download Trial Versions

Download free evaluation licenses at:

http://www.canoo.com/wmtrans/downloads

Browse through the product descriptions, test the APIs and find out how the WMTrans shared libraries can be integrated into your application.

Canoo Online Services are based on WMTrans products and provide examples of possible applications. These services are available at:

http://www.canoo.net