WMTrans Language Processing Tools Available
German Word Analysis and Generation for More Than Two Million Words
Basel, Switzerland, November 25, 2002. Canoo Engineering AG today announced the release of its Word Manager Transducer (WMTrans) product range. The German morphology software, developed by Canoo and available on the Web at http://www.canoo.com/wmtrans, provides intelligent text processing for information retrieval and language processing applications. Typical use cases include intelligent search, text indexing, text mining, language learning, hyperlink generation, spell checking, grammar checking, and machine translation.
WMTrans is based on Canoo's German Morphological Dictionary, which contains more than 200,000 entries and generates over two million fully categorized word forms, including information on word formation, inflectional irregularities of all types, and spelling variants.
The WMTrans product range includes the following software components:
WMTrans Lemmatizer
The Lemmatizer analyzes German words and returns their base form and category. For example, analyzing ging returns the infinitive gehen, the base form under which the verb is listed in a dictionary.
query  -> ging
result -> gehen (Cat V)
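Irregular forms such as ging cannot be reduced to gehen by stripping endings, so a lemmatizer needs dictionary knowledge of each form. The following is a minimal sketch of that idea, assuming a simple full-form lookup table; the names FULL_FORMS and lemmatize and the sample entries are hypothetical and do not represent the WMTrans API or dictionary.

```python
# Hypothetical toy full-form dictionary: word form -> (base form, category).
# Illustrative only; this is not the WMTrans dictionary or API.
FULL_FORMS = {
    "gehen": ("gehen", "V"),
    "ging": ("gehen", "V"),
    "gingen": ("gehen", "V"),
    "gegangen": ("gehen", "V"),
}


def lemmatize(word):
    """Return (base form, category) for a known word form, or None if unknown."""
    return FULL_FORMS.get(word.lower())


print(lemmatize("ging"))  # ('gehen', 'V')
```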
WMTrans Unknown Word Lemmatizer
In German, complex new words are easily formed, either by compounding or by adding prefixes and suffixes. Examples of German compounds include Umsatzwarnungen, skandalgeschüttelten, and abbausicheres. Although compounding is widespread, most individual compounds occur rarely and are not listed in dictionaries. The Unknown Word Lemmatizer recognizes such non-lexicalized words by applying word formation rules, a decisive advantage for a language as productive as German.
The Unknown Word Lemmatizer includes the Lemmatizer and therefore covers both the entire dictionary and the word formation rules. Typical usage is as follows: a first call to the Lemmatizer determines whether a word form is included in the Morphological Dictionary. If the Lemmatizer does not find the word, it is passed on to the Unknown Word Lemmatizer, which analyzes the word's structure and applies one or more word formation rules to relate it to base forms in the lexicon. The output is the word's base form and category. As a result, a word such as Umsatzwarnungen is analyzed successfully, even though the base form Umsatzwarnung is not listed in the dictionary.
query  -> umsatzwarnungen
result -> umsatzwarnung (Cat N)
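The two-step usage described above (dictionary lookup first, word formation rules as a fallback) can be sketched as follows. This assumes a toy lexicon and a single simplified compounding rule, modifier + known head, where the head supplies the inflection and category; the names FULL_FORMS, MODIFIERS, lemmatize, lemmatize_unknown, and analyze are hypothetical and do not reflect how WMTrans implements its rules.

```python
# Hypothetical toy lexicon: word form -> (base form, category).
# Illustrative only; this is not the WMTrans dictionary or API.
FULL_FORMS = {
    "ging": ("gehen", "V"),
    "warnung": ("warnung", "N"),
    "warnungen": ("warnung", "N"),
}

# Hypothetical base forms allowed as the first part of a compound.
MODIFIERS = {"umsatz", "skandal", "abbau"}


def lemmatize(word):
    """Step 1: dictionary lookup; returns (base form, category) or None."""
    return FULL_FORMS.get(word)


def lemmatize_unknown(word, min_part=3):
    """Step 2: one simplified compounding rule, modifier + known head.

    The rightmost part (the head) carries the inflection and the category,
    so the compound's base form is modifier + the head's base form.
    """
    for i in range(min_part, len(word) - min_part + 1):
        modifier, head = word[:i], word[i:]
        if modifier in MODIFIERS and head in FULL_FORMS:
            head_base, category = FULL_FORMS[head]
            return (modifier + head_base, category)
    return None


def analyze(word):
    """Try the dictionary first; fall back to word formation rules."""
    word = word.lower()
    return lemmatize(word) or lemmatize_unknown(word)


print(analyze("Umsatzwarnungen"))  # ('umsatzwarnung', 'N')
```

This sketch returns the first split it finds. A realistic analyzer would usually return all candidate analyses and also handle linking elements such as the s in Arbeitsmarkt (Arbeit + s + Markt).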
WMTrans Inflection Analyzer
The Inflection Analyzer determines the base form and category of a word and also provides additional grammatical and orthographic information.
WMTrans Recognizer
The Recognizer determines whether a character string is a valid German word form.
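Word recognition is a natural fit for a finite-state acceptor: the string is valid if reading it character by character ends in an accepting state. The following is a minimal sketch, assuming the vocabulary is available as a plain list of word forms; it builds a character trie, the simplest deterministic acceptor of that kind. The class name TrieAcceptor and the word list are hypothetical and do not describe WMTrans internals.

```python
class TrieAcceptor:
    """Toy deterministic finite-state acceptor built as a character trie."""

    def __init__(self, words):
        self.transitions = [{}]  # state -> {character: next state}; state 0 is the start
        self.final = set()       # accepting states (ends of valid words)
        for word in words:
            self._add(word)

    def _add(self, word):
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                # Create a new state and a transition to it.
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.final.add(state)

    def accepts(self, s):
        """Return True if s is one of the stored word forms."""
        state = 0
        for ch in s:
            state = self.transitions[state].get(ch)
            if state is None:
                return False
        return state in self.final


rec = TrieAcceptor(["gehen", "ging", "gingen"])
print(rec.accepts("ging"), rec.accepts("gin"))  # True False
```

A trie shares only common prefixes. Production systems typically minimize the automaton so that common endings such as -en share states as well, which keeps large vocabularies compact.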
WMTrans Generator
The Generator returns all inflected word forms and spelling variants for a base form.
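Generation runs in the opposite direction: starting from a base form, the inflection class determines the endings, and each spelling variant is inflected as well. The following is a minimal sketch, assuming hypothetical inflection classes (N_fem_en, N_masc_e) and a toy lexicon; the spelling variant pair Delfin/Delphin is a real German example, but the data layout and the names generate and word_forms are illustrative only and not the WMTrans API.

```python
# Hypothetical inflection classes: suffix for each (number, case) slot.
PARADIGMS = {
    # Feminine nouns such as Warnung: no singular endings, plural -en.
    "N_fem_en": {
        ("Sg", "Nom"): "", ("Sg", "Gen"): "", ("Sg", "Dat"): "", ("Sg", "Akk"): "",
        ("Pl", "Nom"): "en", ("Pl", "Gen"): "en", ("Pl", "Dat"): "en", ("Pl", "Akk"): "en",
    },
    # Masculine nouns such as Delfin: genitive -s, plural -e, dative plural -en.
    "N_masc_e": {
        ("Sg", "Nom"): "", ("Sg", "Gen"): "s", ("Sg", "Dat"): "", ("Sg", "Akk"): "",
        ("Pl", "Nom"): "e", ("Pl", "Gen"): "e", ("Pl", "Dat"): "en", ("Pl", "Akk"): "e",
    },
}

# Hypothetical lexicon: base form -> (inflection class, spelling variants).
LEXICON = {
    "Warnung": ("N_fem_en", []),
    "Delfin": ("N_masc_e", ["Delphin"]),
}


def generate(base):
    """Return (word form, (number, case)) pairs for a base form and its variants."""
    paradigm, variants = LEXICON[base]
    forms = []
    for stem in [base] + variants:
        for slot, suffix in PARADIGMS[paradigm].items():
            forms.append((stem + suffix, slot))
    return forms


def word_forms(base):
    """Distinct word forms only, sorted alphabetically."""
    return sorted({form for form, _ in generate(base)})


print(word_forms("Warnung"))  # ['Warnung', 'Warnungen']
```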
WMTrans Inflection Analyzer/Generator
The Inflection Analyzer/Generator determines the base form and category of a word and computes all possible inflected forms and spelling variants for a given base form.
WMTrans Word Formation Analyzer/Generator
The Word Formation Analyzer/Generator determines the components from which a word has been derived or composed, and finds all compounds and derivatives in which a given word occurs.
Benefits of WMTrans Products
Canoo's language tools offer the following unique benefits:
Effective use of technology: WMTrans products are based on finite-state machines, which are highly efficient in both memory consumption and processing speed.
Excellent dictionary quality: the dictionary has been compiled by hand by a team of highly qualified linguists using a dedicated authoring environment, which offers superior support during data entry and ensures high data consistency.
Complete set of word formation rules: the dictionary covers word formation comprehensively; the Unknown Word Lemmatizer, for example, uses these rules to provide accurate analyses of non-lexicalized words.
Platforms
WMTrans products are available for several platforms:
| Platform (API) | Products |
| --- | --- |
| Windows, Linux, Solaris (Java) | WMTrans Lemmatizer, WMTrans Unknown Word Lemmatizer |
| Linux (Java, C++) | WMTrans Lemmatizer, WMTrans Inflection Analyzer, WMTrans Recognizer, WMTrans Generator, WMTrans Inflection Analyzer/Generator, WMTrans Word Formation Analyzer/Generator |
Download Trial Versions
Download free evaluation licenses at:
http://www.canoo.com/wmtrans/downloads
Browse through the product descriptions, test the APIs and find out how the WMTrans shared libraries can be integrated into your application.
Canoo Online Services are based on WMTrans products and provide examples of possible applications. These services are available at: