WMTrans Unknown Word Recognizer
The Unknown Word Recognizer recognizes any valid word, whether inflected or in citation form. The result is a boolean value.
In addition to the features provided by the Recognizer, the Unknown Word Recognizer can recognize unknown (i.e. not lexicalized) words based on word formation rules. This is particularly useful for languages with highly productive word formation. See the general Unknown Word Products introduction for further explanation.
Analysis Example
Like the lexicalized word Recognizer, which we also offer as a product, the Unknown Word Recognizer returns a boolean (yes/no) value indicating whether the word can be considered valid. It tolerates input written without special characters (e.g. the German word mögen written as moegen).
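The tolerance for input without special characters can be pictured with a small sketch. This is not the product's implementation, which is not described here; it only illustrates one plausible approach, assuming a toy lexicon and a hypothetical digraph table (ae, oe, ue, ss). The names LEXICON, DIGRAPHS, spelling_variants and recognize_tolerant are all invented for this illustration.

```python
# Hypothetical toy lexicon; the real product uses its own lexical data.
LEXICON = {"mögen", "sang"}

# Assumed mapping from ASCII digraphs to German special characters.
DIGRAPHS = {"ae": "ä", "oe": "ö", "ue": "ü", "ss": "ß"}


def spelling_variants(word: str) -> set[str]:
    """Return the word plus every variant with digraphs optionally restored."""
    if not word:
        return {""}
    # Option 1: keep the first character exactly as typed.
    variants = {word[0] + rest for rest in spelling_variants(word[1:])}
    # Option 2: if the word starts with a digraph, restore the special character.
    pair = word[:2]
    if pair in DIGRAPHS:
        variants |= {DIGRAPHS[pair] + rest for rest in spelling_variants(word[2:])}
    return variants


def recognize_tolerant(word: str) -> bool:
    """True if the word, or any restored spelling of it, is in the toy lexicon."""
    return any(v in LEXICON for v in spelling_variants(word.lower()))


if __name__ == "__main__":
    for query in ("mögen", "moegen", "mogen"):
        print(f"query -> {query}")
        print(f"result -> {str(recognize_tolerant(query)).lower()}")
```

Keeping the original spelling among the candidates matters, because many valid words contain sequences such as "ue" or "ss" that are not transliterations.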
The Unknown Word Recognizer is a superset of the lexicalized word Recognizer. If a word form is not part of any lexicalized entry (i.e. the Recognizer cannot find it among our lexicalized words), a second API function lets you analyse its structure, segmentations and word formations, and decide whether the word is a potentially valid word. Refer to the API description for details on how to use it and integrate it into your program.
Lexicalized Word Function Call
query -> sang
result -> true
query -> sang Filter: (Cat V)
result -> true
query -> sang Filter: (Cat Adj)
result -> false
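The effect of a category filter on the lexicalized lookup can be sketched as follows. This is only an illustration under stated assumptions: the toy table FORMS and the function recognize are hypothetical and do not reflect the actual API, whose signatures are given in the API description.

```python
from typing import Optional

# Hypothetical toy data: each word form maps to the categories it can belong to.
FORMS = {
    "sang": {"V"},
}


def recognize(word: str, cat: Optional[str] = None) -> bool:
    """Return True if the word form is known and, when a category filter
    is given, if the form can belong to that category."""
    categories = FORMS.get(word.lower())
    if categories is None:
        return False          # not a lexicalized word form
    if cat is None:
        return True           # no filter: any category is acceptable
    return cat in categories  # filter: the form must have this category


if __name__ == "__main__":
    print(recognize("sang"))         # no filter
    print(recognize("sang", "V"))    # Filter: (Cat V)
    print(recognize("sang", "Adj"))  # Filter: (Cat Adj)
```

With this toy data the three calls mirror the queries above: the unfiltered query and the (Cat V) query succeed, while the (Cat Adj) query fails.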
Unknown Word Function Call
query -> aufgesunken
result -> true
query -> skandalgeschüttelten
result -> true
query -> abbausicheres
result -> true
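The two-step behaviour (lexicalized lookup first, word formation analysis second) can be sketched in a few lines. The sketch below is a deliberately simplified stand-in: it treats a word as potentially valid if it can be split completely into lexicalized parts, whereas real word formation rules are far richer. The lexicon contents and the names LEXICALIZED, segmentations and recognize_unknown are assumptions made for this illustration, not part of the actual API.

```python
from functools import lru_cache

# Hypothetical toy lexicon of lexicalized forms and word formation elements.
LEXICALIZED = {
    "sang", "auf", "gesunken", "skandal",
    "geschüttelten", "abbau", "sicheres",
}


def segmentations(word: str) -> list[tuple[str, ...]]:
    """Return every way to split the word completely into lexicalized parts."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def split(start: int) -> tuple[tuple[str, ...], ...]:
        if start == len(word):
            return ((),)  # one way to segment the empty remainder
        found = []
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part in LEXICALIZED:
                found.extend((part,) + rest for rest in split(end))
        return tuple(found)

    return list(split(0))


def recognize_unknown(word: str) -> bool:
    """Step 1: lexicalized lookup. Step 2: fall back to segmentation."""
    if word.lower() in LEXICALIZED:
        return True
    return len(segmentations(word)) > 0


if __name__ == "__main__":
    for query in ("sang", "aufgesunken", "skandalgeschüttelten", "abbausicheres"):
        print(f"query -> {query}")
        print(f"result -> {str(recognize_unknown(query)).lower()}")
```

Returning the segmentations separately from the boolean decision reflects the idea that the caller inspects the analysis and decides whether to accept the word.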