WMTrans Unknown Word Analyzer
The Unknown Word Analyzer analyzes any valid word form of a specified language and returns its inflection and word formation information.
In addition to the features of the Inflection Analyzer, the Unknown Word Analyzer can recognize unknown (i.e. not lexicalized) words on the basis of word formation rules. This is particularly useful for languages with highly productive word formation. See the general Unknown Word Products introduction for further explanation.
Analysis Example
Like the lexicalized word Inflection Analyzer, which we also offer as a product, the Unknown Word Analyzer analyzes any word form and returns a rich set of inflection and word formation information. It also tolerates input in which special characters are written out (e.g. the German word mögen written as moegen) and marks such cases with a special feature in the output.
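The text does not say how the analyzer recognizes written-out umlauts. One plausible approach, sketched here purely as an assumption, is to generate every literal/umlaut reading of each ae/oe/ue digraph and let lexicon lookup decide which candidates are real words. Because a digraph is not always an umlaut (the "ue" in Feuer is not ü), all readings are kept rather than rewriting greedily. The `DIGRAPHS` table and `spelling_candidates` function are hypothetical, and the boolean flag only stands in for the product's own output feature, whose name is not given here.

```python
from itertools import product

# Hypothetical ASCII spellings of German umlauts; illustrative only, not the
# WMTrans rule set (ß written as ss is left out to keep the sketch simple).
DIGRAPHS = {"ae": "ä", "oe": "ö", "ue": "ü"}


def spelling_candidates(word: str) -> list[tuple[str, bool]]:
    """Return (candidate, transliterated) pairs, reading each digraph either
    literally or as the umlaut it may stand for."""
    segments: list[list[str]] = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        umlaut = DIGRAPHS.get(pair.lower())
        if umlaut is not None:
            # Preserve capitalization of the digraph's first letter (Oe -> Ö).
            if pair[0].isupper():
                umlaut = umlaut.upper()
            segments.append([pair, umlaut])
            i += 2
        else:
            segments.append([word[i]])
            i += 1
    # Every combination of literal/umlaut readings; the flag marks candidates
    # that differ from the input, i.e. ones that rely on transliteration.
    return [("".join(c), "".join(c) != word) for c in product(*segments)]
```

For example, `spelling_candidates("moegen")` returns `[("moegen", False), ("mögen", True)]`; only the second candidate would be found in a German lexicon.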
The Unknown Word Analyzer is a superset of the lexicalized word Inflection Analyzer.
If a word form is not among the lexicalized entries (i.e. the Inflection Analyzer cannot find it), separate API functions let you analyze its structure, segmentation and word formation, and associate one or more word formation rules with one or more corresponding citation forms. Refer to the API description for details on how to use these functions and integrate them into your program.
Here is an example of a possible analysis interaction using the Unknown Word Analyzer.
Four API functions are available: one returns the results of the Inflection Analyzer, and three return the results of the Unknown Word Analyzer, one for each choice of output. The formal output description can be found in the WMTrans developer zone.
Note: There are three ways of representing the output of an unknown word analysis. You can choose to see only inflection information, only word formation information or both. For each kind of output there is a corresponding API function.
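When both kinds of output are requested, the unknown-word examples below show the result as an <inflection> section followed by a <wf> section. A client that received the combined output could split it into its parts roughly as follows. This is a minimal sketch that assumes the tag names seen in the sample output; the authoritative format is the formal output description in the developer zone, and `split_sections` is a hypothetical helper, not part of the WMTrans API.

```python
import re

# Matches <inflection>...</inflection> and <wf>...</wf>; the tag names come
# from the sample output, the pattern itself is an assumption about the format.
SECTION = re.compile(r"<(inflection|wf)>\s*(.*?)\s*</\1>", re.DOTALL)


def split_sections(result: str) -> dict[str, str]:
    """Map each section name found in an unknown-word result to its body."""
    return {name: body for name, body in SECTION.findall(result)}


result = """<inflection>
abbausicher
(Cat A)(Degree Pos)(AForm es)(ID 0)
</inflection>
<wf>
abbau + sicheres
(Cat A),
</wf>"""

sections = split_sections(result)
# A result containing only one kind of output simply yields one key.
inflection_only = sections.get("inflection")
```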
Lexicalized Word Function Call
query -> sang
result -> sang
(Cat N)(Gender M)(Num SG)(Case Nom)(ID 0-1),
(Cat N)(Gender M)(Num SG)(Case Dat)(ID 0-1),
(Cat N)(Gender M)(Num SG)(Case Acc)(ID 0-1)
singen
(Cat V)(Aux haben)(Mod Ind)(Temp Impf)(Pers 1st)
(Num SG)(ID 0-1),
(Cat V)(Aux haben)(Mod Ind)(Temp Impf)(Pers 3rd)
(Num SG)(ID 0-1)
query -> sang Filter: (Cat V)
result -> singen
(Cat V)(Aux haben)(Mod Ind)(Temp Impf)(Pers 1st)
(Num SG)(ID 0-1),
(Cat V)(Aux haben)(Mod Ind)(Temp Impf)(Pers 3rd)
(Num SG)(ID 0-1)
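The filtered query above keeps only the verb readings, and the citation form sang disappears because none of its readings match. The sketch below reproduces that behavior on the client side to make the assumed semantics explicit: a reading survives if it contains every feature of the filter with the same value. In the product the filter is passed to the API itself; the feature syntax here is inferred from the sample output, and `parse_bundle` and `apply_filter` are hypothetical names.

```python
import re

# One "(Name Value)" pair; the syntax is inferred from the sample output.
FEATURE = re.compile(r"\(([^\s()]+) ([^()]+)\)")


def parse_bundle(text: str) -> dict[str, str]:
    """Turn '(Cat V)(Num SG)...' into {'Cat': 'V', 'Num': 'SG', ...}."""
    return {name: value.strip() for name, value in FEATURE.findall(text)}


def apply_filter(analyses: dict[str, list[str]],
                 filter_text: str) -> dict[str, list[dict[str, str]]]:
    """Keep readings that contain every feature of the filter; drop citation
    forms that are left with no reading."""
    constraint = parse_bundle(filter_text)
    filtered: dict[str, list[dict[str, str]]] = {}
    for citation_form, bundles in analyses.items():
        kept = [b for b in map(parse_bundle, bundles)
                if all(b.get(k) == v for k, v in constraint.items())]
        if kept:
            filtered[citation_form] = kept
    return filtered


# Readings for the query "sang", grouped by citation form as in the output above.
SANG = {
    "sang": [
        "(Cat N)(Gender M)(Num SG)(Case Nom)(ID 0-1)",
        "(Cat N)(Gender M)(Num SG)(Case Dat)(ID 0-1)",
        "(Cat N)(Gender M)(Num SG)(Case Acc)(ID 0-1)",
    ],
    "singen": [
        "(Cat V)(Aux haben)(Mod Ind)(Temp Impf)(Pers 1st)(Num SG)(ID 0-1)",
        "(Cat V)(Aux haben)(Mod Ind)(Temp Impf)(Pers 3rd)(Num SG)(ID 0-1)",
    ],
}

verbs_only = apply_filter(SANG, "(Cat V)")
print(list(verbs_only))  # ['singen']
```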
Unknown Word Function Call
query -> aufsinken
result ->
<inflection>
aufsinken
(Cat V)(Aux haben)(Mod Inf)(Temp Pres)
(ID 0-1)
(Cat V)(Aux haben)(Mod Ind)(Temp Pres)
(Pers 1st)(Num PL)(ID 0-1)
(Cat V)(Aux haben)(Mod Ind)(Temp Pres)
(Pers 3rd)(Num PL)(ID 0-1)
(Cat V)(Aux haben)(Mod Conj-1)(Temp Pres)
(Pers 1st)(Num PL)(ID 0-1)
(Cat V)(Aux haben)(Mod Conj-1)(Temp Pres)
(Pers 3rd)(Num PL)(ID 0-1)
</inflection>
<wf>
auf + sinken
(Cat V),
(WFRule Derivation.To-V.V-To-V.Prefixing.Detachable-Prefix.V_Irregular)
1: auf
(WFCat Derivation.To-V.V-To-V.Prefixing.V-Prefix.Detachable)
2: sinken (Cat V)
</wf>
query -> abbausicheres
result ->
<inflection>
abbausicher
(Cat A)(Degree Pos)(AForm es)(ID 0)
</inflection>
<wf>
abbau + sicheres
(Cat A),
(WFRule Compounding.A-Comp.N+A.No-Umlaut.N+A_No_Linking_Element)
1: abbau (Cat N)
2: sicher (Cat A)
</wf>
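The word formation blocks above follow a regular shape: a segmentation line, features for the whole word including a dotted WFRule path, and numbered constituents with their own features. A client could turn such a block into a structure and, for instance, read the top-level process (Derivation or Compounding) from the first element of the rule path. This is a sketch under assumptions drawn only from the two samples; `parse_wf` is a hypothetical helper, and it also rejoins dotted rule names that were wrapped across lines.

```python
import re

FEATURE = re.compile(r"\(([^\s()]+) ([^()]+)\)")  # "(Name Value)" pairs
CONSTITUENT = re.compile(r"(\d+): (\S+)")         # "1: auf", "2: sinken"


def parse_features(text: str) -> dict[str, str]:
    # Remove whitespace after dots so wrapped dotted names are rejoined.
    return {k: re.sub(r"\.\s+", ".", v.strip()) for k, v in FEATURE.findall(text)}


def parse_wf(block: str) -> dict:
    """Parse a <wf> block into segments, features, rule path and constituents."""
    lines = [ln.strip() for ln in block.strip().splitlines()]
    lines = [ln for ln in lines if ln and ln not in ("<wf>", "</wf>")]
    segments = [s.strip() for s in lines[0].split("+")]
    # Splitting on numbered constituents yields: head, idx, form, feats, idx, ...
    parts = CONSTITUENT.split(" ".join(lines[1:]))
    features = parse_features(parts[0])
    constituents = [
        {"index": int(i), "form": form, "features": parse_features(feats)}
        for i, form, feats in zip(parts[1::3], parts[2::3], parts[3::3])
    ]
    return {
        "segments": segments,
        "features": features,
        "rule_path": features.get("WFRule", "").split("."),
        "constituents": constituents,
    }


AUFSINKEN_WF = """<wf>
auf + sinken
(Cat V),
(WFRule Derivation.To-V.V-To-V.
Prefixing.Detachable-Prefix.V_Irregular)
1: auf
(WFCat Derivation.To-V.V-To-V.
Prefixing.V-Prefix.Detachable)
2: sinken (Cat V)
</wf>"""

wf = parse_wf(AUFSINKEN_WF)
print(wf["rule_path"][0])  # Derivation
```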