Word Formation Analyzer/Generator
The WMTrans Word Formation Analyzer/Generator analyzes and generates the first level of word formation history for any legal lexeme.
The result of an analysis query is a list of source lexemes from which the given lexeme derives. The result of a generation query is a list of derived lexemes, created by derivation and word formation. All features can be used as filters during analysis and generation. See Using Filters for a detailed explanation.
Implementation
We currently offer two versions of the software:
- A pure Java implementation, which runs on any platform with JRE 1.3 or later installed
- A platform-specific shared library implementation (currently available for Linux), delivered with two different APIs (ANSI C/C++ and Java)
Both versions can be easily integrated into your own product. Please refer to the developer zone for information on how to install the selected version and how to use the delivered APIs.
Dataset
Depending on the license agreement, the delivered dataset includes either a limited number of entries or the full set of entries defined so far. See the language-specific page for further details.
Available languages
The following languages are available:
- English
- German
- Italian
Please see the language-specific features that the client application needs to consider.
Analysis Example
The Word Formation Analyzer module analyzes any lexeme and returns a list of all its source lexemes, i.e. the lexemes from which it derives. Each source lexeme is followed by a list of features that identify it uniquely.
Here are some examples of possible word formation analysis interactions using the Analyzer. The formal output syntax is described in the developer zone.
German Examples
query -> kennenlernen
result -> kennen
(Cat V)(Aux haben)
lernen
(Cat V)(Aux haben)
#
query -> gemocht
result -> mögen
(Cat V)(Aux haben)
#
English Examples
query -> countdown
result -> count
(Cat V)(Variety BCE)
down
(Cat Adv)(Variety BCE)
#
query -> disappear
result -> appear
(Cat V)(Variety BCE)
#
Italian Examples
query -> appartenenza
result -> appartenere
(Cat V)(Aux avere)(Aux essere)
#
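A client application will usually need to turn such a result into a data structure. The following is a minimal sketch in Python, not part of the delivered APIs. It assumes only the layout visible in the examples above: a source lexeme on one line, its feature list on the next line, and "#" as the end marker. The function name parse_result and the regular expression are illustrative assumptions; the formal output syntax in the developer zone remains authoritative.

```python
import re

# Assumed feature shape, inferred from the examples: "(Name Value)".
FEATURE = re.compile(r"\(([^\s()]+)\s+([^()]+)\)")

def parse_result(text):
    """Parse one result block into a list of (lexeme, features) pairs.

    features is a list of (name, value) tuples rather than a dict, so a
    repeated name such as (Aux avere)(Aux essere) keeps both values.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("query ->"):
            continue  # skip blank lines and the echoed query
        if line == "#":
            break  # end-of-result marker
        if line.startswith("result ->"):
            line = line[len("result ->"):].strip()
        if line.startswith("("):
            if not entries:
                raise ValueError("feature list without a preceding lexeme")
            entries[-1][1].extend(FEATURE.findall(line))
        else:
            entries.append((line, []))
    return entries

SAMPLE = """query -> appartenenza
result -> appartenere
(Cat V)(Aux avere)(Aux essere)
#"""

print(parse_result(SAMPLE))
# [('appartenere', [('Cat', 'V'), ('Aux', 'avere'), ('Aux', 'essere')])]
```

Keeping the features as a list of pairs is a deliberate choice here: the Italian example carries two Aux values for one lexeme, which a plain dictionary keyed by feature name would silently collapse.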
Generation Example
The Generator module delivers a list of all lexemes directly derived from the input lexeme. Each entry is followed by a list of features that identify the lexeme uniquely.
Here are examples of possible word formation generation queries.
The formal output syntax is described in the developer zone.
German Examples
query -> mahnen
result -> abmahnen
(Cat V)(Aux haben)
anmahnen
(Cat V)(Aux haben)
einmahnen
(Cat V)(Aux haben)
ermahnen
(Cat V)(Aux haben)
gemahnen
(Cat V)(Aux haben)
gemahnt
(Cat A)(Lexeme mahnen)
mahnbescheid
(Cat N)(Gender M)
mahnbrief
(Cat N)(Gender M)
mahnend
(Cat A)
mahner
(Cat N)(Gender M)
mahngebühr
(Cat N)(Gender F)
mahnmal
(Cat N)(Gender N)(Plural e),
(Cat N)(Gender N)(Plural er)
mahnruf
(Cat N)(Gender M)
mahnschreiben
(Cat N)(Gender N)
mahnstätte
(Cat N)(Gender F)
mahnung
(Cat N)(Gender F)
mahnverfahren
(Cat N)(Gender N)
mahnwache
(Cat N)(Gender F)
mahnwort
(Cat N)(Gender N)
mahnzeichen
(Cat N)(Gender N)
mahnzettel
(Cat N)(Gender M)
vermahnen
(Cat V)(Aux haben)
#
English Examples
query -> appear
result -> apparent
(Cat A)(Variety BCE)
appearance
(Cat N)(Variety BCE)
disappear
(Cat V)(Variety BCE)
#
Italian Examples
query -> bosco
result -> abbracciabosco
(Cat N)(Gen M)
boscaglia
(Cat N)(Gen F)
boscaiolo
(Cat N)(Gen M)
boschetto
(Cat N)(Gen M)
boschivo
(Cat Adj)(Manner Qual)
boscoso
(Cat Adj)(Manner Qual)
#
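In the German example above, the entry "mahnmal" is followed by two feature lists joined by a trailing comma, which appears to mean that the same citation form belongs to two lexemes with different plurals. The following Python sketch shows one way a client could split such an entry into its alternatives. The function name parse_alternatives is hypothetical, and the comma convention is inferred from the example only, not taken from the formal output syntax.

```python
import re

# Assumed feature shape, inferred from the examples: "(Name Value)".
FEATURE = re.compile(r"\(([^\s()]+)\s+([^()]+)\)")

def parse_alternatives(feature_lines):
    """Return one list of (name, value) pairs per comma-separated alternative.

    Assumes that a comma never occurs inside a feature value, which holds
    for every example shown on this page.
    """
    joined = " ".join(line.strip() for line in feature_lines)
    return [FEATURE.findall(part) for part in joined.split(",") if part.strip()]

mahnmal = ["(Cat N)(Gender N)(Plural e),", "(Cat N)(Gender N)(Plural er)"]
for alternative in parse_alternatives(mahnmal):
    print(alternative)
# [('Cat', 'N'), ('Gender', 'N'), ('Plural', 'e')]
# [('Cat', 'N'), ('Gender', 'N'), ('Plural', 'er')]
```

An entry with a single feature list simply yields one alternative, so the same function can be applied to every entry of a result.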
Using Filters
Using filters restricts the citation form used as input for the query. A single form may occur as citation form in different lexemes. A filter gives you the opportunity to disambiguate the request.
Specifying "(Cat N)" as a filter for a request does not mean that you will only receive noun lexemes as results. It means that you only want to obtain the results of the analysis/generation based on the noun lexeme.
Consider the following example:
The German citation form "gehen" occurs as citation form in two different lexemes. The first one is a verb and the second one is a noun. If you define a generation request with the form "gehen" without specifying any filter, you will get all lexemes derived from both the noun and the verb. If, however, you want to restrict the request to the noun lexeme "gehen", you must specify the filter "(Cat N)", which produces the following result:
query -> gehen
filter -> (Cat N)
result -> bahngehen
(Cat N)(Gender N)
dickgehen
(Cat N)(Gender N)
fünfzigkilometergehen
(Cat N)(Gender N)
felsgehen
(Cat N)(Gender N)
handstandgehen
(Cat N)(Gender N)
schlafengehen
(Cat N)(Gender N)
zubettgehen
(Cat N)(Gender N)
zugrundegehen
(Cat N)(Gender N)
#
More interesting is the restriction to the verb lexeme "gehen", not because it generates more results, but because the results also include elements that are not verbs themselves but are derived from the verb "gehen":
query -> gehen
filter -> (Cat V)
result -> übergehen
(Cat V)(Aux haben)(Pref No-Detach),
(Cat V)(Aux sein)(Pref Detach)
abgehen
(Cat V)(Aux sein)
abwärtsgehen
(Ortho Old-Obs)(OSepRule Adv+V)(Cat V)(Aux sein)
angehen
(Cat V)(Aux haben)(Aux sein)
aufgehen
(Cat V)(Aux sein)
...
...
gänger
(Cat N)(Gender M)
gängig
(Cat A)
gang
(Cat N)(Gender M)
gangbar
(Cat A)
...
...
#
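The filter semantics described in this section can be summarized in a small model: the filter selects the source lexeme, not the category of the results. The Python sketch below is purely illustrative and does not use the WMTrans APIs. TOY_LEXICON and generate are hypothetical names, and the lexicon contains only a handful of forms copied from the "gehen" examples above.

```python
# Hypothetical toy lexicon: each source lexeme is identified by its citation
# form plus its features, and maps to the forms derived from it.
TOY_LEXICON = {
    ("gehen", (("Cat", "N"),)): ["bahngehen", "schlafengehen"],
    ("gehen", (("Cat", "V"),)): ["abgehen", "gang", "gangbar"],
}

def generate(form, filters=()):
    """Return the forms derived from every source lexeme matching the filters.

    The filters are checked against the source lexeme only. Derived forms
    are returned whatever their own category is.
    """
    results = []
    for (citation, features), derived in TOY_LEXICON.items():
        if citation == form and all(f in features for f in filters):
            results.extend(derived)
    return results

print(generate("gehen", [("Cat", "V")]))
# ['abgehen', 'gang', 'gangbar']
```

With the filter (Cat V), the toy result still contains the noun "gang" and the adjective "gangbar", because only the source lexeme is restricted to the verb. Calling generate("gehen") without a filter returns the derivations of both the noun and the verb.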