English
Available Datasets
The following datasets are deliverable for all types of Analyzers, Generators and Analyzer/Generators currently available in English:
- Evaluation license: all nouns, verbs, adjectives and adverbs starting with the letters 'a' to 'd', plus all other types of entries (a total of about 11'000 entries).
- Full license: all available English words (currently more than 50'000), including analysis of contraction elements.
The following datasets are available for the English Word Formation Analyzer/Generator:
- Evaluation license: all derivation level entries concerning all nouns, verbs, adjectives and adverbs starting with the letters 'a' to 'd', plus all other types of entries (a total of about 7'000 relations).
- Full license: all derivation level entries for all entries (currently 43'000 relations).
Language Specific Features
Your client application needs to take the following English-specific features into account in order to make the best use of our analyzers.
| Attribute | Meaning |
| Variety | English variety (regional varieties of lexical items): British Common English (BCE), British English, American English |
| SpellVar | Spelling variants: British Common English (standard), exclusive American English spelling variant (AE), optional American English spelling variant (ae), optional British spelling variant (be) |
| Contraction | Contractions of elements, usually clitics |
British and American English
Our English analyzers are able to distinguish between different spelling variants. We adopted British Common English (BCE) as the standard spelling. Special features mark American and British spelling variants.
The features are:
- (SpellVar BCE): British Common English spelling
- (SpellVar AE): exclusive American spelling variant, used instead of BCE spelling. Example: BCE colour, AE color
- (SpellVar ae): optional American spelling variant, used as well as BCE spelling. Example: BCE travelled, ae traveled
- (SpellVar be): optional British spelling variant, used as well as BCE spelling. Example: BCE realise, be realize
With this information you can set a filter to analyze your text according to your specific criteria.
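Such a filter could be sketched on the client side as follows. This is a minimal illustration, not the product's API: it assumes each analysis is available as a feature string in the "(Name value)" notation used on this page, and that an analysis without a SpellVar feature counts as BCE. The function names `parse_features` and `filter_by_spellvar` are hypothetical.

```python
import re

# Matches one feature in the "(Name value)" notation, e.g. "(SpellVar AE)".
FEATURE_RE = re.compile(r"\(([^\s()]+)\s+([^()]*)\)")


def parse_features(text):
    """Return a dict mapping feature names to their values."""
    return {name: value.strip() for name, value in FEATURE_RE.findall(text)}


def filter_by_spellvar(analyses, allowed, default="BCE"):
    """Keep only analyses whose SpellVar value is in `allowed`.

    Assumption: an analysis without a SpellVar feature is treated as
    `default` (BCE); adjust this if your data behaves differently.
    """
    kept = []
    for analysis in analyses:
        variant = parse_features(analysis).get("SpellVar", default)
        if variant in allowed:
            kept.append(analysis)
    return kept


if __name__ == "__main__":
    analyses = ["(Cat N)(SpellVar AE)", "(Cat N)(SpellVar BCE)", "(Cat V)"]
    # Accept only the standard spelling and optional British variants.
    print(filter_by_spellvar(analyses, {"BCE", "be"}))
```

Passing the set of accepted values as a parameter leaves the decision about which variants to accept with the application.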
Note:
SpellVar features differ from Variety features. Variety features mark regional varieties of lexical items, such as the American word "billfold" for BCE "wallet", or "mailman" for "postman".
Contractions
The English version is able to analyze and recognize word forms with apostrophes:
- Possessive forms of nouns; this includes singular word forms like "entry's", as well as plural word forms like "points'", including exceptions.
- Contractions of auxiliary + not such as "doesn't", "haven't".
Please note: If you require a single analysis of a word form with an apostrophe, do not use the apostrophe character as a separator within your application.
Here is an example for the Lemmatizer:
query -> cat's
filter -> (Cat N)
result -> cat
(Cat N)(Contraction N+'s/Clitic)
(Cat N)(Contraction N+have/V)
(Cat N)(Contraction N+be/V)
#
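To obtain a single analysis like the one above, the client application has to pass "cat's" to the analyzer as one word form. The following sketch shows one way to tokenize text without splitting at apostrophes. It is only an illustration: the function name `tokenize` is hypothetical, only the ASCII apostrophe and unaccented letters are handled, and a trailing apostrophe is always attached to the word, so a closing single quotation mark would be attached as well.

```python
import re

# A token is a run of letters, optionally followed by apostrophe + letters
# groups (cat's, doesn't) and an optional trailing apostrophe (points').
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*'?")


def tokenize(text):
    """Split text into word forms, keeping apostrophes inside the token."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("The cat's toys and the points' total doesn't count."))
```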
The Contraction Feature
The contraction feature specifies the contraction elements contained in the answer. The example above shows the Lemmatizer results for the query "cat's". The individual entities within the contraction feature are separated by the character '+'. An "open" entity is described by its category alone: any entry of that category (subject to specific restrictions) could fill that position. An entity drawn from a finite set of possibilities is instead specified by the pair citation form "/" category.
Here is a formal syntax description:
contraction-feature ::= "(Contraction" value ")".
value ::= entity {"+" entity}+.
entity ::= category | citation-form "/" category.
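A client application could read a contraction feature along these lines. This is a minimal sketch, assuming the feature arrives as a string such as "(Contraction N+'s/Clitic)" and that citation forms never contain '+'. Following the description above, an entity without "/" is read as a bare category. The names `Entity` and `parse_contraction` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Entity(NamedTuple):
    citation_form: Optional[str]  # None for an "open" entity
    category: str


def parse_contraction(feature: str) -> List[Entity]:
    """Parse "(Contraction a+b...)" into a list of entities."""
    prefix = "(Contraction "
    if not (feature.startswith(prefix) and feature.endswith(")")):
        raise ValueError(f"not a contraction feature: {feature!r}")
    value = feature[len(prefix):-1].strip()
    parts = value.split("+")
    # The syntax requires at least two non-empty entities joined by '+'.
    if len(parts) < 2 or any(not part for part in parts):
        raise ValueError(f"malformed contraction value: {value!r}")
    entities = []
    for part in parts:
        form, sep, category = part.rpartition("/")
        if sep:
            if not form or not category:
                raise ValueError(f"malformed entity: {part!r}")
            entities.append(Entity(form, category))
        else:
            # No "/" present: the whole entity is a category name.
            entities.append(Entity(None, part))
    return entities


if __name__ == "__main__":
    for text in ("(Contraction N+'s/Clitic)", "(Contraction N+be/V)"):
        print(parse_contraction(text))
```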
Problem Feature
Features of the type (Problem xy) are related to the entry specification in our database. They can be ignored.
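If the analyses are handled as feature strings, Problem features can be dropped before further processing. The sketch below assumes the "(Name value)" notation used on this page; the function name `strip_problem_features` is hypothetical.

```python
import re

# Matches any feature of the form "(Problem ...)".
PROBLEM_RE = re.compile(r"\(Problem\s+[^()]*\)")


def strip_problem_features(text):
    """Remove all (Problem ...) features from a feature string."""
    return PROBLEM_RE.sub("", text)


if __name__ == "__main__":
    print(strip_problem_features("(Cat N)(Problem xy)(SpellVar AE)"))
```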