WMTrans: Generating a spell checker for an existing vocabulary - the NZZ Case
The Customer
Neue Zürcher Zeitung (NZZ) publishes an online newspaper called NZZ Online. Since May 2001, NZZ Online has had its own editorial team, which is responsible for presenting up-to-date information on the Internet. For content management, the NZZ Online editorial team relies on Wyona CMS, which offers a WYSIWYG (what you see is what you get) XML editor that runs in a web browser.
The Challenge
Because NZZ has very high quality requirements for the spelling of its articles, Wyona CMS needed to be extended with a custom-made spell checker with the following characteristics:
- NZZ vocabulary, including NZZ-specific spelling variants
- High text processing speed
The WMTrans Solution
All WMTrans products have been produced with the WMTrans transducer framework: given a lexical database and a specification of the transducer's input and output, the framework automatically generates the desired transducer. This flexibility makes transducer development fast and reliable. The framework was used to produce a WMTrans Recognizer, which serves as the core of the NZZ Online spell checker: for every word in a document, the WMTrans Recognizer checks whether it conforms to the NZZ vocabulary.
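The internals of the WMTrans framework are not described here. To illustrate what a recognizer does, the following is a minimal Python sketch, not the WMTrans implementation: it builds a trie, i.e. a deterministic acyclic finite-state automaton without the state minimization often used to keep lexicon automata compact, from a full-form word list, and it accepts exactly the word forms in that list. The class name `TrieRecognizer` and the sample words are hypothetical.

```python
# Sentinel key marking an accepting state; it cannot collide with any character key.
_FINAL = object()


class TrieRecognizer:
    """Hypothetical sketch: a trie (deterministic acyclic finite-state automaton)
    that accepts exactly the word forms of a full-form lexicon."""

    def __init__(self, word_forms):
        # Each state is a dict mapping one character to the next state.
        self._root = {}
        for form in word_forms:
            self.add(form)

    def add(self, form):
        """Add one word form to the automaton."""
        state = self._root
        for ch in form:
            state = state.setdefault(ch, {})
        state[_FINAL] = True

    def accepts(self, word):
        """Return True if the word is one of the lexicon's word forms (case-sensitive)."""
        state = self._root
        for ch in word:
            state = state.get(ch)
            if state is None:
                return False  # no transition: the word is not in the lexicon
        return _FINAL in state


recognizer = TrieRecognizer(["Zeitung", "Zeitungen", "Redaktion"])
print(recognizer.accepts("Zeitungen"))  # True
print(recognizer.accepts("Zeitun"))     # False: a prefix, but not a word form
```

Lookup follows one transition per character, independent of the size of the lexicon, which is the property that makes automaton-based recognizers fast. Because the lexicon is a full-form list, inflected forms such as "Zeitungen" must be listed explicitly; no morphological analysis is involved in this sketch.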
The Results
NZZ benefits from the use of WMTrans products in the following ways:
- Rapid development of a spell checker based on the NZZ dictionary
  A WMTrans Recognizer was generated from a full-form lexicon maintained by NZZ.
- Rapid processing of NZZ Online documents
  The WMTrans Recognizer has been embedded in a filter program for XML documents. This program, triggered by the Wyona CMS editorial system, inserts special XML tags around unrecognized (i.e. potentially misspelt) words. The Wyona CMS editorial system then marks these words in red.
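The filter step can be sketched in the same hedged way. The function below is a hypothetical illustration, not the NZZ filter program: it processes the character data of a single text node, escapes it for XML, and wraps every word that the supplied `is_known` predicate rejects in a `<spell-error>` element. The tag name, the function name and the letters-only word pattern are assumptions; a complete filter would also have to walk the XML document and leave existing markup untouched.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical tag name; the markup actually used by the NZZ filter is not documented here.
MARK_TAG = "spell-error"

# A word is a maximal run of letters (Unicode-aware); digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")


def mark_unknown_words(text, is_known):
    """Return XML-escaped text in which every word rejected by is_known
    is wrapped in <spell-error>...</spell-error>."""
    parts = []
    pos = 0
    for match in WORD_RE.finditer(text):
        # Copy the non-word characters before this word, escaped for XML.
        parts.append(escape(text[pos:match.start()]))
        word = match.group()
        if is_known(word):
            parts.append(escape(word))
        else:
            parts.append(f"<{MARK_TAG}>{escape(word)}</{MARK_TAG}>")
        pos = match.end()
    # Copy whatever follows the last word.
    parts.append(escape(text[pos:]))
    return "".join(parts)


known = {"Die", "Zeitung", "erscheint", "täglich"}
print(mark_unknown_words("Die Zeitung erscheint tägich.", known.__contains__))
# -> Die Zeitung erscheint <spell-error>tägich</spell-error>.
```

Passing the lookup as a predicate keeps the filter independent of how the vocabulary is stored: a Python set works for illustration, while an automaton-based recognizer could be plugged in unchanged.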
Since the spell checker was rolled out in August 2001, the NZZ Online editorial team has used it successfully and without changes.