Segmentation breaks paragraphs into sentences to make translation easier: a single paragraph of continuous text is split into several sentences. Segmentation rules decide where the breaks occur. Sisulizer uses the Segmentation Rules Exchange (SRX) standard to specify these rules. To view and edit the segmentation rules, choose Tools | General and select the Segmentation sheet.
For the complete SRX documentation, go to http://www.lisa.org/standards/srx/srx.html.
SRX describes the rules with regular expressions, which are a powerful way to describe string patterns.
For the regular expression syntax, go to http://icu.sourceforge.net/userguide/regexp.html.
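The following minimal sketch shows how a before-break and after-break pattern pair locates a break. It uses Python's `re` module as a stand-in for the ICU regular expression engine; the patterns used here mean the same in both. `break_positions` is a hypothetical helper and is not part of Sisulizer. It joins the two patterns of the first break rule in the table below into one expression, with the after-break part written as a lookahead, so each match ends exactly at a break position.

```python
import re


def break_positions(text: str, before: str, after: str) -> list[int]:
    """Return the offsets at which a single break rule fires.

    The before-break pattern is consumed and the after-break pattern is a
    lookahead, so every match ends exactly between the two parts.
    """
    pattern = re.compile(f"(?:{before})(?={after})")
    return [m.end() for m in pattern.finditer(text)]


text = "Skiing is fun. Swimming is fun too."
positions = break_positions(text, r"[\.\?!]+", r"\s+")
print(positions)  # [14]

# Cut the text at each break and trim the whitespace the break leaves behind.
bounds = zip([0] + positions, positions + [len(text)])
print([text[a:b].strip() for a, b in bounds])
# ['Skiing is fun.', 'Swimming is fun too.']
```

The final period is not a break because no whitespace follows it, so the after-break pattern `\s+` does not match there.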
Here are a few example rules:
| Rule type | Before break | After break | Language | Description |
|---|---|---|---|---|
| Break | `[\.\?!]+` | `\s+` | All | A break occurs when one or more periods, question marks, or exclamation marks are followed by one or more whitespace characters (space, tab, or newline). Example: "Skiing is fun. Swimming is fun too." breaks between "fun." and "Swimming". |
| Break | `[。\.\?!]+` | `\s+` | Japanese | A break occurs when one or more ideographic full stops (。, Unicode U+3002), periods, question marks, or exclamation marks are followed by one or more whitespace characters. Example: 私は東京に住んでいます。東京は大きいです。 |
| Exception | `[a-zà-ö0-9]\.` | `\s+[a-zà-ö]` | All | Prevents a break when a lowercase letter or a digit is followed by a period, one or more whitespace characters, and a lowercase letter. Example: in "Raaka-aineena voidaan käyttää esim. vanhoja autonrenkaita." no break occurs after the abbreviation "esim.", even though the first break rule would otherwise split the sentence there. |
| Exception | `(^\|[\s\(\[])Mr\.` | `\s+` | English | Prevents a break after the abbreviation "Mr." when it appears at the beginning of the text or after whitespace, an opening parenthesis, or an opening bracket. Example: in "The British Prime Minister is Mr. Blair." no break occurs after "Mr.", even though the first break rule would otherwise split the sentence there. |
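The table can be read as an ordered rule list. The Python sketch below applies the four rules at every position between two characters. A rule matches when its before-break pattern matches the text ending at that position and its after-break pattern matches the text starting there, and the first matching rule decides. The sketch makes several assumptions. It checks exception rules before break rules, as SRX rule lists usually order them; Sisulizer's actual evaluation order is not documented here. `Rule`, `RULES`, and `segment` are hypothetical names. Python's `re` stands in for ICU regular expressions, so edge cases may differ. Finally, the period in `Mr\.` is escaped so that it matches only a literal period.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    is_break: bool  # True for a "Break" rule, False for an "Exception" rule
    before: str     # must match the text that ends at the candidate position
    after: str      # must match the text that starts at the candidate position
    language: str   # "All" or a single language name


# The rules from the table, exceptions first (assumed evaluation order).
RULES = [
    Rule(False, r"[a-zà-ö0-9]\.", r"\s+[a-zà-ö]", "All"),
    Rule(False, r"(^|[\s\(\[])Mr\.", r"\s+", "English"),
    Rule(True, r"[\.\?!]+", r"\s+", "All"),
    Rule(True, r"[。\.\?!]+", r"\s+", "Japanese"),
]


def segment(text: str, language: str, rules: list[Rule] = RULES) -> list[str]:
    """Split text into sentences; the first rule matching a position decides."""
    # Keep the generic ("All") rules plus the rules for the requested language.
    active = [
        (r.is_break, re.compile("(?:" + r.before + r")\Z"), re.compile(r.after))
        for r in rules
        if r.language in ("All", language)
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in active:
            # endpos=pos lets \Z force `before` to end exactly at pos;
            # match(text, pos) forces `after` to start exactly at pos.
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first match wins; for an exception this means no break
    segments.append(text[start:].strip())
    return [s for s in segments if s]


print(segment("Skiing is fun. Swimming is fun too.", "English"))
# ['Skiing is fun.', 'Swimming is fun too.']
print(segment("The British Prime Minister is Mr. Blair.", "English"))
# ['The British Prime Minister is Mr. Blair.']
print(segment("The British Prime Minister is Mr. Blair.", "German"))
# ['The British Prime Minister is Mr.', 'Blair.']
```

When the language is German, only the generic rules apply, so "Mr." is treated as the end of a sentence. This shows why a language-specific exception is needed alongside the generic rules.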
Sisulizer contains both generic segmentation rules (i.e. language-independent rules) and language-specific rules. You can add your own rules or remove built-in rules.
Segmentation is available for the following source types: HTML files, XML files, and database data. It is turned off by default; turn it on in the source's properties dialog.