|
Readme for SMTGUI |
|
Philipp Koehn, Evan Herbst |
|
7/31/06 |
|
----------------------------------- |
|
|
|
SMTGUI is Philipp |
|
|
|
newsmtgui.cgi is the main program. Corpus.pm is my module. Error.pm is a common Perl module that is not always installed along with Perl, so a copy (v1.15) is included here. |
|
|
|
The program requires file |
|
|
|
For the corpus with name CORPUS, there should be present the files: |
|
- CORPUS.f, the foreign input |
|
- CORPUS.e, the truth (aka reference translation) |
|
- CORPUS.SYSTEM_TRANSLATION for each system to be analyzed |
|
- CORPUS.pt_FACTORNAME for each factor that requires a phrase table (these are currently used only to count unknown source words) |
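The file layout above can be checked before starting the program. The following is a minimal sketch, not part of SMTGUI: the function name check_corpus_files is hypothetical, and it only tests that the files exist. It takes the corpus name followed by any extra suffixes to look for (system names, or pt_FACTORNAME for phrase tables).

```shell
# Hypothetical helper (not shipped with SMTGUI): report missing corpus files.
# Usage: check_corpus_files CORPUS [SUFFIX ...]
# CORPUS.f and CORPUS.e are always checked; each SUFFIX adds CORPUS.SUFFIX.
check_corpus_files() {
    corpus=$1
    shift
    status=0
    for suffix in f e "$@"; do
        if [ ! -f "$corpus.$suffix" ]; then
            echo "missing: $corpus.$suffix" >&2
            status=1
        fi
    done
    # Returns 0 only if every expected file is present.
    return "$status"
}
```

For example, "check_corpus_files devtest systemA pt_pos" would look for devtest.f, devtest.e, devtest.systemA and devtest.pt_pos (the names here are made up for illustration).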
|
|
|
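The .f, .e and system-output files should have the usual pipe-delimited factored format (the factors of each word joined by '|', words separated by spaces), one sentence per line. Phrase tables should use the standard three-pipe format (fields separated by ' ||| '). |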
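Two quick sanity checks follow from the format description. This is a rough sketch under stated assumptions, not part of SMTGUI: it assumes that factors within a word are joined by '|', that every word in a given file carries the same number of factors, and that all files for one corpus are line-aligned (same number of sentences). The function names check_factors and same_line_count are hypothetical.

```shell
# check_factors FILE
# Fails if the words in FILE do not all carry the same number of
# '|'-separated factors (the count is taken from the first word seen).
check_factors() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                n = split($i, parts, "[|]")
                if (expected == 0) expected = n
                if (n != expected) {
                    printf "%s:%d: \"%s\" has %d factors, expected %d\n", FILENAME, NR, $i, n, expected
                    bad = 1
                }
            }
        }
        END { exit bad + 0 }
    ' "$1"
}

# same_line_count FILE [FILE ...]
# Fails if any file is unreadable or has a different number of lines
# than the first one.
same_line_count() {
    [ -r "$1" ] || return 1
    expected=$(awk 'END { print NR }' "$1")
    for f in "$@"; do
        [ -r "$f" ] || return 1
        n=$(awk 'END { print NR }' "$f")
        if [ "$n" -ne "$expected" ]; then
            echo "$f has $n lines, expected $expected" >&2
            return 1
        fi
    done
}
```

A typical call would be "same_line_count CORPUS.f CORPUS.e CORPUS.SYSTEM_TRANSLATION" followed by "check_factors" on each factored file.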
|
|
|
A list of standard factor names is available in @Corpus::FACTORNAMES. Feel free to add, but woe betide you if you muck with |
|
|
|
Currently the program assumes you |
|
|
|
$ $BIN/tag-english < CORPUS.lc > CORPUS.pos-tmp (call the Brill tagger) |
|
$ $BIN/morph < CORPUS.pos-tmp > CORPUS.morph |
|
$ $DATA/test/factor-stem.en.perl < CORPUS.morph > CORPUS.lemma |
|
$ cat CORPUS.pos-tmp | perl -n -e |
|
$ $DATA/test/combine-features.perl CORPUS lc+pos lemma > CORPUS.lc+pos+lemma |
|
$ rm CORPUS.pos-tmp (cleanup) |
|
|
|
where $BIN=/export/ws06osmt/bin, $DATA=/export/ws06osmt/data. |
|
|
|
To get German POS tags and lemmas from a words-only corpus (the first step must be run on Linux): |
|
|
|
$ $BIN/recase.perl --in CORPUS.lc --model $MODELS/en-de/recaser/pharaoh.ini > CORPUS.recased (call Pharaoh with a lowercase->uppercase model) |
|
$ $BIN/run-lopar-tagger-lowercase.perl CORPUS.recased CORPUS.recased.lopar (call LOPAR) |
|
$ $DATA/test/factor-stem.de.perl < CORPUS.recased.lopar > CORPUS.stem |
|
$ $BIN/lowercase.latin1.perl < CORPUS.stem > CORPUS.lcstem (assumes Latin-1 encoding) |
|
$ $DATA/test/factor-pos.de.perl < CORPUS.recased.lopar > CORPUS.pos |
|
$ $DATA/test/combine-features.perl CORPUS lc pos lcstem > CORPUS.lc+pos+lcstem |
|
|
|
where $MODELS=/export/ws06osmt/models. |
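The German steps above can be wrapped in one shell function so that a failure in any step stops the rest. This is only a sketch: the function name make_german_factors is hypothetical, the commands are copied from the list above, and the default paths are the ones given in this readme (override BIN, DATA or MODELS in the environment if yours differ). It expects CORPUS.lc to exist in the current directory.

```shell
# Default locations as stated in this readme; override via the environment.
BIN=${BIN:-/export/ws06osmt/bin}
DATA=${DATA:-/export/ws06osmt/data}
MODELS=${MODELS:-/export/ws06osmt/models}

# make_german_factors CORPUS
# Hypothetical wrapper: runs the six German steps in order and stops at the
# first one that fails (each step is chained with &&).
make_german_factors() {
    c=$1
    "$BIN/recase.perl" --in "$c.lc" --model "$MODELS/en-de/recaser/pharaoh.ini" > "$c.recased" &&
    "$BIN/run-lopar-tagger-lowercase.perl" "$c.recased" "$c.recased.lopar" &&
    "$DATA/test/factor-stem.de.perl" < "$c.recased.lopar" > "$c.stem" &&
    "$BIN/lowercase.latin1.perl" < "$c.stem" > "$c.lcstem" &&
    "$DATA/test/factor-pos.de.perl" < "$c.recased.lopar" > "$c.pos" &&
    "$DATA/test/combine-features.perl" "$c" lc pos lcstem > "$c.lc+pos+lcstem"
}
```

Run it as "make_german_factors CORPUS". A nonzero exit status means some step failed; intermediate files written before the failure are left in place for inspection.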
|
|