README file for Inside-Outside algorithm University To remove any frames surrounding this page, click here

README for inside-outside

This is an implementation of the inside-outside algorithm for estimating the probabilities of PCFG productions. While there are several places where better indexing might improve performance, I have paid attention to ensuring that the algorithm runs in n^3 time and in time linear in grammar size. I have used this program with grammars with tens of thousands of productions, trained using tens of thousands of sentences (it may take a week).

This program may be freely used for research purposes; however, I do request acknowledgement in any publications containing results that were produced with this program or code derived from this program.

Usage:

io [-d debuglevel] [-R nruns] [-a alpha] [-g grammar] [-s stoptol] [-p prune] [-l maxlen] [-m minits] [-n maxits] [-S randseed] [-j jitter] [-J per-iteration-jitter] [-b annealstart] [-B annealstop] [-N nanneal] [-T tracefile] yieldfile

where:

The grammar should consist of rules, one per line, of the format

where weight is the initial rule probability used to start EM, and bias is a pseudo-count added to this rule at each iteration

Compilation:

The program is written in ANSI C, and should compile just by saying ``make'', to produce an executable ``io''.

Setting optimization flags makes a big difference, as the program spends a lot of its time in tight loops. I find I get a big speed up by turning on as much optimization as possible.

The top Makefile target (testrun) runs the program on small test examples.