CG136/CS146:
Introduction to Computational Linguistics
Mark Johnson
Spring semester 2006
Tuesday and Thursday, 2:30-3:50pm, Metcalf Chemistry 204
Resources for the class
Class Handouts and Slides
- Python programs in Zip archive (also available
on CS machines under
~mj/cg136/programs)
- Week 13 slides (Inside-Outside homework, due 2nd May)
- Week 12 slides, notes on treebanks
- Week 12 slides (PCFG homework, due 25th April)
- Catalan grammar spreadsheet showing non-convergence
of the sum of tree probabilities for different rule probabilities in the grammar
S --> S S, S --> x.
- Week 10 slides (PCFGs)
- Spreadsheet containing HMM data
- Week 8 slides (Gibbs sampler for IBM model 1)
- Week 7 slides (HMMs, Viterbi and Forward-Backward algorithms)
- Week 6 slides (Chi^2 and Likelihood ratio tests)
- Week 5 slides (information theory, entropy and
cross-entropy)
- Week 4 slides (including multinomials, language models, Bayesian estimates, etc.)
- Week 3 slides (including Lagrange multipliers, IBM model 0, etc.)
- Week 2 Tuesday slides
- Mitzenmacher (2003)
A Brief History of Generative Models for Power Law and Lognormal
Distributions, Internet Mathematics Vol. 1, No. 2: 226-251.
- Week 1 slides
- Syllabus
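The Catalan grammar item in the list above can be illustrated with a short Python sketch. It is not the spreadsheet itself, only a minimal reconstruction under stated assumptions: p is the probability of S --> S S, 1 - p is the probability of S --> x, and the function names are hypothetical. A tree with n leaves uses S --> S S exactly n - 1 times and S --> x exactly n times, and there are Catalan(n - 1) such trees, so the total probability of all finite trees is the sum over n of Catalan(n - 1) * p^(n-1) * (1 - p)^n.

```python
def tree_mass(p, max_leaves):
    """Partial sum of tree probabilities over trees with 1..max_leaves leaves.

    Assumes P(S --> S S) = p and P(S --> x) = 1 - p.
    """
    q = 1.0 - p
    term = q          # n = 1: the single tree S --> x
    total = 0.0
    for n in range(1, max_leaves + 1):
        total += term
        # Catalan(n) / Catalan(n - 1) = 2 * (2n - 1) / (n + 1); moving from
        # n to n + 1 leaves adds one S --> S S and one S --> x rule.
        term *= 2.0 * (2 * n - 1) / (n + 1) * p * q
    return total


def limit_mass(p):
    """Closed-form limit from the Catalan generating function."""
    return min(1.0, (1.0 - p) / p)


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.6, 0.7):
        print(p, tree_mass(p, 2000), limit_mass(p))
```

Under these assumptions the partial sums approach 1 when p is at most 0.5 (very slowly at exactly 0.5) and approach (1 - p) / p when p is larger, so for large p some probability mass is lost to derivations that never terminate.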
Word alignment resources
- Moore, Robert C. 2004. "Improving IBM Word Alignment Model 1". In Proceedings, 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, pp. 519-526.
- Och, Franz Josef and Hermann Ney. 2000. "Improved Statistical Alignment Models". In Proceedings, 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong, China, pp. 440-447.
- Resources from the 2003 HLT-NAACL word alignment workshop, including English-French training data, trial (development?) data and test data (these are also available on the CS machines
under
/pro/mt/cs146)
- Workshop on word
alignment for languages with scarce resources presented at
ACL 2005 workshop
on building and using parallel texts
- General resources for
sentence and word alignments
- The Canadian Hansards corpus is described
here, and is available on the CS machines in the directory
/pro/mt/data/hansard.36/, and on the Cog Sci machines
in the directory /usr/local/data/hansards/Release-2001.1a/.
Python is a programming language
that is especially simple to learn. The interpreter
and lots of documentation are
available for free download from
the main Python download page.
Besides these standard documents, the book
Dive into Python
and the
Python Quick Reference may be helpful.
The Natural Language Toolkit includes
Python code for many of the models that we will discuss in class.
R is a free software environment
for statistical computing that runs on Windows, Mac and Linux platforms.
You can download it from
a CRAN archive site near you.
Syllabus, textbook and readings
Click here for the class syllabus in PDF.
The textbook for the class is:
Here are some other readings that
we may look at in this class:
On-line class material:
I plan to make example programs and other class materials available on-line in the
directory ~mj/cg136 on both triton.cog.brown.edu and cs.brown.edu.
Let me know if some other means of distribution would be more convenient for you.
Mark Johnson
22nd January, 2006