Update (July 19, 2009): You can now use the NLTK Python egg instead; see the NLTK Installation with Python setuptools post.
While I was attending a short program in computational linguistics at Dravidian University, Dr. Arul introduced me to NLTK (Natural Language Toolkit). It was a full two years before I finally decided to take a close look at it. Like most linguists at the lab, I used the Perl programming language. With the new NLTK 2.0 version released last month, NLTK now works with Python 2.6. Here is a quick start guide for NLTK on Ubuntu Linux.
Installing NLTK on Ubuntu with Python 2.6
At the time of writing this post, the Debian package on the NLTK download page is built for Python 2.5, while Ubuntu ships with Python 2.6 by default. You therefore need to download the source package from the NLTK download page.
NLTK depends on a few other modules, so let's install them first.
$ sudo apt-get install python-numpy python-matplotlib prover9
Uncompress the source package and run the NLTK setup.
$ unzip nltk-2.0b3.zip
$ cd nltk-2.0b3/
$ ls
build  LICENSE.txt  nltk  PKG-INFO  README.txt  setup.py  yaml
$ sudo python setup.py install
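Before moving on, it can help to confirm that the dependencies and NLTK itself are importable. The following is a minimal sketch, not part of the NLTK distribution; the helper name `check_modules` is made up for illustration, and the module names assume the packages installed above.

```python
def check_modules(names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    status = {}
    for name in names:
        try:
            __import__(name)  # built-in import hook, available in Python 2 and 3
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# numpy and matplotlib come from the apt-get step, nltk from setup.py.
for name, ok in sorted(check_modules(["numpy", "matplotlib", "nltk"]).items()):
    print("%-12s %s" % (name, "ok" if ok else "MISSING"))
```

Any module reported as MISSING points back to the step that should have installed it.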
Once the NLTK setup has finished, you should download the NLTK data, which contains various corpora, tagsets, treebank data, and more.
Python 2.6.2+ (release26-maint, Jun 19 2009, 15:14:35)
[GCC 4.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
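The session above stops at the import. As a rough sketch of the download step, the snippet below wraps NLTK's `nltk.download()` helper. The wrapper name `fetch_nltk_data` is hypothetical, and the `book` collection (the data used by the NLTK book) is my assumption about a sensible default, not something the original session shows.

```python
def fetch_nltk_data(package="book"):
    """Download one NLTK data package or collection by its identifier.

    Hypothetical helper for illustration. The import sits inside the
    function so that a failed NLTK install surfaces here as ImportError.
    """
    import nltk
    # nltk.download() fetches the named package into the NLTK data directory.
    return nltk.download(package)
```

Calling `fetch_nltk_data()` needs a network connection. Calling `nltk.download()` with no arguments opens the interactive downloader instead, where you can pick individual corpora.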
The best place to start is the NLTK book, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. The book is available under a Creative Commons license, so you can read it online on the NLTK website itself. I would recommend buying a copy, as the proceeds go toward the future development of NLTK.
There aren't many videos about NLTK. I recently stumbled upon this video lecture by the trinity of NLTK: Steven Bird, Ewan Klein, and Edward Loper.
If you are new to computational linguistics and need a good grounding in the field, you should also consider reading these texts.