Wednesday, July 8, 2009

NLTK on Ubuntu Quick Start Guide

Update, July 19, 2009: You can now install NLTK as a Python egg instead; read the NLTK Installation with Python setuptools post.

Dr. Arul introduced me to NLTK (Natural Language Toolkit) while I was attending a short program in computational linguistics at Dravidian University. It was a full two years before I finally decided to take a close look at it; like most linguists at the lab, I was using the Perl programming language. With the new NLTK 2.0 release last month, NLTK now works with Python 2.6. Here is a quick start guide for NLTK on Ubuntu Linux.

Installing NLTK on Ubuntu with Python 2.6

At the time of writing, the Debian package on the NLTK download page is built for Python 2.5, while Ubuntu ships with Python 2.6 by default. So you need to download the source package from the NLTK download page instead.

NLTK depends on a few other packages, so let's install them first:
$ sudo apt-get install python-numpy python-matplotlib prover9
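Before building NLTK, a quick check that the dependencies are really in place can save some head-scratching later. This is only a sketch under a few assumptions: it treats numpy and matplotlib as the importable module names provided by python-numpy and python-matplotlib, and it assumes the prover9 package puts an executable named prover9 on your PATH. The helper names missing_modules and find_program are made up for this example.

```python
import os


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)  # __import__ works on Python 2.6 as well as Python 3
        except ImportError:
            missing.append(name)
    return missing


def find_program(name):
    """Return the full path of executable `name` found on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Assumption: these are the module names installed by the Ubuntu packages.
    for name in missing_modules(["numpy", "matplotlib"]):
        print("missing Python module: " + name)
    # Assumption: the prover9 package installs an executable called prover9.
    if find_program("prover9") is None:
        print("prover9 not found on PATH")
```

Run it with the same python you will use for the install; if it prints nothing, everything was found.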

Uncompress the source package and run the NLTK setup.
$ unzip
$ cd nltk-2.0b3/
$ ls
build  LICENSE.txt  nltk  PKG-INFO  README.txt  yaml
$ sudo python setup.py install
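Since the setup installs NLTK for whichever python runs it, it is worth confirming that NLTK ended up under Python 2.6 and not some other interpreter on the system. A minimal sketch, assuming only that the installed package is importable as nltk; describe_nltk_install is a hypothetical helper, and because I am not sure every release defines nltk.__version__, the code falls back to "unknown".

```python
import sys


def describe_nltk_install():
    """Return (python_version, nltk_version, nltk_location).

    nltk_version and nltk_location are None when NLTK cannot be imported.
    """
    python_version = "%d.%d" % sys.version_info[:2]
    try:
        import nltk
    except ImportError:
        return python_version, None, None
    # Fall back to "unknown" in case a release does not set __version__.
    return python_version, getattr(nltk, "__version__", "unknown"), nltk.__file__


if __name__ == "__main__":
    python_version, nltk_version, location = describe_nltk_install()
    print("Python " + python_version)
    if nltk_version is None:
        print("NLTK is not importable from this interpreter")
    else:
        print("NLTK " + nltk_version + " at " + location)
```

If the reported location is not under the Python 2.6 site directory, the install went to a different interpreter.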

Once the NLTK setup has finished, you should download the NLTK data, which contains various corpora, tagsets, treebank data, and more.
$ python
Python 2.6.2+ (release26-maint, Jun 19 2009, 15:14:35)
[GCC 4.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
(Screenshot: NLTK Data downloader window)
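In the interpreter, calling nltk.download() with no arguments opens the downloader window, and passing a package identifier such as "gutenberg" fetches just that package without the window. For scripts, a small helper can download a resource only when nltk.data.find() cannot locate it (find raises LookupError in that case). This is a sketch: ensure_resource is a hypothetical name, the gutenberg corpus is only an example, and the optional nltk_module argument exists purely so the function can be exercised with a stand-in object instead of a real download.

```python
def ensure_resource(resource_path, package_id, nltk_module=None):
    """Download `package_id` unless `resource_path` is already installed.

    Returns True if a download was attempted, False if the resource was found.
    """
    if nltk_module is None:
        import nltk as nltk_module
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        nltk_module.data.find(resource_path)
        return False
    except LookupError:
        # Non-interactive download of a single package by its identifier.
        nltk_module.download(package_id)
        return True
```

In a script, ensure_resource("corpora/gutenberg", "gutenberg") would fetch the corpus on the first run and do nothing on later runs.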

Learning NLTK

The best place to start is the NLTK book, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. The book is freely available, so you can read it online on the NLTK website itself. I would still recommend buying a copy, as the proceeds will go into the future development of NLTK.

There aren't many videos about NLTK. I recently stumbled upon this video lecture by the trinity behind NLTK: Steven Bird, Ewan Klein, and Edward Loper.

If you are new to computational linguistics and need a good grounding in the field, you should also consider reading these texts:

Speech and Language Processing (2nd Edition)

Natural Language Understanding (2nd Edition)

Foundations of Statistical Natural Language Processing


  1. Thanks for the post!

    This very nearly worked just right for me, but I also had to install setuptools and the Python yaml library, with "sudo apt-get install python-setuptools python-yaml".

  2. I was at that BayPiggies talk! By chance I happened to be in Palo Alto when it was going on (I live in the east of Canada). It was a great talk, with some good discussion.

    Thanks for pointing out the easy_install...worked like a charm!

  3. Thanks for posting this. It was very helpful.

