Sunday, July 19, 2009

NLTK Installation with Python easy_install

A few weeks ago I wrote the NLTK on Ubuntu Quick Start Guide. With today's release of NLTK (Natural Language Toolkit) 2.0b5, installation has been greatly simplified thanks to the nltk Python egg (see the Changelog).

To get started with the NLTK install, you first need the python-setuptools package.

$ sudo apt-get install python-setuptools 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 195kB of archives.
After this operation, 909kB of additional disk space will be used.
Get:1 karmic/main python-setuptools 0.6c9-0ubuntu4 [195kB]
Fetched 195kB in 9s (20.2kB/s)                                                 
Selecting previously deselected package python-setuptools.
(Reading database ... 106971 files and directories currently installed.)
Unpacking python-setuptools (from .../python-setuptools_0.6c9-0ubuntu4_all.deb) ...
Setting up python-setuptools (0.6c9-0ubuntu4) ...

Now let's install NLTK with the easy_install program.

$ sudo easy_install
Downloading
Processing nltk-2.0b5-py2.6.egg
creating /usr/local/lib/python2.6/dist-packages/nltk-2.0b5-py2.6.egg
Extracting nltk-2.0b5-py2.6.egg to /usr/local/lib/python2.6/dist-packages
Adding nltk 2.0b5 to easy-install.pth file
Installed /usr/local/lib/python2.6/dist-packages/nltk-2.0b5-py2.6.egg
Processing dependencies for nltk==2.0b5
Searching for PyYAML==3.08
Reading
Reading
Best match: PyYAML 3.08
Downloading
Processing
Running PyYAML-3.08/ -q bdist_egg --dist-dir /tmp/easy_install-T7Y0La/PyYAML-3.08/egg-dist-tmp-vRjvDM
build/temp.linux-i686-2.6/check_libyaml.c:2:18: error: yaml.h: No such file or directory
build/temp.linux-i686-2.6/check_libyaml.c: In function ‘main’:
build/temp.linux-i686-2.6/check_libyaml.c:5: error: ‘yaml_parser_t’ undeclared (first use in this function)
build/temp.linux-i686-2.6/check_libyaml.c:5: error: (Each undeclared identifier is reported only once
build/temp.linux-i686-2.6/check_libyaml.c:5: error: for each function it appears in.)
build/temp.linux-i686-2.6/check_libyaml.c:5: error: expected ‘;’ before ‘parser’
build/temp.linux-i686-2.6/check_libyaml.c:6: error: ‘yaml_emitter_t’ undeclared (first use in this function)
build/temp.linux-i686-2.6/check_libyaml.c:6: error: expected ‘;’ before ‘emitter’
build/temp.linux-i686-2.6/check_libyaml.c:8: warning: implicit declaration of function ‘yaml_parser_initialize’
build/temp.linux-i686-2.6/check_libyaml.c:8: error: ‘parser’ undeclared (first use in this function)
build/temp.linux-i686-2.6/check_libyaml.c:9: warning: implicit declaration of function ‘yaml_parser_delete’
build/temp.linux-i686-2.6/check_libyaml.c:11: warning: implicit declaration of function ‘yaml_emitter_initialize’
build/temp.linux-i686-2.6/check_libyaml.c:11: error: ‘emitter’ undeclared (first use in this function)
build/temp.linux-i686-2.6/check_libyaml.c:12: warning: implicit declaration of function ‘yaml_emitter_delete’
libyaml is not found or a compiler error: forcing --without-libyaml
(if libyaml is installed correctly, you may need to specify the option --include-dirs or uncomment and modify the parameter include_dirs in setup.cfg)
zip_safe flag not set; analyzing archive contents...
Adding PyYAML 3.08 to easy-install.pth file
Installed /usr/local/lib/python2.6/dist-packages/PyYAML-3.08-py2.6-linux-i686.egg
Finished processing dependencies for nltk==2.0b5
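The compiler errors in this log are not fatal: the run ends with PyYAML installed after "forcing --without-libyaml", which means the pure-Python build was used. As a rough way to see which build ended up on a machine, the sketch below looks at PyYAML's __with_libyaml__ flag. The helper name pyyaml_build is made up for illustration, the code assumes Python 3 rather than the 2.6 used in this post, and it only reports "not installed" when yaml cannot be imported.

```python
def pyyaml_build():
    """Report which PyYAML build is importable, if any."""
    try:
        import yaml
    except ImportError:
        return "not installed"
    # PyYAML sets __with_libyaml__ to True only when its C bindings were built.
    if getattr(yaml, "__with_libyaml__", False):
        return "libyaml (C bindings)"
    return "pure Python"


print(pyyaml_build())
```

Either build should be enough for NLTK to import; the C bindings mainly make YAML parsing faster.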

Now you are done. Import NLTK and start downloading the NLTK data.

$ python
Python 2.6.2+ (release26-maint, Jun 19 2009, 15:14:35)
[GCC 4.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>>
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> l
Packages:
/usr/local/lib/python2.6/dist-packages/nltk-2.0b5-py2.6.egg/nltk/ DeprecationWarning: object.__new__() takes no parameters
  [ ] maxent_ne_chunker... ACE Named Entity Chunker (Maximum entropy)
  [ ] abc................. Australian Broadcasting Commission 2006
  [ ] brown............... Brown Corpus
  [ ] alpino.............. Alpino Dutch Treebank
  [ ] cess_cat............ CESS-CAT Treebank
  [ ] brown_tei........... Brown Corpus (TEI XML Version)
  [ ] cmudict............. The Carnegie Mellon Pronouncing Dictionary (0.6)
  [ ] biocreative_ppi..... BioCreAtIvE (Critical Assessment of Information Extraction Systems in Biology)
  [ ] cess_esp............ CESS-ESP Treebank
  [ ] chat80.............. Chat-80 Data Files
  [ ] city_database....... City Database
  [ ] conll2002........... CONLL 2002 Named Entity Recognition Corpus
  [ ] conll2000........... CONLL 2000 Chunking Corpus
  [ ] conll2007........... Dependency Treebanks from CoNLL 2007 (Catalan and Basque Subset)
  [ ] dependency_treebank. Dependency Parsed Treebank
  [ ] floresta............ Portuguese Treebank
  [ ] genesis............. Genesis Corpus
  [ ] gazetteers.......... Gazeteer Lists
Hit Enter to continue:
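Before starting the downloader, it can help to confirm that the packages from the install step are actually importable. A minimal sketch using only the standard library, assuming Python 3; missing_modules is a hypothetical helper name, not part of NLTK or setuptools.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "yaml" is the import name of the PyYAML dependency seen in the install log.
print(missing_modules(["nltk", "yaml"]))
```

An empty list means both imports should succeed. From there, the interactive downloader is normally started by calling nltk.download() at the Python prompt.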


  1. This may be obvious, but it is worth noting that before installing NLTK you should also install PyYAML:

    sudo apt-get install python-yaml

    It will get rid of some of those ugly error messages during the install.

  2. Thanks a lot, Arky.

    Very useful recipe.

    Only note that you can check the current .egg version in the following url:


  3. Thanks for posting this, it was very helpful to me! - wwward

  4. Is this for Windows too? My computer has to go through a network proxy.

  5. Yes, NLTK is available for MS Windows too. It won't have any problems with a network proxy.

  6. I cannot download the data file because of the proxy. Can this package (in this blog) be used on Windows?

  7. The NLTK Data page mentions how to use NLTK with proxy settings.

    If your web connection uses a proxy server, you should specify the proxy address as follows. In the case of an authenticating proxy, specify a username and password. If the proxy is set to None then this function will attempt to detect the system proxy. (NB this support was added on 21 Sep 2010, and needs a release more recent than 2.0b9.)

    >>> nltk.set_proxy('', ('USERNAME', 'PASSWORD'))


  8. Thanks, this helped and saved me an ample amount of time :)

  9. Thank you so much. I was struggling with the NLTK installation before finding this :)

  10. Glad it worked for you. Good luck, Arrpita.

  11. Unable to install using easy_install:
    error: Download error for [Errno 110] Connection timed out
    But I can download it through my browser. I'm using Ubuntu 10.04 and have set http_proxy correctly.

  12. sudo easy_install
    Processing nltk-2.0b5-py2.6.egg
    removing '/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.6.egg' (and everything under it)
    creating /usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.6.egg
    Extracting nltk-2.0b5-py2.6.egg to /usr/local/lib/python2.7/dist-packages
    nltk 2.0b5 is already the active version in easy-install.pth

    Installed /usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.6.egg
    Processing dependencies for nltk==2.0b5
    Searching for PyYAML==3.08
    No local packages or download links found for PyYAML==3.08
    error: Could not find suitable distribution for Requirement.parse('PyYAML==3.08')

    1. Please post this to the NLTK forums. This blog post dates back to 2009; a lot must have changed over the years.
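
For readers stuck behind a proxy, as in several of the comments above: the quoted NLTK Data text describes passing a proxy address, optional credentials, and falling back to the system proxy when the address is None. The sketch below shows the same idea using only the Python 3 standard library. It is not NLTK's code; build_proxy_opener is a hypothetical helper, and proxy.example.com:3128 is a placeholder address, not a real server.

```python
import urllib.request


def build_proxy_opener(proxy_url=None, username=None, password=""):
    """Build a urllib opener that routes requests through a proxy.

    With proxy_url=None, the proxy settings are taken from the environment
    or operating system via urllib.request.getproxies().
    """
    if proxy_url is None:
        proxies = urllib.request.getproxies()
    else:
        proxies = {"http": proxy_url, "https": proxy_url}

    handlers = [urllib.request.ProxyHandler(proxies)]

    if username is not None:
        # Register the credentials for every configured proxy URL.
        password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        for url in proxies.values():
            password_mgr.add_password(None, url, username, password)
        handlers.append(urllib.request.ProxyBasicAuthHandler(password_mgr))

    return urllib.request.build_opener(*handlers)


# Placeholder address and credentials; nothing is contacted here.
opener = build_proxy_opener("http://proxy.example.com:3128", "USERNAME", "PASSWORD")
# urllib.request.install_opener(opener)  # uncomment to apply it to urlopen() calls
```

Called with no arguments, the helper uses whatever getproxies() detects, which includes an http_proxy environment variable like the one mentioned in comment 11.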

