Talking About Technology "World As Built"


NLTK Installation with Python easy_install

A few weeks ago I wrote the NLTK on Ubuntu Quick Start Guide. Now, with today's release of NLTK (Natural Language Toolkit) 2.0b5, installing NLTK has been greatly simplified thanks to the nltk Python egg (see Changelog).

To get started with the NLTK install, you first need the python-setuptools package.

    $ sudo apt-get install python-setuptools
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following NEW packages will be installed:
      python-setuptools
    0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
    Need to get 195kB of archives.
    After this operation, 909kB of additional disk space will be used.
    Get:1 http://in.archive.ubuntu.com karmic/main python-setuptools 0.6c9-0ubuntu4 [195kB]
    Fetched 195kB in 9s (20.2kB/s)
    Selecting previously deselected package python-setuptools.
    (Reading database ... 106971 files and directories currently installed.)
    Unpacking python-setuptools (from .../python-setuptools_0.6c9-0ubuntu4_all.deb) ...
    Setting up python-setuptools (0.6c9-0ubuntu4) ...

Now let's install NLTK with the easy_install program.

    $ sudo easy_install http://nltk.googlecode.com/files/nltk-2.0b5-py2.6.egg
    Downloading http://nltk.googlecode.com/files/nltk-2.0b5-py2.6.egg
    Processing nltk-2.0b5-py2.6.egg
    creating /usr/local/lib/python2.6/dist-packages/nltk-2.0b5-py2.6.egg
    Extracting nltk-2.0b5-py2.6.egg to /usr/local/lib/python2.6/dist-packages
    Adding nltk 2.0b5 to easy-install.pth file

    Installed /usr/local/lib/python2.6/dist-packages/nltk-2.0b5-py2.6.egg
    Processing dependencies for nltk==2.0b5
    Searching for PyYAML==3.08
    Reading http://pypi.python.org/simple/PyYAML/
    Reading http://pyyaml.org/wiki/PyYAML
    Best match: PyYAML 3.08
    Downloading http://pyyaml.org/download/pyyaml/PyYAML-3.08.zip
    Processing PyYAML-3.08.zip
    Running PyYAML-3.08/setup.py -q bdist_egg --dist-dir /tmp/easy_install-T7Y0La/PyYAML-3.08/egg-dist-tmp-vRjvDM
    build/temp.linux-i686-2.6/check_libyaml.c:2:18: error: yaml.h: No such file or directory
    build/temp.linux-i686-2.6/check_libyaml.c: In function ‘main’:
    build/temp.linux-i686-2.6/check_libyaml.c:5: error: ‘yaml_parser_t’ undeclared (first use in this function)
    build/temp.linux-i686-2.6/check_libyaml.c:5: error: (Each undeclared identifier is reported only once
    build/temp.linux-i686-2.6/check_libyaml.c:5: error: for each function it appears in.)
    build/temp.linux-i686-2.6/check_libyaml.c:5: error: expected ‘;’ before ‘parser’
    build/temp.linux-i686-2.6/check_libyaml.c:6: error: ‘yaml_emitter_t’ undeclared (first use in this function)
    build/temp.linux-i686-2.6/check_libyaml.c:6: error: expected ‘;’ before ‘emitter’
    build/temp.linux-i686-2.6/check_libyaml.c:8: warning: implicit declaration of function ‘yaml_parser_initialize’
    build/temp.linux-i686-2.6/check_libyaml.c:8: error: ‘parser’ undeclared (first use in this function)
    build/temp.linux-i686-2.6/check_libyaml.c:9: warning: implicit declaration of function ‘yaml_parser_delete’
    build/temp.linux-i686-2.6/check_libyaml.c:11: warning: implicit declaration of function ‘yaml_emitter_initialize’
    build/temp.linux-i686-2.6/check_libyaml.c:11: error: ‘emitter’ undeclared (first use in this function)
    build/temp.linux-i686-2.6/check_libyaml.c:12: warning: implicit declaration of function ‘yaml_emitter_delete’

    libyaml is not found or a compiler error: forcing --without-libyaml
    (if libyaml is installed correctly, you may need to specify the option --include-dirs or uncomment and modify the parameter include_dirs in setup.cfg)
    zip_safe flag not set; analyzing archive contents...
    Adding PyYAML 3.08 to easy-install.pth file

    Installed /usr/local/lib/python2.6/dist-packages/PyYAML-3.08-py2.6-linux-i686.egg
    Finished processing dependencies for nltk==2.0b5
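The compiler errors in the log above look alarming, but the "forcing --without-libyaml" line suggests PyYAML simply fell back to its pure-Python build, and the install still finished. If you want to confirm that, something like the following rough sketch could help. It assumes a modern Python 3 interpreter (this post used Python 2.6), and the helper names is_installed and yaml_backend are made up for illustration. The __with_libyaml__ attribute is what current PyYAML releases expose; the code falls back to False in case a release lacks it.

```python
import importlib
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def yaml_backend():
    """Report how PyYAML is available: 'missing', 'libyaml', or 'pure-python'."""
    if not is_installed("yaml"):
        return "missing"
    yaml = importlib.import_module("yaml")
    # Assumption: PyYAML sets __with_libyaml__ to True when its C extension
    # was built; default to False if the attribute is absent.
    if getattr(yaml, "__with_libyaml__", False):
        return "libyaml"
    return "pure-python"
```

After an install like the one above, is_installed("nltk") should be true and yaml_backend() would presumably report "pure-python", since yaml.h was not available at build time.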

Now you are done. Import NLTK and start downloading the NLTK data.

    $ python
    Python 2.6.2+ (release26-maint, Jun 19 2009, 15:14:35)
    [GCC 4.4.0] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import nltk
    >>> nltk.download()
    NLTK Downloader
    ---------------------------------------------------------------------------
        d) Download   l) List   c) Config   h) Help   q) Quit
    ---------------------------------------------------------------------------
    Downloader> l

    Packages:
    /usr/local/lib/python2.6/dist-packages/nltk-2.0b5-py2.6.egg/nltk/__init__.py:588: DeprecationWarning: object.__new__() takes no parameters
      [ ] maxent_ne_chunker... ACE Named Entity Chunker (Maximum entropy)
      [ ] abc................. Australian Broadcasting Commission 2006
      [ ] brown............... Brown Corpus
      [ ] alpino.............. Alpino Dutch Treebank
      [ ] cess_cat............ CESS-CAT Treebank
      [ ] brown_tei........... Brown Corpus (TEI XML Version)
      [ ] cmudict............. The Carnegie Mellon Pronouncing Dictionary (0.6)
      [ ] biocreative_ppi..... BioCreAtIvE (Critical Assessment of Information Extraction Systems in Biology)
      [ ] cess_esp............ CESS-ESP Treebank
      [ ] chat80.............. Chat-80 Data Files
      [ ] city_database....... City Database
      [ ] conll2002........... CONLL 2002 Named Entity Recognition Corpus
      [ ] conll2000........... CONLL 2000 Chunking Corpus
      [ ] conll2007........... Dependency Treebanks from CoNLL 2007 (Catalan and Basque Subset)
      [ ] dependency_treebank. Dependency Parsed Treebank
      [ ] floresta............ Portuguese Treebank
      [ ] genesis............. Genesis Corpus
      [ ] gazetteers.......... Gazeteer Lists
    Hit Enter to continue:
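If you already know which packages you want from that list, the interactive menu can be skipped by passing a package identifier straight to nltk.download(). Below is a minimal sketch of that idea. It assumes a current NLTK release where nltk.download() accepts a package id plus a quiet keyword and returns a truthy value on success; I have not checked this against 2.0b5. The fetch_packages helper and its download parameter are hypothetical names, added only so the logic can be tried without a network connection.

```python
def fetch_packages(package_ids, download=None):
    """Download each NLTK data package by id and report success per package.

    `download` is an optional stand-in for nltk.download (an assumption made
    for testability); when omitted, the real NLTK downloader is used.
    """
    if download is None:
        import nltk  # imported lazily so the helper loads without NLTK
        download = nltk.download

    results = {}
    for package_id in package_ids:
        # quiet=True suppresses the progress output of the downloader
        results[package_id] = bool(download(package_id, quiet=True))
    return results
```

For example, fetch_packages(["brown", "cmudict"]) would try to fetch the Brown Corpus and the CMU Pronouncing Dictionary shown in the listing above and return a dictionary mapping each id to True or False.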

1 comment:

Christan Grant said...

This may be obvious, but it is helpful to note that before installing NLTK you should also install PyYAML:

sudo apt-get install python-yaml

It will get rid of some of those ugly error messages during the install.