John Bokma MexIT
freelance Perl programmer

Python: downgrading BeautifulSoup

Saturday, September 26, 2009

Yesterday, a customer of mine emailed me that the Python program I had written for him didn't work. The last line of the traceback reported:

HTMLParser.HTMLParseError: malformed start tag, at line 322, column 19

I had read "Having problems with Beautiful Soup 3.1.0?" the day before his email arrived, and I knew that he was running my program on a more recent version of Ubuntu (9.04) than the one I develop on (8.10). So there was no doubt that the error was caused by a more recent version of BeautifulSoup, which uses HTMLParser instead of SGMLParser. The reason for this parser change is that SGMLParser (the sgmllib module) is no longer part of the standard library as of Python 3.0.

Some time ago I had explained to him how to install BeautifulSoup on Ubuntu:

sudo apt-get install python-beautifulsoup

On Ubuntu 8.10 this installs version 3.0.7. On Ubuntu 9.04, however, it installs version 3.1.0.1, which might break more than just the Python program I wrote for my customer...
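Because the same package name can give you either parser depending on the Ubuntu release, a program that depends on the SGMLParser-based releases could check the version up front instead of failing halfway through a page. The following is a minimal sketch, not part of my customer's program: the helper names `version_tuple` and `uses_sgmlparser` are hypothetical, and the cutoff at 3.1 is an assumption based on the versions mentioned here (3.0.7 and 3.0.7a use SGMLParser, 3.1.0.1 uses HTMLParser).

```python
import re


def version_tuple(version):
    """Turn a version string such as "3.0.7a" or "3.1.0.1" into a tuple of
    integers, keeping only the leading digits of each dotted part:
    "3.0.7a" -> (3, 0, 7)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first part without leading digits
        parts.append(int(match.group(0)))
    if not parts:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(parts)


def uses_sgmlparser(version):
    """Assumed rule: releases before 3.1 still parse with SGMLParser."""
    return version_tuple(version) < (3, 1)
```

In a program, a check such as `uses_sgmlparser(BeautifulSoup.__version__)` right after `import BeautifulSoup` could exit with a clear message when it returns False, rather than letting the program stop later with an HTMLParseError.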

How to verify the version of BeautifulSoup installed

Even though the last line of the traceback made it clear that HTMLParser was being used, which implied a BeautifulSoup release more recent than 3.0.7a, I asked my customer to verify the installed version of BeautifulSoup as follows:

Start python in a shell and, at the >>> prompt, type the following two lines:

import BeautifulSoup
print BeautifulSoup.__version__

On my system, running Ubuntu 8.10 with BeautifulSoup installed via apt-get, this looked like:

Python 2.5.2 (r252:60911, Oct  5 2008, 19:29:17)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import BeautifulSoup
>>> print BeautifulSoup.__version__
3.0.7
>>> 

Note: you can leave the Python interactive mode by pressing Ctrl+D.
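If typing into the interactive interpreter is awkward for a customer, the same check can be packed into a small script that he runs and whose output he pastes into a reply. This is a sketch under my own assumptions: the function name `installed_beautifulsoup_version` and a file name such as check_bs_version.py are hypothetical, and it only looks for the module under the name `BeautifulSoup` used above.

```python
def installed_beautifulsoup_version():
    """Return BeautifulSoup.__version__, or None if the BeautifulSoup
    module can't be imported (or has no __version__ attribute)."""
    try:
        import BeautifulSoup
    except ImportError:
        return None
    return getattr(BeautifulSoup, "__version__", None)


if __name__ == "__main__":
    version = installed_beautifulsoup_version()
    if version is None:
        print("BeautifulSoup is not installed (or not importable)")
    else:
        print("BeautifulSoup " + version)
```

Running it with `python check_bs_version.py` on my Ubuntu 8.10 system should print the same 3.0.7 as the interactive session.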

Removing BeautifulSoup

Since my customer had installed BeautifulSoup using apt-get, removing it was as simple as:

sudo apt-get remove python-beautifulsoup

If you installed BeautifulSoup using easy_install, you can run easy_install -m BeautifulSoup, so that Python no longer searches for the package, and then delete the .egg files or directories, along with any scripts. Yes, you're reading that correctly, and you are probably as amazed as I am: to uninstall, you have to delete files and directories manually; see Uninstalling Packages.
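Finding those .egg files or directories can be partly automated. The sketch below only lists candidates and deliberately deletes nothing. It assumes the eggs sit directly inside a directory on `sys.path` (typically site-packages) and are named like `BeautifulSoup-3.1.0.1-py2.6.egg`, that is, starting with "BeautifulSoup" and ending in ".egg"; the function name `find_beautifulsoup_eggs` is hypothetical.

```python
import os
import sys


def find_beautifulsoup_eggs(paths=None):
    """Return full paths of BeautifulSoup .egg files or directories found
    directly inside the given directories (default: sys.path)."""
    if paths is None:
        paths = sys.path
    found = []
    for directory in paths:
        # Skip the empty entry (current directory), zipped eggs on
        # sys.path and anything else that isn't a directory.
        if not directory or not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            if name.startswith("BeautifulSoup") and name.endswith(".egg"):
                found.append(os.path.join(directory, name))
    return found


if __name__ == "__main__":
    for egg in find_beautifulsoup_eggs():
        print(egg)
```

Review the printed paths before removing anything; listing rather than deleting keeps a typo in the name pattern from wiping out an unrelated package.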

Installing BeautifulSoup 3.0.7a using easy_install

In the same email in which I explained to my customer how to check the version of BeautifulSoup and remove the installed one, I explained how to install 3.0.7a, which according to the problems with Beautiful Soup page is the latest version that still uses SGMLParser:

sudo easy_install -U "BeautifulSoup==3.0.7a"

And after he had followed all those steps, the Python program I had written worked as expected.
