
Yesterday, a customer of mine emailed me that the Python program I had written for him didn't work. The last line of the traceback reported:
HTMLParser.HTMLParseError: malformed start tag, at line 322, column 19
I had read "Having problems with Beautiful Soup 3.1.0?" the day
before I got his email, and I knew that he was
running the Python program I wrote on a more
recent version of Ubuntu (9.04) than the one I
develop on (8.10). So there was no doubt that the
error was caused by a more recent version of
BeautifulSoup, one that uses HTMLParser instead of
SGMLParser. The reason for this
parser change is that SGMLParser is
no longer part of the standard library as of
Python 3.0.
Some time ago I had explained to him how to install BeautifulSoup on Ubuntu:
sudo apt-get install python-beautifulsoup
On Ubuntu 8.10 this installs version 3.0.7. On Ubuntu 9.04, however, it installs version 3.1.0.1, which might break more than the Python program I wrote for my customer...
Even though the last line of the traceback made clear that
HTMLParser was in use, which implies a BeautifulSoup version
more recent than 3.0.7a, I asked my customer to verify
the version of BeautifulSoup as follows:
enter python in a shell and, at the prompt, type the
following two lines:
import BeautifulSoup
print BeautifulSoup.__version__
On my system, running Ubuntu 8.10 with BeautifulSoup installed via
apt-get, this looked like:
Python 2.5.2 (r252:60911, Oct 5 2008, 19:29:17)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import BeautifulSoup
>>> print BeautifulSoup.__version__
3.0.7
>>>
Note: you can leave the Python interactive mode by pressing Ctrl+D.
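The same check can be scripted rather than read by eye. What follows is only a minimal sketch: the helper names version_key and uses_htmlparser are my own and not part of BeautifulSoup, and the 3.1 cutoff is an assumption taken from this post (3.0.7a being the last SGMLParser-based release).

```python
import re


def version_key(version):
    """Split a version string such as '3.0.7a' into ((3, 0, 7), 'a')."""
    match = re.match(r"^(\d+(?:\.\d+)*)([a-z]*)$", version.strip())
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    numbers = tuple(int(part) for part in match.group(1).split("."))
    return numbers, match.group(2)


def uses_htmlparser(version):
    """Return True for versions assumed to be built on HTMLParser.

    Assumption (from this post, not from the library itself):
    every release from 3.1 onwards uses HTMLParser.
    """
    numbers, _suffix = version_key(version)
    return numbers >= (3, 1)
```

In the program itself this could be called as uses_htmlparser(BeautifulSoup.__version__) right after the import, to print a clear warning instead of failing later with a parse error.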
Since my customer had installed BeautifulSoup
using apt-get, removing this module
was as simple as:
sudo apt-get remove python-beautifulsoup
If you installed BeautifulSoup using
easy_install, you can run
easy_install -m BeautifulSoup and then
delete the .egg files or directories, along with
any scripts. Yes, you're reading that correctly,
and you are probably as amazed as I am: in order to
uninstall, you have to manually
delete files and directories. See
"Uninstalling Packages" in the easy_install documentation.
After I had explained to my customer how to check the
version of BeautifulSoup and remove the installed
version, I explained in the same email how to
install 3.0.7a, the latest version that still uses
SGMLParser according to the "Having problems
with Beautiful Soup 3.1.0?" page, as follows:
sudo easy_install -U "BeautifulSoup==3.0.7a"
And after he had followed all those steps, the Python program I had written worked as expected.