Yesterday, a customer of mine emailed me that the Python program I had written for him didn't work. The last line of the traceback reported:
HTMLParser.HTMLParseError: malformed start tag, at line 322, column 19
Since I had read "Having problems with Beautiful Soup 3.1.0?" the day before I got his email, and knew that he was running the program on a more recent version of Ubuntu (9.04) than the one I develop on (8.10), there was no doubt that this was caused by a more recent version of BeautifulSoup, which uses HTMLParser instead of SGMLParser. The reason for this parser change is that SGMLParser is no longer part of the standard library as of Python 3.0.
Some time ago I had explained to him how to install BeautifulSoup on Ubuntu:
sudo apt-get install python-beautifulsoup
On Ubuntu 8.10 this installs version 3.0.7. On Ubuntu 9.04, however, it installs version 3.1.0.1, which might break more than the Python program I wrote for my customer...
Even though the last line of the traceback made clear that HTMLParser was used, implying a BeautifulSoup release more recent than 3.0.7a, I asked my customer to verify the version of BeautifulSoup as follows:
Enter python in a shell and, at the prompt, type the following two lines:
import BeautifulSoup
print BeautifulSoup.__version__
On my system, running Ubuntu 8.10 with BeautifulSoup installed via apt-get, this looked like:
Python 2.5.2 (r252:60911, Oct 5 2008, 19:29:17)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import BeautifulSoup
>>> print BeautifulSoup.__version__
3.0.7
>>>
Note: you can leave the Python interactive mode by pressing Ctrl+D.
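The same check can also be done without an interactive session, which is easier to ask of a customer. The following is only a minimal sketch: the function name installed_version is made up for this example, and the code is written so that it should run on both Python 2 and Python 3.

```python
def installed_version(module_name="BeautifulSoup"):
    """Return the module's __version__ string, or None if unavailable."""
    try:
        # __import__ is used because it also exists on old Python 2 releases.
        module = __import__(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("BeautifulSoup is not installed (or reports no version)")
    else:
        print("BeautifulSoup version: %s" % version)
```

Saved to a file and run with python, this prints either the version string or a note that the module could not be imported.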
Since my customer had installed BeautifulSoup using apt-get, removing this module was as simple as:
sudo apt-get remove python-beautifulsoup
If you installed BeautifulSoup using easy_install, you can run easy_install -m BeautifulSoup and then delete the .egg files or directories, along with any scripts. Yes, you're reading that correctly, and you are probably as amazed as I am: in order to uninstall, you have to delete files and directories manually; see Uninstalling Packages in the easy_install documentation.
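To find what has to be deleted by hand, a short script can list the candidate .egg entries first. This is a sketch under assumptions: find_eggs is a hypothetical helper, the eggs are assumed to follow easy_install's usual BeautifulSoup-<version>-py<X.Y>.egg naming, and you have to pass in your own site-packages directory. It only lists entries and deletes nothing.

```python
import glob
import os


def find_eggs(site_packages, project="BeautifulSoup"):
    """List .egg files or directories for `project` inside `site_packages`."""
    # Matches names such as BeautifulSoup-3.1.0.1-py2.6.egg
    pattern = os.path.join(site_packages, project + "-*.egg")
    return sorted(glob.glob(pattern))
```

Calling find_eggs with the path of your site-packages directory returns the matching paths, so you can review them before removing anything.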
After explaining to my customer how to check the version of BeautifulSoup and remove the installed version, I explained in the same email how to install 3.0.7a, the latest version that still uses SGMLParser according to the problems with Beautiful Soup page, as follows:
sudo easy_install -U "BeautifulSoup==3.0.7a"
And after he had followed all those steps, the Python program I had written worked as expected.
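To avoid a repeat of the cryptic traceback, the program could check the version at startup and fail with a clearer message. What follows is a sketch, not what my program actually does: the names UnsupportedBeautifulSoup and require_sgmlparser_based are invented for this example, and it assumes that exactly the 3.1.x releases are the HTMLParser-based ones. Stand-in module objects are used so the example runs without BeautifulSoup installed.

```python
import types


class UnsupportedBeautifulSoup(RuntimeError):
    """Raised when the given BeautifulSoup is from the 3.1.x series."""


def require_sgmlparser_based(module):
    """Return the module's version, or raise if it is a 3.1.x release.

    `module` is anything with a __version__ attribute, normally the
    imported BeautifulSoup module.
    """
    version = getattr(module, "__version__", "unknown")
    # Assumption: only the 3.1.x series is built on HTMLParser.
    if version.split(".")[:2] == ["3", "1"]:
        raise UnsupportedBeautifulSoup(
            "BeautifulSoup %s uses HTMLParser; install 3.0.7a instead" % version
        )
    return version


# Stand-ins for the real module, carrying only a version string.
old = types.ModuleType("BeautifulSoup")
old.__version__ = "3.0.7a"
new = types.ModuleType("BeautifulSoup")
new.__version__ = "3.1.0.1"

print(require_sgmlparser_based(old))
try:
    require_sgmlparser_based(new)
except UnsupportedBeautifulSoup as exc:
    print(exc)
```

In a real program the call would be require_sgmlparser_based(BeautifulSoup), placed right after import BeautifulSoup.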