John Bokma MexIT
freelance Perl programmer

Mobile Wikipedia

Saturday, November 25, 2006

Yesterday I read an article on Torrentfreak about a Wikipedia compilation CD that had been made available for download as a torrent. I was hoping to use this compilation on my PDA, a Dell Axim X51v, so I started my favorite BitTorrent client, µTorrent (mu-torrent or micro torrent), and downloaded the zipped "Wikipedia CD". According to the Torrentfreak article, the compilation of over 2500 hand-picked and edited educational Wikipedia articles was created by volunteers of SOS Children and is aimed at helping schools enhance their curriculum and helping children learn.

After µTorrent had finished downloading wpcd.zip, I unpacked it with the unzipping capabilities built into Windows XP: I opened the zip file in Explorer, selected all files and folders inside it, and dragged them into the folder containing wpcd.zip. That was quite a mistake. Unpacking everything this way took ages (about 10 minutes, I guess), and the hard disk made so much noise that it was clear the operation was not running very efficiently.

When the unzipping was finally done I connected the Dell Axim X51v to my computer and used Microsoft ActiveSync to copy the wpcd folder (about 240 MB unpacked) to the 1 GB Panasonic SD memory card I had added to the PDA some time ago. Partway through the copy an error dialog popped up with the title "Error Copying File". The reason given was: "Cannot copy Thumbs: Access is denied.", followed by: "Make sure your mobile device has sufficient memory, that the file is not set as read-only, and that you have permission to copy this file type.". I was sure that this issue was caused by a Thumbs.db file, a file Windows XP creates to cache thumbnail versions of the images in a folder so that the folder can be drawn faster when its contents are viewed as thumbnails. I checked the amount of free space on the memory card anyway, and there was more than enough available. Since it was already very late, I decided to look further into it the next day.

Error copying file - Cannot copy Thumbs: Access is denied.

So today, to get a fresh start, I deleted the wpcd folder from my computer and the partially copied wpcd folder from my Dell Axim. Next I unpacked wpcd.zip again, this time using the excellent free 7-Zip file archive utility. This time the unpacking took only one minute and 31 seconds, and the hard disk stayed quiet while 7-Zip did its work.

Then I wrote a small Perl program. My idea was to locate all hidden Thumbs.db files and generate a batch (bat) file to delete all those thumbnail files and put this bat file on this site for download. The Perl program uses the File::Find Perl module. The find function calls the thumbs sub for each object found. The thumbs sub checks if the name of this object, made lower case since Windows XP uses a case-insensitive file system, equals thumbs.db. If not, then there is nothing to do and the sub returns. Otherwise it copies the name and the path of the object (including wpcd) into the variable $filename and replaces all forward slashes with backward slashes and prints the result with the del command in front on a line of its own.

use strict;
use warnings;

use File::Find;

find( \&thumbs, 'wpcd' );
exit;


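sub thumbs {

    # Skip everything that is not named Thumbs.db (compared in lower case)
    lc $_ eq 'thumbs.db' or return;

    # Copy the full path and turn forward slashes into backslashes
    ( my $filename = $File::Find::name ) =~ s{/}{\\}g;
    print "del $filename\n";
}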

I ran the above Perl program and redirected the output to delthumbs.bat, resulting in 59 lines, each containing a delete instruction. However, when I executed the batch file I got 59 errors, each stating that the Thumbs.db file I wanted to delete couldn't be found, most likely because del skips hidden files unless told otherwise. Guessing that an extra option might help, I checked the command line options of the del command:

E:\usr\john\Downloads\finished-torrents>del /?
Deletes one or more files.

DEL [/P] [/F] [/S] [/Q] [/A[[:]attributes]] names
ERASE [/P] [/F] [/S] [/Q] [/A[[:]attributes]] names

  names         Specifies a list of one or more files or directories.
                Wildcards may be used to delete multiple files. If a
                directory is specified, all files within the directory
                will be deleted.

  /P            Prompts for confirmation before deleting each file.
  /F            Force deleting of read-only files.
  /S            Delete specified files from all subdirectories.
  /Q            Quiet mode, do not ask if ok to delete on global wildcard
  /A            Selects files to delete based on attributes
  attributes    R  Read-only files            S  System files
                H  Hidden files               A  Files ready for archiving
                -  Prefix meaning not

If Command Extensions are enabled DEL and ERASE change as follows:

The display semantics of the /S switch are reversed in that it shows
you only the files that are deleted, not the ones it could not find.

After careful reading and some testing, I came up with a one-liner that can be executed from the command line in the wpcd folder and does the job without using Perl: del /S /F /A R Thumbs.db

So I changed the working folder at the command line to wpcd using cd wpcd and then executed the above command, and was greeted with "Deleted file ..." for each Thumbs.db file, 59 lines in total.
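For comparison, the same cleanup could probably be done from Perl alone, without generating a batch file. The following is only a sketch, not something I ran as part of this exercise: the sub names find_thumbs and delete_thumbs are made up for illustration, and it assumes that Perl's unlink is able to remove hidden files on Windows XP once a possible read-only flag has been cleared with chmod.

```perl
use strict;
use warnings;

use File::Find;

# Return the path of every file named Thumbs.db (in any case) below $root.
sub find_thumbs {
    my ($root) = @_;

    my @found;
    find(
        sub {
            # $_ is the bare file name; File::Find has already changed
            # into the directory that contains it.
            push @found, $File::Find::name
                if lc($_) eq 'thumbs.db' && -f $_;
        },
        $root
    );
    return @found;
}

# Delete every Thumbs.db below $root and return how many were removed.
sub delete_thumbs {
    my ($root) = @_;

    my $deleted = 0;
    for my $path ( find_thumbs($root) ) {

        # Assumption: clearing the read-only flag first is harmless and
        # plays the role of the /F switch of del.
        chmod 0666, $path;

        if ( unlink $path ) {
            $deleted++;
        }
        else {
            warn "Could not delete $path: $!\n";
        }
    }
    return $deleted;
}

# Use the folder given on the command line, or wpcd by default.
my $root = @ARGV ? $ARGV[0] : 'wpcd';
if ( -d $root ) {
    printf "Deleted %d Thumbs.db file(s) below %s\n",
        delete_thumbs($root), $root;
}
```

Run from the folder that contains wpcd, this should report how many files were removed, which ought to match the number of "Deleted file ..." lines printed by del.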

Mobile Wikipedia screenshots.

After this operation I again used Microsoft ActiveSync to copy the wpcd folder to the Panasonic 1 GB SD card inside my Dell Axim. This time not a single error showed up, and when the copying was finally done after 27 minutes I had my Mobile Wikipedia.

Edit: in the evening I discovered that Microsoft Internet Explorer Mobile has problems with filenames that have "URL encoded" characters like (, ), and '.
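To get an idea of how many pages are affected, a small Perl program along the following lines could list the candidates. It is a sketch under an assumption I have not verified against the CD: that the problematic file names contain literal percent escapes such as %28, %29 and %27 (the URL encoded forms of (, ) and '). The sub names, and the file name Rock_%28music%29.htm used in the comments, are made up for illustration.

```perl
use strict;
use warnings;

use File::Find;

# True when a file name contains a percent escape such as %28.
sub has_percent_escape {
    my ($name) = @_;
    return $name =~ /%[0-9A-Fa-f]{2}/ ? 1 : 0;
}

# Decode the percent escapes in a name, so that the hypothetical
# Rock_%28music%29.htm becomes Rock_(music).htm.
sub decode_name {
    my ($name) = @_;
    $name =~ s/%([0-9A-Fa-f]{2})/chr( hex($1) )/ge;
    return $name;
}

# Return the path of every file below $root whose name has an escape.
sub find_encoded {
    my ($root) = @_;

    my @found;
    find(
        sub {
            push @found, $File::Find::name
                if has_percent_escape($_) && -f $_;
        },
        $root
    );
    return @found;
}

# Use the folder given on the command line, or wpcd by default.
my $root = @ARGV ? $ARGV[0] : 'wpcd';
if ( -d $root ) {
    for my $path ( find_encoded($root) ) {
        print "$path  (decoded: ", decode_name($path), ")\n";
    }
}
```

The program only reports the names. Renaming the files would also require updating every link that points to them, so that step is deliberately left out.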
