John Bokma Perl

Google Suggest program


Google Suggest (beta) is a new toy from the famous Google Labs. As soon as you start typing something that Google Suggest recognizes into the search field, a drop-down menu appears below the field with several suggestions, each followed by the number of results available. What you type is also completed with what Google considers the best suggestion. The part added by Google is selected, so as you keep typing it is replaced by what you type, and a new suggestion (if available) is given. You can also navigate the drop-down list with the arrow keys to select one of the suggestions.

Below is a small Perl program that fetches the suggestion list for a given query. The result returned by Google is a small piece of JavaScript code. A single regular expression match is used to extract the actual information, which is then reported. The first half of the report shows the suggestions in the order given by Google Suggest; the second half shows them sorted by the number of search results available.
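To illustrate the extraction step on its own, here is a minimal sketch that applies the same regular expression as gsuggest.pl to a made-up response. The function name callback and the sample data are assumptions chosen to match the shape the pattern expects; the JavaScript that Google actually returns may look different.

```perl
use strict;
use warnings;

# Made-up response text; the function name "callback" and the data are
# assumptions, chosen only to match what the regular expression expects.
my $content = 'callback("script", new Array("scripts", "script"), '
            . 'new Array("24,900,000 results", "59,800,000 results"));';

# Same pattern as in gsuggest.pl: the query, then the inside of the
# first two Array("...") lists, without the outermost double quotes.
my ( $element, $array1, $array2 ) = $content =~
    /"(.+?)".+?Array\("(.+?)"\).+?Array\("(.+?)"\)/;

print "query:       $element\n";
print "suggestions: $array1\n";
print "results:     $array2\n";
```

Because the outermost quotes of each array are not captured, the item separator ", " (quote, comma, space, quote) is still present inside $array1 and $array2, which is why the program splits on exactly that string.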

Since each number of results has the string " results" (or " result") appended, this string is removed. When the number of results is not known, Google returns just &nbsp;, which the Perl program replaces with an empty string. Note that this means trailing empty fields can occur, so a negative limit must be passed to the split function; without it, Perl drops the trailing empty fields and the last suggestions end up with undefined results.
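The effect of the negative limit can be seen in a small sketch. The input string is a hypothetical second array, shown as it would look after " results" and &nbsp; have been removed, where the last two counts are unknown.

```perl
use strict;
use warnings;

# Hypothetical second "array" after cleanup: one known count followed
# by two unknown (now empty) counts.
my $array2 = '24,900,000", "", "';

my @default = split /", "/, $array2;      # trailing empty fields are dropped
my @limited = split /", "/, $array2, -1;  # trailing empty fields are kept

printf "default limit:  %d field(s)\n", scalar @default;
printf "negative limit: %d field(s)\n", scalar @limited;
```

With the default limit only one field is returned, so the second and third suggestion would have no matching result; with the negative limit all three fields are returned.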

# gsuggest.pl - Google suggest
#
# © Copyright, 2004-2005 By John Bokma, http://johnbokma.com/
#
# $Id$ 

use strict;
use warnings;

use URI::Escape;
use LWP::UserAgent;


unless ( @ARGV ) {

    print "usage: gsuggest.pl query\n";
    exit( 1 );
}

my $url = 'http://www.google.com/complete/search?js=true&qu=' .
    uri_escape( join ' ' => @ARGV );

my $ua = LWP::UserAgent->new( agent => 'Mozilla/5.0' );
my $response = $ua->get( $url );

$response->is_success or
    die "$url: ", $response->status_line;

my $content = $response->content;

# extract the information from the JavaScript file response
# note that the first " and the last " in the Arrays are
# excluded (if data is present)
my ( $element, $array1, $array2 ) = $content =~
    /"(.+?)".+?Array\("(.+?)"\).+?Array\("(.+?)"\)/;

unless ( defined $array1 ) {

    print "No results\n";
    exit;
}

# split the first "array" on the item separator (the very first "
# and the very last " are already removed)
my @suggestions = split /", "/, $array1;

# remove the result(s) string from the number of results
# remove &nbsp; if present (no results known)
# and split the second "array"
# note that a negative limit is used to catch trailing empty
# results (caused by removing &nbsp;)
$array2 =~ s/ results?//g;
$array2 =~ s/&nbsp;//g;
my @results = split /", "/, $array2, -1;

# make suggestion => result(s) pairs.
# note that the number of results is turned into a right justified string
my @pairs = map { [ sprintf ( "%12s", shift @results ) => $_ ] } @suggestions;

# print the pairs in suggested order
print "@$_\n" for @pairs;

# print the pairs sorted on the number of results for each suggestion,
# largest "number" first since the numbers are right justified strings.
print "\nsorted:\n";
print "@$_\n" for sort { $b->[0] cmp $a->[0] } @pairs;

Example of program usage and output:

gsuggest.pl script
  24,900,000 scripts
  59,800,000 script
   2,880,000 script fonts
   8,110,000 script writing
     545,000 scripture union
     103,000 scripting.filesystemobject
      58,200 script o rama
   6,320,000 scripture
  11,700,000 scripting
   6,300,000 script font

sorted:
  59,800,000 script
  24,900,000 scripts
  11,700,000 scripting
   8,110,000 script writing
   6,320,000 scripture
   6,300,000 script font
   2,880,000 script fonts
     545,000 scripture union
     103,000 scripting.filesystemobject
      58,200 script o rama
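The sorted half of the report compares the right-justified strings, which works as long as the counts line up in the fixed-width field. As an alternative, the counts could be compared as numbers. The following is only a sketch: the helper count_value is hypothetical, and treating an unknown (blank) count as 0 is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a formatted count such as "  24,900,000"
# into a plain number. Blank (unknown) counts become 0 by assumption.
sub count_value {
    my ( $formatted ) = @_;
    ( my $digits = $formatted ) =~ tr/0-9//cd;   # delete all non-digits
    return length $digits ? $digits + 0 : 0;
}

# Sample pairs in the same [ count, suggestion ] layout as @pairs.
my @pairs = (
    [ '  24,900,000', 'scripts' ],
    [ '  59,800,000', 'script' ],
    [ '     545,000', 'scripture union' ],
    [ '            ', 'unknown count' ],
);

# Largest number first, compared numerically instead of as strings.
my @sorted = sort { count_value( $b->[0] ) <=> count_value( $a->[0] ) } @pairs;
print "@$_\n" for @sorted;
```

The formatted strings are kept for printing; only the comparison uses the numeric value.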

