John Bokma's Hacking & Hiking

In-memory compression and decompression of JSON data

June 29, 2017

Today, for a project I am working on, I wanted to compress and decompress JSON data in memory. Yesterday evening I had already reread the documentation of IO::Compress::Gzip, a Perl module I had used before, as a refresher. Based on the examples given in the documentation, in-memory compression was quite easy to do.

The module I am working on caches JSON data returned by a web API. While the data itself is not much, just a few hundred kilobytes, it adds up when there are thousands and thousands of such relatively small files. Instead of compressing data to an already opened filehandle I decided to compress data to an in-memory buffer and write the buffer to a file instead. Below follows a simplified example program that shows how this can be done in your own code.

First, it shows how to write a Perl data structure to a file as uncompressed JSON data. Note that the data is encoded as octets and written as octets using the spew method. Next, the data is read back as octets and decoded from octets to a Perl data structure. The two smiley faces are displayed correctly.

#!/usr/bin/perl

use strict;
use warnings;

use JSON::XS;
use Path::Tiny;
use IO::Compress::Gzip     qw( gzip   $GzipError   );
use IO::Uncompress::Gunzip qw( gunzip $GunzipError );
use Data::Dumper;

binmode STDOUT, ":encoding(UTF-8)";

my $data_out = {
    unicode => "\x{263a}\x{263b}",                  # two smiley faces
    message => 'Hello, world! ' x 3,
};

# Uncompressed
{
    my $uncompressed = 'uncompressed.json';
    my $encoded = encode_json( $data_out );         # encode to octets
    path( $uncompressed )->spew( $encoded );        # write out octets

    my $buffer = path( $uncompressed )->slurp();    # read octets
    my $data_in = decode_json( $buffer );           # decode from octets

    print Dumper $data_in;
    print $data_in->{ unicode } . " <-- must show two smiley faces\n";
}

# Compressed
{
    my $compressed = 'compressed.json.gz';
    my $encoded = encode_json( $data_out );         # encode to octets
    my $buffer;
    gzip \$encoded, \$buffer, Level => 9            # gzip to buffer (octets)
        or die "Can't gzip data: $GzipError";
    path( $compressed )->spew( $buffer );           # buffer to file (octets)

    $buffer = path( $compressed )->slurp();         # read octets
    my $json;                                       # new name: $encoded already exists in this scope
    gunzip \$buffer, \$json                         # gunzip to JSON (octets)
        or die "Can't gunzip data: $GunzipError";
    my $data_in = decode_json( $json );             # decode from octets

    print Dumper $data_in;
    print $data_in->{ unicode } . " <-- must show two smiley faces\n";
}

The second part does the same as the first part, but now compresses the encoded data before it's written to a file and decompresses the data when it's read back.
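
The compress and decompress steps can also be wrapped in two small helper functions, so that the round trip can be checked entirely in memory, without touching a file. What follows is a minimal sketch under a few assumptions: the names json_to_gzip and gzip_to_json are made up for illustration, and JSON::PP (which ships with Perl and exports the same encode_json and decode_json functions) stands in for JSON::XS so that only core modules are needed.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use JSON::PP;    # assumption: core stand-in for JSON::XS, same two functions
use IO::Compress::Gzip     qw( gzip   $GzipError   );
use IO::Uncompress::Gunzip qw( gunzip $GunzipError );

# Hypothetical helper: Perl data structure -> gzipped JSON (octets)
sub json_to_gzip {
    my ( $data ) = @_;
    my $encoded = encode_json( $data );             # encode to octets
    my $buffer;
    gzip \$encoded, \$buffer, Level => 9            # gzip to buffer (octets)
        or die "Can't gzip data: $GzipError";
    return $buffer;
}

# Hypothetical helper: gzipped JSON (octets) -> Perl data structure
sub gzip_to_json {
    my ( $buffer ) = @_;
    my $encoded;
    gunzip \$buffer, \$encoded                      # gunzip to encoded (octets)
        or die "Can't gunzip data: $GunzipError";
    return decode_json( $encoded );                 # decode from octets
}

my $data_out = {
    unicode => "\x{263a}\x{263b}",                  # two smiley faces
    message => 'Hello, world! ' x 3,
};

my $buffer  = json_to_gzip( $data_out );            # never written to disk
my $data_in = gzip_to_json( $buffer );

print $data_in->{ unicode } eq $data_out->{ unicode }
    ? "round trip OK\n"
    : "round trip FAILED\n";
```

Keeping the buffer in a scalar like this makes it easy to hand the same octets to spew later, or to any other code that expects gzipped data.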

Additional verification can be done using the file command:

$ file uncompressed.json 
uncompressed.json: UTF-8 Unicode text, with no line terminators
$ gzip -dc compressed.json.gz | file -
/dev/stdin: UTF-8 Unicode text, with no line terminators

Note that gzip -dc decompresses the given file to stdout and that file - makes the file command read from stdin. The pipe glues those two commands together, showing the expected output.
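
Roughly the same two checks can be made from Perl itself. The sketch below is only an illustration: it builds its own small gzip buffer in memory instead of reading compressed.json.gz, and it uses JSON::PP from the Perl core in place of JSON::XS; both choices are assumptions made to keep it self-contained. It relies on two facts: a gzip stream starts with the bytes 0x1f 0x8b, and Encode::decode croaks on malformed UTF-8 when given FB_CROAK.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Encode ();
use JSON::PP;    # assumption: core stand-in for JSON::XS
use IO::Compress::Gzip     qw( gzip   $GzipError   );
use IO::Uncompress::Gunzip qw( gunzip $GunzipError );

my $encoded = encode_json( { unicode => "\x{263a}\x{263b}" } );
my $buffer;
gzip \$encoded, \$buffer
    or die "Can't gzip data: $GzipError";

# Check 1: a gzip stream starts with the magic bytes 0x1f 0x8b
my $is_gzip = substr( $buffer, 0, 2 ) eq "\x1f\x8b" ? 1 : 0;

# Check 2: the decompressed octets must decode cleanly as UTF-8
my $octets;
gunzip \$buffer, \$octets
    or die "Can't gunzip data: $GunzipError";
my $is_utf8 = eval {
    # LEAVE_SRC keeps $octets untouched; FB_CROAK dies on malformed input
    Encode::decode( 'UTF-8', $octets,
        Encode::FB_CROAK() | Encode::LEAVE_SRC() );
    1;
} ? 1 : 0;

print "gzip magic bytes: ", ( $is_gzip ? "yes" : "no" ), "\n";
print "valid UTF-8:      ", ( $is_utf8 ? "yes" : "no" ), "\n";
```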

Another option is to use jq, a program whose existence I learned about only yesterday evening:

jq .unicode uncompressed.json 
"☺☻"

The above command shows just the value of the unicode field.

gzip -dc compressed.json.gz | jq .unicode
"☺☻"

The above command decompresses the data and pipes it to jq, which once more just shows the value of the unicode field.
