In-memory compression and decompression of JSON data
June 29, 2017
Today, for a project I am working on, I wanted to compress and
decompress JSON data in memory. Yesterday evening I had already
reread the documentation of IO::Compress::Gzip, a Perl module I had
used in the past, as a refresher. Based on the examples given in the
documentation, in-memory compression was quite easy to do.
The module I am working on caches JSON data returned by a web API. While the data itself is not much, just a few hundred kilobytes, it adds up when there are thousands and thousands of such relatively small files. Instead of compressing data to an already opened filehandle I decided to compress data to an in-memory buffer and write the buffer to a file instead. Below follows a simplified example program that shows how this can be done in your own code.
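To get a feel for how much such repetitive JSON shrinks, here is a minimal sketch of in-memory compression on its own. The data structure is a hypothetical stand-in for a cached API response, not the actual cache format of the module:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use JSON::XS;
use IO::Compress::Gzip qw( gzip $GzipError );

# Hypothetical stand-in for a cached API response: highly repetitive JSON.
my $encoded = encode_json( { items => [ ( 'Hello, world!' ) x 200 ] } );

my $compressed;
gzip \$encoded, \$compressed, Level => 9    # compress octets to an in-memory buffer
    or die "Can't gzip data: $GzipError";

printf "raw: %d bytes, gzipped: %d bytes\n",
    length $encoded, length $compressed;
```

Repetitive data like this typically compresses to a small fraction of its original size, which is exactly the case for thousands of similar cached responses.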
First, it shows how to write a Perl data structure to a file as
uncompressed JSON data. Note that the data is encoded as octets and
written as octets using the spew method. Next, the data is read back
as octets and decoded from octets back into a Perl data structure.
The two smiley faces are displayed correctly.
#!/usr/bin/perl
use strict;
use warnings;
use JSON::XS;
use Path::Tiny;
use IO::Compress::Gzip qw( gzip $GzipError );
use IO::Uncompress::Gunzip qw( gunzip $GunzipError );
use Data::Dumper;
binmode STDOUT, ":encoding(UTF-8)";
my $data_out = {
    unicode => "\x{263a}\x{263b}",    # two smiley faces
    message => 'Hello, world! ' x 3,
};

# Uncompressed
{
    my $uncompressed = 'uncompressed.json';
    my $encoded = encode_json( $data_out );       # encode to octets
    path( $uncompressed )->spew( $encoded );      # write out octets

    my $buffer = path( $uncompressed )->slurp();  # read octets
    my $data_in = decode_json( $buffer );         # decode from octets
    print Dumper $data_in;
    print $data_in->{ unicode } . " <-- must show two smiley faces\n";
}

# Compressed
{
    my $compressed = 'compressed.json.gz';
    my $encoded = encode_json( $data_out );       # encode to octets
    my $buffer;
    gzip \$encoded, \$buffer, Level => 9          # gzip to buffer (octets)
        or die "Can't gzip data: $GzipError";
    path( $compressed )->spew( $buffer );         # buffer to file (octets)

    $buffer = path( $compressed )->slurp();       # read octets
    my $decoded;
    gunzip \$buffer, \$decoded                    # gunzip to octets
        or die "Can't gunzip data: $GunzipError";
    my $data_in = decode_json( $decoded );        # decode from octets
    print Dumper $data_in;
    print $data_in->{ unicode } . " <-- must show two smiley faces\n";
}
The second part does the same as the first, but compresses the encoded data before it is written to a file and decompresses the data when it is read back.
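As a side note, the gzip and gunzip functions also accept a filename as the output or input target, so the intermediate buffer plus the spew and slurp steps can be skipped when the data goes straight to disk anyway. A minimal sketch; the filename direct.json.gz is just for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use JSON::XS;
use IO::Compress::Gzip qw( gzip $GzipError );
use IO::Uncompress::Gunzip qw( gunzip $GunzipError );

my $encoded = encode_json( { message => 'Hello, world! ' x 3 } );

# Output target is a filename: gzip writes the compressed octets to disk directly.
gzip \$encoded => 'direct.json.gz', Level => 9
    or die "Can't gzip data: $GzipError";

# Input source is a filename: gunzip reads and decompresses in one step.
my $roundtrip;
gunzip 'direct.json.gz' => \$roundtrip
    or die "Can't gunzip data: $GunzipError";

print decode_json( $roundtrip )->{ message }, "\n";
```

The in-memory buffer from the main example is still the better fit when you want to do something with the compressed octets before they hit the disk.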
Additional verification can be done using the file command:
$ file uncompressed.json
uncompressed.json: UTF-8 Unicode text, with no line terminators
$ gzip -dc compressed.json.gz | file -
/dev/stdin: UTF-8 Unicode text, with no line terminators
Note that gzip -dc decompresses the given file to stdout and that
file - makes the file command read from stdin. The pipe glues those
two commands together, showing the expected output.
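Another quick check is gzip -l, which lists the compressed size, the uncompressed size, and the compression ratio of an archive. A small self-contained example; demo.json is a throwaway file created just for this demonstration:

```shell
# Create a small, repetitive JSON file to compress.
printf '{"message":"Hello, world! Hello, world! Hello, world!"}' > demo.json

# -c writes the compressed data to stdout, leaving demo.json in place.
gzip -9 -c demo.json > demo.json.gz

# List compressed size, uncompressed size, and ratio.
gzip -l demo.json.gz
```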
Another option is to use jq, a program I first learned about yesterday evening:
jq .unicode uncompressed.json
"☺☻"
The above command shows just the value of the unicode field.
gzip -dc compressed.json.gz | jq .unicode
"☺☻"
The above command decompresses the data and pipes it to jq, which
once more shows just the value of the unicode field.
Related
- IO::Compress::Gzip documentation
- IO::Uncompress::Gunzip documentation
- jq - jq is like sed for JSON data.