John Bokma's Hacking & Hiking

Finding duplicate files with jdupes

May 21, 2019

For the past few years I have used fdupes by Adrian Lopez to find and delete duplicate files on my Linux machines, virtual or physical. It's a great program and I love the interactive delete mode. Then I learnt that Jody Bruchon had written a much faster version, called jdupes, in which the j most likely stands for Jody, certainly not Java.

Installing jdupes on a recent version of Ubuntu

If you have already installed fdupes, I recommend removing it from your Ubuntu system as follows:

sudo apt purge -y fdupes
sudo apt autoremove

Next, install jdupes as follows:

sudo apt install -y jdupes

At the time of writing, this installs jdupes version 1.11.1 (2018-11-09) on the latest version of Ubuntu, 19.04.

Installing jdupes from source on Ubuntu

If you have an older version of Ubuntu, or you just want to be sure you have the latest version of jdupes, you have to build the program from source.

The following instructions clone and build the latest version of jdupes. At the time of writing, this was version 1.12 (2019-02-18).

First, install make, gcc, and git as follows:

sudo apt install -y git make gcc

Next, clone the jdupes repository as follows:

git clone https://github.com/jbruchon/jdupes

Change into the jdupes directory and run make:

cd jdupes/
make

If all goes well, install jdupes as follows:

sudo make install
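
To check that the shell now picks up the freshly installed program, a small sketch like the following can help. It assumes that make install placed jdupes in a directory that is on your PATH, and it uses the -v option to print the version; jdupes_path is just a variable name picked for this example.

```shell
# Resolve jdupes on the PATH; jdupes_path is a hypothetical variable name.
jdupes_path=$(command -v jdupes || true)

if [ -n "$jdupes_path" ]; then
    echo "jdupes found at: $jdupes_path"
    jdupes -v    # print version and license information
else
    echo "jdupes not found on PATH"
fi
```

If the reported version is older than the one you just built, another jdupes, for example one installed via apt, is probably earlier on the PATH.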

Finally, move out of the jdupes directory and remove it and its contents to clean up:

cd ..
rm -rf jdupes/

Example jdupes usage

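Most of the time I use jdupes with the -r option, to recurse into subdirectories, combined with the -d option, to interactively select which files to keep and delete the rest, for example: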

jdupes -rd Public/GoodReader/ Downloads/web/

I like to read downloaded PDFs with the GoodReader PDF reader app on iOS. Now and then I download a file I already have, so I use jdupes to find the duplicates.

Example output:

Scanning: 1091 files, 107 items (in 2 specified)
[1] Downloads/web/Haskell_Programming_(EARLY_ACCESS)/haskell-programming-1.0RC4-screen.pdf
[2] Public/GoodReader/Computer/Haskell/Haskell Programming - From First Principles (1.0RC4, screen, 2019).pdf

Set 1 of 1: keep which files? (1 - 2, [a]ll, [n]one): 

As you can see, interactive mode makes it very easy to control what to do with the duplicate file(s).
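
If you want to see what jdupes considers duplicates before deleting anything, you can leave out the -d option. Below is a minimal sketch of such a dry run in a throwaway sandbox. The directory and file names are made up for illustration, and the check for jdupes is only there so the example doesn't fail on a machine without it.

```shell
# Sandbox with one duplicated file; all paths here are made up for illustration.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/reader" "$sandbox/downloads"
printf 'same bytes\n' > "$sandbox/reader/book.pdf"
printf 'same bytes\n' > "$sandbox/downloads/book-again.pdf"
printf 'other bytes\n' > "$sandbox/downloads/unrelated.pdf"

# Without -d, jdupes only lists sets of duplicates; it deletes nothing.
if command -v jdupes >/dev/null 2>&1; then
    report=$(jdupes -r "$sandbox/reader" "$sandbox/downloads")
    printf '%s\n' "$report"
else
    echo "jdupes is not installed" >&2
fi
```

I would expect book.pdf and book-again.pdf to be reported as one set, and unrelated.pdf not to show up at all. When you are done, remove the sandbox with rm -rf "$sandbox".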
