Glad to hear...

Story: How to Sort and Remove Duplicate Photos in Linux Total Replies: 2
rnturn

May 25, 2014
2:18 AM EDT
... I'm not the only person who has multiple copies of photos copied into directories like 20140523, 20140524, etc. because I failed to reformat the card after the last time I dragged all the photos onto my hard disk. I was unaware that a couple of these utilities for finding dupes existed. (Might have saved me from having written several similar utilities myself.)

Next problem: dealing with photos that are being offloaded from several cameras and have similar naming conventions. One of these days I'm going to be offloading photos from two cameras and find that there are duplicate filenames for different photos. Something based on exiftool to rename the photos as they're copied from the memory cards that uses the camera brand and model in the filename would be the ticket.
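A rough sketch of that idea, assuming exiftool is installed and that its -s3 option (print only the tag value) is used to read the Make and Model tags. The function names camera_filename and copy_from_card are hypothetical, and the script only handles *.jpg files whose paths contain no newlines.

```shell
#!/bin/sh
# camera_filename MAKE MODEL FILE (hypothetical helper):
# print "Make_Model_originalname" with unsafe characters replaced by "_".
camera_filename() {
    # Anything other than letters, digits, "." and "-" becomes "_";
    # -s squeezes runs of "_" into one.
    prefix=$(printf '%s_%s' "$1" "$2" | tr -cs 'A-Za-z0-9.-' '_')
    printf '%s_%s\n' "$prefix" "$(basename "$3")"
}

# copy_from_card SRC DEST (hypothetical helper):
# copy every JPEG under SRC into DEST using the camera-based name.
copy_from_card() {
    src=$1
    dest=$2
    find "$src" -type f -iname '*.jpg' | while IFS= read -r f; do
        make=$(exiftool -s3 -Make "$f")    # e.g. the camera brand
        model=$(exiftool -s3 -Model "$f")  # e.g. the camera model
        target="$dest/$(camera_filename "$make" "$model" "$f")"
        # Never overwrite a file that is already there.
        [ -e "$target" ] || cp -- "$f" "$target"
    done
}
```

For example, camera_filename 'Canon' 'EOS 600D' /media/card/DCIM/IMG_0001.JPG prints Canon_EOS_600D_IMG_0001.JPG. Some cameras repeat the brand inside the Model tag, so the prefix may need trimming to taste.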
gus3

May 25, 2014
4:10 PM EDT
md5sum is your friend:

$ find /path/to/photos/ -type f | wc -l

$ find /path/to/photos/ -type f -print0 | xargs -0 md5sum | awk '{print $1;}' | sort | uniq | wc -l

If the numbers from these two commands are different, you likely have dupes somewhere.
mbaehrlxer

May 26, 2014
11:00 AM EDT
with tens of thousands of photos knowing that there are dupes is not very helpful.
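One way to see which files are duplicates rather than just a count, sketched under the assumption that GNU coreutils is available (uniq -w and --all-repeated are GNU extensions). The function name list_dupes_by_md5 is made up for illustration.

```shell
#!/bin/sh
# list_dupes_by_md5 DIR (hypothetical name):
# print "checksum  path" lines for files whose MD5 occurs more than once,
# with a blank line between groups.
list_dupes_by_md5() {
    find "$1" -type f -exec md5sum {} + |
    sort |
    # Compare only the first 32 characters (the MD5 hex digest) and
    # keep every line that belongs to a repeated digest.
    uniq -w32 --all-repeated=separate
}
```

Calling list_dupes_by_md5 /path/to/photos/ prints nothing when every file is unique. It still reads every file in full, so it does not address the cost of md5sum.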

also, md5sum is expensive. you can compare file sizes first: only files of the same size can be duplicates, so only those need their contents compared. it takes a bit more scripting to work that out, but it may be worth the effort with lots of files. and then, if the sizes match, unless there are more than a few files of a particular size, using cmp may still be faster than md5sum: md5sum has to read the whole file, while cmp aborts as soon as a single byte differs. cmp handles only 2 files at once though, so to compare more than two files you need to read the same file multiple times. with more than, say, 4 files of the same size, i'd use md5sum too.
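A minimal sketch of the size-first idea, assuming GNU find (for -printf), GNU xargs (for -0 and -r) and GNU uniq, and assuming no file name contains a newline. It checksums only files that share their size with at least one other file; the cmp shortcut for pairs is left out to keep it short. The name list_dupes_size_first is hypothetical.

```shell
#!/bin/sh
# list_dupes_size_first DIR (hypothetical name):
# print groups of identical files, hashing only same-size candidates.
list_dupes_size_first() {
    # Step 1: emit "size<TAB>path" for every regular file.
    find "$1" -type f -printf '%s\t%p\n' |
    # Step 2: keep only the paths whose size occurs more than once.
    awk -F '\t' '
        {
            size[NR] = $1
            path[NR] = substr($0, length($1) + 2)  # everything after the first tab
            count[$1]++
        }
        END {
            for (i = 1; i <= NR; i++)
                if (count[size[i]] > 1) print path[i]
        }
    ' |
    # Step 3: checksum the remaining candidates (-r: run nothing if there are none).
    tr '\n' '\0' | xargs -0 -r md5sum |
    # Step 4: print the lines whose 32-character digest repeats, grouped.
    sort | uniq -w32 --all-repeated=separate
}
```

Files with a unique size are never opened, which is where the saving comes from when most photos differ in size.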

btw: that's a useless use of xargs with find. find can run the command itself:

find /path/to/photos/ -type f -exec md5sum \{\} \+

using \+ makes -exec work like xargs: it runs md5sum with as many arguments per invocation as the system allows.
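To see the batching for yourself, here is a small sketch that replaces md5sum with a shell one-liner printing its argument count. The helper names args_per_run and runs_with_semicolon are invented for this illustration.

```shell
#!/bin/sh
# args_per_run DIR (hypothetical helper):
# print how many file arguments each command started by "-exec ... {} +" receives.
args_per_run() {
    find "$1" -type f -exec sh -c 'echo "$#"' sh {} +
}

# runs_with_semicolon DIR (hypothetical helper):
# for contrast, "-exec ... {} \;" starts one command per file;
# this counts how many commands were run.
runs_with_semicolon() {
    find "$1" -type f -exec sh -c 'echo "$#"' sh {} \; | wc -l
}
```

On a directory with a handful of files, args_per_run prints a single number (all files handed to one command), while runs_with_semicolon reports one run per file.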

greetings, eMBee.
