>> |
Tiver
I've been working on a program to do duplicate image checking, as I've been annoyed with the ones I've used so far. Most of the ones I try crap out on me.
The method I'm using is to resize each image to some WxH size, which the user can adjust but which has to be the same for every image in a set being compared. Then it equalizes the colors so it better matches images where one is brighter or darker than the other but they show the same thing.
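Here is a rough sketch of that fingerprinting step, not the actual program. The post doesn't say how the colors are equalized, so this assumes a simple per-channel contrast stretch as a stand-in. The names `downscale`, `stretch_contrast` and `fingerprint` are made up for illustration, and the image is taken as a plain grid of (r, g, b) tuples so no image library is needed.

```python
def downscale(pixels, out_w, out_h):
    """Box-average a row-major grid of (r, g, b) tuples down to out_w x out_h."""
    in_h, in_w = len(pixels), len(pixels[0])
    out = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max(y0 + 1, (oy + 1) * in_h // out_h)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max(x0 + 1, (ox + 1) * in_w // out_w)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(tuple(sum(p[c] for p in block) // len(block) for c in range(3)))
        out.append(row)
    return out


def stretch_contrast(grid):
    """Assumed equalization: map each channel's min to 0 and its max to 255."""
    flat = [p for row in grid for p in row]
    lo = [min(p[c] for p in flat) for c in range(3)]
    hi = [max(p[c] for p in flat) for c in range(3)]

    def scale(value, c):
        span = hi[c] - lo[c]
        return 0 if span == 0 else (value - lo[c]) * 255 // span

    return [[tuple(scale(p[c], c) for c in range(3)) for p in row] for row in grid]


def fingerprint(pixels, size=16):
    """Resize to size x size, equalize, and pack as raw RGB bytes."""
    small = stretch_contrast(downscale(pixels, size, size))
    return bytes(v for row in small for p in row for v in p)
```

With the default size of 16 the result is 16 x 16 x 3 = 768 bytes. Because of the stretch, a uniformly darkened copy of an image should end up with the same or a very close fingerprint.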
It builds a database of files and their resized images. I've been using 16x16 for my testing, which results in 768 bytes of data (16 x 16 pixels x 3 bytes of RGB) plus a bit more for the filename and last-modified date. The comparison works by calculating the average distance between corresponding pixels in the two 16x16 normalized images, basically treating each RGB pixel as a 3D point. Then it determines the % difference as the average distance over the max distance.
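The comparison described above could look roughly like this. It is a sketch under two assumptions: fingerprints are flat RGB byte strings of equal length, and "max distance" means the diagonal of the RGB cube (black to white, about 441.67). The function name `percent_difference` is just for illustration.

```python
import math

# Assumed "max distance": from (0, 0, 0) to (255, 255, 255) in RGB space.
MAX_DISTANCE = math.sqrt(3 * 255 ** 2)


def percent_difference(fp_a, fp_b):
    """Average per-pixel Euclidean RGB distance, as a percentage of MAX_DISTANCE."""
    if len(fp_a) != len(fp_b) or len(fp_a) == 0 or len(fp_a) % 3 != 0:
        raise ValueError("fingerprints must be non-empty RGB data of equal length")
    pixel_count = len(fp_a) // 3
    total = 0.0
    for i in range(0, len(fp_a), 3):
        dr = fp_a[i] - fp_b[i]
        dg = fp_a[i + 1] - fp_b[i + 1]
        db = fp_a[i + 2] - fp_b[i + 2]
        total += math.sqrt(dr * dr + dg * dg + db * db)
    return 100.0 * (total / pixel_count) / MAX_DISTANCE
```

Identical fingerprints give 0%, and an all-black image against an all-white one gives 100%. Two images would then count as duplicates when the result falls under some threshold you pick.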
In my tests this works rather well. I have been tweaking it to improve the results, and I've got the app to the point of building and displaying a list of duplicates. It's going to be another couple of days of programming to make the interface nicer for actually selecting duplicates to delete. I also plan on letting you specify that some are not duplicates.
All of this data will be stored in a database as it processes, so if it crashes or you kill the app, it can pick up where it left off. This also means a scan for duplicates only has to create the "fingerprint" for each image once, which is by far the most expensive operation.
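One way the "fingerprint once, resume later" idea could work is sketched below using SQLite from the Python standard library. The post doesn't name the database or its layout, so the table, its columns, and the helpers `open_store` and `cached_fingerprint` are all hypothetical. The `compute` argument stands for whatever function builds the fingerprint for a file.

```python
import sqlite3


def open_store(db_path=":memory:"):
    """Open (or create) the fingerprint store. The schema is an assumption."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fingerprints ("
        " path TEXT PRIMARY KEY, mtime REAL NOT NULL, data BLOB NOT NULL)"
    )
    return conn


def cached_fingerprint(conn, path, mtime, compute):
    """Return the stored fingerprint, recomputing only if the file is new or changed."""
    row = conn.execute(
        "SELECT mtime, data FROM fingerprints WHERE path = ?", (path,)
    ).fetchone()
    if row is not None and row[0] == mtime:
        return bytes(row[1])
    data = compute(path)
    conn.execute(
        "INSERT OR REPLACE INTO fingerprints (path, mtime, data) VALUES (?, ?, ?)",
        (path, mtime, data),
    )
    # Commit per file so a crash or kill loses at most the image in progress.
    conn.commit()
    return data
```

Committing after every file is slower than batching, but it keeps the resume behavior simple: on the next run, every file whose path and last-modified date are already stored is skipped.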
On my set of 20,000 images it took 5 hours to build the fingerprints, and 1 hour to build the list of duplicate groups (which was about 1000 files; I hadn't scanned in quite a while).
|