>> |
Anonymous
>>560020 If you have specific feature/change requests (rather than "it sucks donkey dicks", which doesn't really tell me what issues you have with it), make them and I'll consider implementing them.
Please leave any such advice on the 4scrape comments page so we can stop cluttering up the OP's thread with my stupid shit :3
>>560022 MD5 is actually a terrible way to compare two images, simply because extremely minor changes (re-saving a JPEG, changing some EXIF data, different resolutions/aspect ratios of the same image, etc.) result in a completely different hash value. What most image organizers do (pImgDb included, I think) is decompose the image such that minor changes (and even things like detexts) are much less noticeable, so two versions of the same image get flagged as similar.
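To make the MD5 point concrete, here's a tiny sketch. The byte strings are made-up stand-ins for image data (not real files), and `md5_hex` is just a helper name for this example. Bumping one byte by 1 is enough to give an unrelated digest:

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of some bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


# Stand-in for raw image data: 256 bytes of a repeating pattern (hypothetical).
original = bytes([10, 20, 30, 40] * 64)

# The "same image" with a single byte changed by 1, e.g. after a re-save.
tweaked_buffer = bytearray(original)
tweaked_buffer[0] += 1
tweaked = bytes(tweaked_buffer)

print(md5_hex(original))
print(md5_hex(tweaked))
print("hashes match:", md5_hex(original) == md5_hex(tweaked))
```

So an exact hash can only tell you that two files are byte-for-byte identical, never that they're "nearly the same".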
The algorithm 4scrape uses is to divide the image into 4 quadrants, then find the average pixel value of each quadrant (which produces a vector of 4 averaged pixels * 3 color channels = 12 numbers). To compare two images for similarity, each pair of corresponding dimensions is compared; if the difference in any single dimension exceeds a threshold, it's considered not a match; if the sum of the absolute values of all the differences exceeds a second, separate threshold, the image is again discarded.
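A rough sketch of the quadrant scheme described above, in plain Python. This is not 4scrape's actual code: the function names, the image representation (a list of rows of (r, g, b) tuples, at least 2x2) and both threshold values are assumptions picked for illustration.

```python
def quadrant_signature(pixels):
    """Average each quadrant of an image into one RGB value.

    pixels: list of rows, each row a list of (r, g, b) tuples.
    Returns 12 floats: 4 quadrants * 3 color channels.
    """
    height, width = len(pixels), len(pixels[0])
    mid_y, mid_x = height // 2, width // 2
    signature = []
    for y0, y1 in ((0, mid_y), (mid_y, height)):
        for x0, x1 in ((0, mid_x), (mid_x, width)):
            count = (y1 - y0) * (x1 - x0)
            for channel in range(3):
                total = sum(pixels[y][x][channel]
                            for y in range(y0, y1)
                            for x in range(x0, x1))
                signature.append(total / count)
    return signature


def is_similar(sig_a, sig_b, per_dim_limit=20.0, total_limit=100.0):
    """Compare two signatures; both limits are made-up example values."""
    diffs = [abs(a - b) for a, b in zip(sig_a, sig_b)]
    if max(diffs) > per_dim_limit:  # one dimension is too far off
        return False
    return sum(diffs) <= total_limit  # overall difference must also be small


# Tiny 4x4 test images (hypothetical data).
gray = [[(100, 100, 100)] * 4 for _ in range(4)]
gray_tweaked = [row[:] for row in gray]
gray_tweaked[0][0] = (110, 100, 100)  # one slightly changed pixel
red = [[(200, 0, 0)] * 4 for _ in range(4)]

print(is_similar(quadrant_signature(gray), quadrant_signature(gray_tweaked)))
print(is_similar(quadrant_signature(gray), quadrant_signature(red)))
```

The signature only has to be computed once per image, so it can be stored; after that, comparing two images is just arithmetic on 12 numbers.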
There's all kinds of academic papers on image comparison (it's basically machine vision, easy mode) and there's all kinds of crazy ways to implement it. I chose a method which was shit, but easy to implement and relatively cheap for comparing large numbers of images (because the expensive operation, decomposing large images, can be cached).
But yeah. You've got a really nice set of data, OP, and there's all kinds of cool things you can do with it (with regards to image analysis). Even if you're not a technical person, you can use it to do artsy shit like image mosaics, or raid a computer lab and set each monitor with a random wallpaper or something. Use your imagination :3
|