My first solution was to transform the existing "traditional" DCT hashes into sort-friendly hashes. Since I don't think it is very productive to first construct the DCT hashes the traditional way and then run a re-index, I present in this blog post a class that calculates the sort-friendly hash directly.
The class is as follows:
// This class constructs a perceptual hash for an image.
// It is based on the basic DCT hash, but by reorganizing the hash bits
// the hash becomes more or less sortable, effectively building a perceptual index.
// DCT code initial author: Elliot Shepherd
// Sort-friendly transformation author: The Nekkid PHP Programmer
// nekkidphpprogrammer<at>
class Phash_Bis {
private $size=32;
private $small_size=8;
private $c=array();
private $reorder=array();
private function InitCoefficients() {
for ($i=1; $i < $this->size ; $i++)
// Here we initialize the matrix that places the most significant frequencies at
// the beginning of the hash
for ($l=0;$l<$this->small_size;$l++) {
for ($u=0;$u<=$l;$u++) {
for ($v=0;$v<$l;$v++) {
private function blue($img,$x,$y) {
return imagecolorat($img,$x, $y) & 0xff;
public function __construct($my_size=32,$my_small_size=8) {
private function ApplyDCT($f) {
for ($u=0;$u<$n;$u++) {
for ($v=0;$v<$n;$v++) {
for ($i=0;$i<$n;$i++) {
for ($j=0;$j<$n;$j++) {
return $F;
public function hash($image) {
if (file_exists($image)) {
$img = imagecreatetruecolor($size, $size);
imagecopyresampled($img, $res, 0, 0, 0, 0, $size, $size, imagesx($res), imagesy($res));
imagecopymergegray($img, $res, 0, 0, 0, 0, $size, $size, 50);
for ($x=0;$x<$size;$x++) {
for ($y=0;$y<$size;$y++) {
for ($x=0;$x<$this->small_size;$x++) {
for ($y=0;$y<$this->small_size;$y++) {
$total += $dct_vals[$x][$y];
// Transformed hash generation
foreach ($this->reorder as $ptr) {
$hash = gmp_mul($hash, 2);
if ($dct_vals[$ptr[0]][$ptr[1]]>$avg)
$hash=gmp_add($hash, 1);
// Hash is returned as a hexadecimal string, my preference.
// The second argument of gmp_strval() is the number base, so it must be 16 here.
return substr("0000000000000000".gmp_strval($hash,16),-$hash_len);
return $hash;
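The bit reordering at the heart of the class can also be looked at in isolation. The following is only a minimal sketch, not the actual InitCoefficients code: it assumes a simple ordering by diagonal ($u + $v), so that the low-frequency coefficients end up in the most significant bits, and it packs the bits into a hex string four at a time, which happens to avoid GMP altogether. The function names lowFrequencyOrder and bitsToHex are hypothetical.

```php
<?php
// Hypothetical helper: list the [u, v] positions of an n x n DCT block,
// lowest frequencies first. Ordering by diagonal (u + v) is an assumption
// made for this sketch; the class above builds its own $reorder table.
function lowFrequencyOrder(int $n): array {
    $order = [];
    for ($d = 0; $d <= 2 * ($n - 1); $d++) {
        for ($u = 0; $u < $n; $u++) {
            $v = $d - $u;
            if ($v >= 0 && $v < $n) {
                $order[] = [$u, $v];
            }
        }
    }
    return $order;
}

// Hypothetical helper: compare each coefficient with the average and emit
// the resulting bits as hex, one digit per four bits (no GMP needed).
// Assumes the number of positions is divisible by 4 (true for 8 x 8);
// any leftover bits at the end would be dropped.
function bitsToHex(array $dctVals, array $order, float $avg): string {
    $hex = '';
    $nibble = 0;
    $count = 0;
    foreach ($order as [$u, $v]) {
        $nibble = ($nibble << 1) | ($dctVals[$u][$v] > $avg ? 1 : 0);
        if (++$count % 4 === 0) {
            $hex .= dechex($nibble);
            $nibble = 0;
        }
    }
    return $hex;
}
```

With an 8 x 8 block, bitsToHex($dct_vals, lowFrequencyOrder(8), $avg) would yield a 16-character hex string whose leading digits come from the lowest frequencies, which is what makes plain string sorting meaningful.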
I hope this helps you to find those pesky duplicates more effectively :)
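As a rough illustration of how the sorted hashes might be used, the sketch below sorts a list of hex hashes and compares each one only with its neighbour. It assumes the hashes are equal-length lowercase hex strings keyed by file name; the helper names hexHamming and neighbourCandidates and the distance threshold are hypothetical, and this is one possible approach rather than a complete duplicate finder.

```php
<?php
// Hamming distance between two equal-length hex strings, digit by digit.
function hexHamming(string $a, string $b): int {
    $dist = 0;
    for ($i = 0, $len = strlen($a); $i < $len; $i++) {
        $x = hexdec($a[$i]) ^ hexdec($b[$i]);
        // Count the set bits in the 4-bit XOR result.
        $dist += substr_count(decbin($x), '1');
    }
    return $dist;
}

// Hypothetical helper: sort the hashes and report neighbouring pairs whose
// distance is at most $maxDist. Only adjacent entries are compared, so this
// finds candidates cheaply but can miss pairs that differ in a leading bit.
function neighbourCandidates(array $hashes, int $maxDist): array {
    asort($hashes, SORT_STRING); // sorts by hash, keeps the file-name keys
    $names = array_keys($hashes);
    $pairs = [];
    for ($i = 1; $i < count($names); $i++) {
        $prev = $names[$i - 1];
        $curr = $names[$i];
        if (hexHamming($hashes[$prev], $hashes[$curr]) <= $maxDist) {
            $pairs[] = [$prev, $curr];
        }
    }
    return $pairs;
}
```

Because the most significant bits describe the coarsest image features, similar images tend to sit close together after sorting, so checking neighbours is a cheap first pass before any exhaustive comparison.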
If you encounter an error or any other problem, you may comment here and I'll see if I can be of help.
Very cool work - would really like to see some sample output!
What do you mean by seeing a sample output? The output is just 64-bit hexadecimal strings ;)
Is there a way to make this work without using GMP (gmp_mul, etc.)? I'm having trouble loading that library on my machine.