在连续场景图像集中,删除相似场景,删除空白场景(如 黑场、白场),如下图:
<?php
$images = "./pic"; // 图片路径
$threshold_rgb = 80; // 0~443 色差相似阀值
$threshold_image = 0.50; // 0~1.00 图像相似度
$threshold_transform_rgb = 30; // 0~443 空白色差相似阀值
$threshold_transform_image = 0.80; // 0~1.00 空白图像相似度
$pres = array(); // 前置图像特征值
@$pdir = dir($images);
while (false !== ($file = $pdir->read())) {
        
    if ($file == '.' || $file == '..') {
        continue;
    }
        
    $filepath = $images.'/'.$file;
    if (is_dir($filepath)) {
        continue;
    }
    
    $files[] = $filepath;
}
natsort($files);
foreach ($files as $filepath) {
    
    list($width, $height) = getimagesize($filepath);
       
    $im = imagecreatefromjpeg($filepath);
    
    $hit = 0;
    $unhit = 1;
    
    $pixhit = 0;
    $pixunhit = 1;
    
    $step = intval($width / 40); // 降低取值空间,优化性能
    
    for ($i = 1; $i <= $width; $i = $i + $step) {
        for ($j = 1; $j <= $height; $j = $j + $step) {
            
            $color = imagecolorsforindex($im, imagecolorat($im, $i, $j));
            /** 图像特征提取,对比 */
            if (isset($pres[$i][$j])) {
                // 色差
                $comp = sqrt(pow(abs($pres[$i][$j][0] - $color['red']), 2)
                    + pow(abs($pres[$i][$j][1] - $color['green']), 2)
                    + pow(abs($pres[$i][$j][2] - $color['blue']), 2));
         
                if ($comp < $threshold_rgb) {
                    $hit++;
                } else {
                    $unhit++;
                }
            }
            
            /** 空白图像特征提取,对比 */
            if (rand(1, 5) == 1 && $i > $step && $j > $step) {
                // 色差
                $comp = sqrt(pow(abs($pres[$i-$step][$j-$step][0] - $color['red']), 2)
                    + pow(abs($pres[$i-$step][$j-$step][1] - $color['green']), 2)
                    + pow(abs($pres[$i-$step][$j-$step][2] - $color['blue']), 2));
                if ($comp < $threshold_transform_rgb) {
                    $pixhit++;
                } else {
                    $pixunhit++;
                }
            }
            
            $pres[$i][$j] = array($color['red'], $color['green'], $color['blue']);
        }
    }
    
    imagedestroy($im);
    if ($hit / ($hit + $unhit) > $threshold_image) {
        unlink($filepath); // 删除相邻相似图像
    } else if ($pixhit / ($pixhit + $pixunhit) > $threshold_transform_image) {
        unlink($filepath); // 删除空白图像
    }
    
    echo "\n$filepath, hit: $hit, unhit: $unhit, diff: ".$hit/($hit + $unhit);
}
@$pdir->close();
shell~$php images.php
./pic/1.jpg, hit: 0, unhit: 1, diff: 0
./pic/2.jpg, hit: 1200, unhit: 1, diff: 0.999167360533
./pic/3.jpg, hit: 0, unhit: 1201, diff: 0
./pic/4.jpg, hit: 334, unhit: 867, diff: 0.278101582015
./pic/5.jpg, hit: 687, unhit: 514, diff: 0.572023313905
./pic/6.jpg, hit: 835, unhit: 366, diff: 0.695253955037
./pic/7.jpg, hit: 27, unhit: 1174, diff: 0.022481265612
./pic/8.jpg, hit: 1152, unhit: 49, diff: 0.959200666112
./pic/9.jpg, hit: 389, unhit: 812, diff: 0.323896752706
./pic/10.jpg, hit: 496, unhit: 705, diff: 0.412989175687
... ...
环境 AMD Athlon 7750 Dual-Core, Ubuntu 9.10, PHP 5.2.10, 对 3600 张连续图像分析,耗时 30 秒左右, 看来不错 :D