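Given a set of images captured from consecutive scenes, the script below deletes near-duplicate scenes and blank scenes (such as all-black or all-white frames):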
<?php
$images = "./pic"; // image directory
$threshold_rgb = 80; // 0 to ~441: color distance below which two pixels count as similar
$threshold_image = 0.50; // 0 to 1.00: share of similar pixels above which an image duplicates the previous one
$threshold_transform_rgb = 30; // 0 to ~441: color distance threshold used for blank-frame detection
$threshold_transform_image = 0.80; // 0 to 1.00: share of uniform pixels above which an image counts as blank
$pres = array(); // sampled pixel colors of the previously processed image
$files = array(); // image paths to analyze
$pdir = @dir($images);
if (!$pdir) {
    exit("Cannot open directory: $images\n");
}
while (false !== ($file = $pdir->read())) {
    if ($file == '.' || $file == '..') {
        continue;
    }
    $filepath = $images.'/'.$file;
    if (is_dir($filepath)) {
        continue;
    }
    $files[] = $filepath;
}
natsort($files);
foreach ($files as $filepath) {
    $im = @imagecreatefromjpeg($filepath);
    if (!$im) {
        continue; // skip files that cannot be read as JPEG
    }
    $width = imagesx($im);
    $height = imagesy($im);
    $hit = 0;
    $unhit = 1; // starts at 1 so the ratio below never divides by zero
    $pixhit = 0;
    $pixunhit = 1; // starts at 1 so the ratio below never divides by zero
    $step = max(1, intval($width / 40)); // sample a sparse grid for speed; at least 1 so the loops advance
    for ($i = 1; $i < $width; $i = $i + $step) {
        for ($j = 1; $j < $height; $j = $j + $step) {
            $color = imagecolorsforindex($im, imagecolorat($im, $i, $j));
            /** Compare against the same sample point in the previous image */
            if (isset($pres[$i][$j])) {
                // color distance
                $comp = sqrt(pow(abs($pres[$i][$j][0] - $color['red']), 2)
                    + pow(abs($pres[$i][$j][1] - $color['green']), 2)
                    + pow(abs($pres[$i][$j][2] - $color['blue']), 2));
                if ($comp < $threshold_rgb) {
                    $hit++;
                } else {
                    $unhit++;
                }
            }
            /** Blank-frame check: compare a random ~20% of samples with their diagonal neighbor in the same image */
            if (rand(1, 5) == 1 && $i > $step && $j > $step) {
                // color distance
                $comp = sqrt(pow(abs($pres[$i-$step][$j-$step][0] - $color['red']), 2)
                    + pow(abs($pres[$i-$step][$j-$step][1] - $color['green']), 2)
                    + pow(abs($pres[$i-$step][$j-$step][2] - $color['blue']), 2));
                if ($comp < $threshold_transform_rgb) {
                    $pixhit++;
                } else {
                    $pixunhit++;
                }
            }
            $pres[$i][$j] = array($color['red'], $color['green'], $color['blue']);
        }
    }
    imagedestroy($im);
    if ($hit / ($hit + $unhit) > $threshold_image) {
        unlink($filepath); // delete an image similar to the previous one
    } else if ($pixhit / ($pixhit + $pixunhit) > $threshold_transform_image) {
        unlink($filepath); // delete a blank image
    }
    echo "\n$filepath, hit: $hit, unhit: $unhit, diff: ".$hit/($hit + $unhit);
}
$pdir->close();
shell~$ php images.php
./pic/1.jpg, hit: 0, unhit: 1, diff: 0
./pic/2.jpg, hit: 1200, unhit: 1, diff: 0.999167360533
./pic/3.jpg, hit: 0, unhit: 1201, diff: 0
./pic/4.jpg, hit: 334, unhit: 867, diff: 0.278101582015
./pic/5.jpg, hit: 687, unhit: 514, diff: 0.572023313905
./pic/6.jpg, hit: 835, unhit: 366, diff: 0.695253955037
./pic/7.jpg, hit: 27, unhit: 1174, diff: 0.022481265612
./pic/8.jpg, hit: 1152, unhit: 49, diff: 0.959200666112
./pic/9.jpg, hit: 389, unhit: 812, diff: 0.323896752706
./pic/10.jpg, hit: 496, unhit: 705, diff: 0.412989175687
... ...
Environment: AMD Athlon 7750 Dual-Core, Ubuntu 9.10, PHP 5.2.10. Analyzing 3600 consecutive images took about 30 seconds, which looks pretty good :D
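
To make the two measurements in the script easier to follow in isolation, here is a minimal sketch that works on plain arrays instead of GD images. The function names color_distance and similar_fraction are hypothetical helpers introduced only for illustration; the script above inlines the same arithmetic. The largest possible distance, between pure black and pure white, is sqrt(3 * 255^2), roughly 441.67.

```php
<?php
// Hypothetical helper names; the original script inlines this logic.

// Euclidean distance between two RGB triples, each array(r, g, b) with channels 0..255.
function color_distance(array $a, array $b)
{
    return sqrt(
        pow($a[0] - $b[0], 2) +
        pow($a[1] - $b[1], 2) +
        pow($a[2] - $b[2], 2)
    );
}

// Share of sample pairs whose color distance is below $threshold.
// $unhit starts at 1, mirroring the script, so the division is always defined.
function similar_fraction(array $prev, array $curr, $threshold)
{
    $hit = 0;
    $unhit = 1;
    foreach ($curr as $k => $color) {
        if (!isset($prev[$k])) {
            continue; // no sample at this position in the previous image
        }
        if (color_distance($prev[$k], $color) < $threshold) {
            $hit++;
        } else {
            $unhit++;
        }
    }
    return $hit / ($hit + $unhit);
}

$black = array(0, 0, 0);
$white = array(255, 255, 255);
echo color_distance($black, $white), "\n"; // roughly 441.67, the maximum distance

// Four sample points: three stay close to the previous image, one flips from white to black.
$prev = array($black, $black, $white, $white);
$curr = array($black, array(10, 10, 10), $white, $black);
echo similar_fraction($prev, $curr, 80), "\n"; // 3 hits out of 3 + 2, i.e. 0.6
```

With the script's $threshold_image of 0.50, a ratio of 0.6 would mark the second image as a duplicate of the first. In the sample run above, the same rule flags 2.jpg, 5.jpg, 6.jpg and 8.jpg among the ten lines shown.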