Ruby script: find duplicate files
A quick Google search for a script that would find duplicate files by name in a directory tree turned up two promising techniques: one a Ruby script posted to OnJava by Bill Siggelkow, the other a bash script using common Unix tools.
Here’s my attempt to reproduce the bash results in Ruby:
#!/usr/bin/env ruby
require 'find'
require 'shellwords'

files = {}   # full path => base name
found = {}   # base name => number of occurrences

# read one or more root directories from the command line
ARGV.each do |arg|
  Find.find(arg) do |f|
    if File.file?(f)
      # accumulate the file names
      files[f] = File.basename(f)
    end
  end
end

# count up the number of each file name
files.each_value do |base|
  # Ruby has no ++ operator, so the Perl idiom found[base]++ won't work
  found[base] = 0 if !found[base]
  found[base] += 1
end

# print the path of each file found more than once,
# prepended with an rm command that is commented out
# (paths are shell-escaped so names with spaces stay intact)
found.each do |name, count|
  if count > 1
    files.each do |path, filename|
      if name == filename
        puts "# rm #{Shellwords.escape(path)}"
      end
    end
  end
end
Given a directory structure containing files with duplicate names in different directories, the output looks something like this:
# rm /market/fruits/tomato.txt
# rm /market/vegetables/tomato.txt
# rm /market/fruits/pea.txt
# rm /market/vegetables/pea.txt
The output can be redirected into a shell script file, in which you'd uncomment the "rm" lines for the files that should be deleted (if that's what you want) before running it.
This is all a bit clunky; if you've found a better or more Rubyesque way to do this, let me know!
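One possibly more Rubyesque approach, offered as a sketch rather than the script above: collect the regular-file paths, group them by base name with `Enumerable#group_by`, and keep only the groups containing more than one path. The method name `duplicate_names` is a hypothetical helper introduced here for illustration; the output format mirrors the commented-out `rm` lines above.

```ruby
#!/usr/bin/env ruby
# Sketch: find files sharing a base name by grouping paths with group_by.
require 'find'
require 'shellwords'

# Hypothetical helper: returns a Hash of base name => array of full paths,
# limited to names that occur more than once under the given roots.
def duplicate_names(roots)
  paths = []
  roots.each do |root|
    # collect regular files only; directories and other entries are skipped
    Find.find(root) { |f| paths << f if File.file?(f) }
  end
  paths.group_by { |p| File.basename(p) }
       .select { |_name, group| group.size > 1 }
end

if __FILE__ == $PROGRAM_NAME
  duplicate_names(ARGV).each_value do |group|
    group.each { |path| puts "# rm #{Shellwords.escape(path)}" }
  end
end
```

Saved under any name (for example a hypothetical `find_dups.rb`), it would be run as `ruby find_dups.rb /market`. Grouping in one pass avoids the nested loop over `files` for every duplicated name.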
December 22nd, 2007 at 7:41 am
[...] Philip Steiner wrote Ruby code to search for files that have a name [...]
June 19th, 2008 at 3:55 pm
I wonder how to add an ignore/exclude list for things such as .svn, Makefile, build.xml, etc.
By the way, it seems to run quite well.
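For the exclude-list question, one possible sketch (not part of the original script) uses `Find.prune`, which tells `Find.find` not to descend into the current directory, together with a list of file names that are never reported. The constants `EXCLUDED_DIRS` and `IGNORED_NAMES` and the helper `duplicate_names_excluding` are hypothetical names, and the list contents are only example values.

```ruby
#!/usr/bin/env ruby
# Sketch: duplicate-name finder that skips some directories and file names.
require 'find'
require 'shellwords'

# Example values only; adjust to taste.
EXCLUDED_DIRS = %w[.svn .git CVS].freeze        # directories not descended into
IGNORED_NAMES = %w[Makefile build.xml].freeze   # file names never reported

# Hypothetical helper: base name => paths, for names found more than once.
def duplicate_names_excluding(roots, excluded_dirs: EXCLUDED_DIRS,
                              ignored_names: IGNORED_NAMES)
  by_name = Hash.new { |h, k| h[k] = [] }
  roots.each do |root|
    Find.find(root) do |path|
      base = File.basename(path)
      if File.directory?(path)
        # skip this directory and everything beneath it
        Find.prune if excluded_dirs.include?(base)
      elsif File.file?(path) && !ignored_names.include?(base)
        by_name[base] << path
      end
    end
  end
  by_name.select { |_name, paths| paths.size > 1 }
end

if __FILE__ == $PROGRAM_NAME
  duplicate_names_excluding(ARGV).each_value do |paths|
    paths.each { |path| puts "# rm #{Shellwords.escape(path)}" }
  end
end
```

Pruning excluded directories means their contents are never even read, which can matter for large `.svn` trees.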