Linux / UNIX Tech Support Forum
This is a discussion on Remove Duplicate Files From 2 Partitions within the Shell scripting forums, part of the Development/Scripting category; Hello Nixcraft, I have 2 partitions that contain thousands of files in a folder structure as follows: Code: /data1/wcnn/*.mp3 /data1/wxxr/*.mp3 ...
Run the find command on both partitions in the background and write the results to text files. Once both are done, use those text files to create a diff or uniq view.
Code:
find /path/to/partition1 -iname "*.mp3" -print0 >output1.txt 2>error1.txt &
find /path/to/partition2 -iname "*.mp3" -print0 >output2.txt 2>error2.txt &
Code:
diff output1.txt output2.txt > diff.txt
__________________
Vivek Gite Linux Evangelist
Thanks for your replies. I executed the find commands that Vivek posted and generated two files.
Code:
data1.txt = 14MB
data2.txt = 4.5MB
Code:
[root@podcast2 ~]# diff data1.txt data2.txt > duplicate_mp3.txt
Code:
[root@podcast2 ~]# cat duplicate_mp3.txt
Binary files data1.txt and data2.txt differ
[root@podcast2 ~]#
Thanks for your support,
Jaysunn
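A note on the "Binary files ... differ" message: find -print0 separates names with NUL bytes, and diff treats input containing NUL bytes as binary. The sketch below is one possible way to still use such lists. It converts them to one name per line and prints the file names that occur in both. The helper names names_from_list and common_names are made up for this illustration, and it assumes no file name contains a newline.

```shell
#!/bin/sh
# names_from_list FILE (hypothetical helper): read a NUL-separated list
# written by find -print0 and print the unique file names
# (last path component only), sorted.
names_from_list() {
    tr '\0' '\n' < "$1" | sed 's|.*/||' | LC_ALL=C sort -u
}

# common_names LIST1 LIST2 (hypothetical helper): print the file names
# that occur in both lists.
common_names() {
    t1=$(mktemp) && t2=$(mktemp) || return 1
    names_from_list "$1" > "$t1"
    names_from_list "$2" > "$t2"
    # comm -12 keeps only lines present in both sorted inputs
    LC_ALL=C comm -12 "$t1" "$t2"
    rm -f "$t1" "$t2"
}
```

With the two lists from this post, that would be run as: common_names data1.txt data2.txt > duplicate_mp3.txt. It compares names only, not content.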
Jay,
I think you need to write a bit of shell script. I had something like this for finding duplicate mp3s, but my collection is 2-3 GB max. Code:
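#!/bin/bash
# File lists written by the find commands, one path per line
F1=/tmp/sda1.txt
F2=/tmp/sdb2.txt
# For each file on sdb2, print its path if the same file name
# also appears in the list for sda1
while IFS= read -r line
do
    cf=$(basename -- "$line")
    # -F matches the name literally instead of as a regular expression
    grep -qF -- "/$cf" "$F1" && echo "$line"
done < "$F2"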
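The loop above starts one grep per line, which can get slow on lists as large as the ones in this thread. As a possible alternative, here is a sketch that does the same name comparison in a single awk pass. The function name names_in_both is invented for this example, and it assumes one path per line, so lists written without -print0.

```shell
#!/bin/sh
# names_in_both LIST1 LIST2 (hypothetical helper): print each path from
# LIST2 whose file name (last path component) also appears in LIST1.
names_in_both() {
    # -F/ splits every path on "/", so $NF is the file name.
    # Names from the first list are remembered in the array "seen";
    # lines of the second list are printed when their name was seen.
    awk -F/ 'FILENAME == ARGV[1] { seen[$NF] = 1; next }
             ($NF in seen)' "$1" "$2"
}
```

Example: names_in_both /tmp/sda1.txt /tmp/sdb2.txt > dups.txt. The match is case sensitive and exact, so it can report fewer hits than the substring grep.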
Code:
find /mnt/sda1 -iname "*.mp3" >/tmp/sda1.txt &
find /mnt/sdb2 -iname "*.mp3" >/tmp/sdb2.txt &
# wait for both find jobs to finish before reading the lists
wait
/path/to/script > dups.txt
I suggest you run this on a small data set first, say 20 or 30 mp3s in the /tmp/d1 and /tmp/d2 directories (copy them manually). Remove or add a few duplicates in d2 and test the script. Once everything is working, try it on your actual data set. You may also need to consider md5 and not just file names. Are those exact duplicates? For example, /tmp/foo and /data/foo have the same name but might have different content. In that case you need to run md5 checksums on both files. Let me know...
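To make the md5 idea concrete, here is a rough sketch, not a finished tool. It assumes GNU md5sum is installed and that no file name contains a newline or backslash (md5sum escapes those). The helper names md5_list and dupes_by_md5 are made up for this example.

```shell
#!/bin/sh
# md5_list DIR (hypothetical helper): print one "checksum  path" line
# for every .mp3 file under DIR.
md5_list() {
    find "$1" -type f -iname '*.mp3' -exec md5sum {} +
}

# dupes_by_md5 DIR1 DIR2 (hypothetical helper): print every file under
# DIR2 whose checksum also belongs to some file under DIR1.
dupes_by_md5() {
    s1=$(mktemp) && s2=$(mktemp) || return 1
    md5_list "$1" > "$s1"
    md5_list "$2" > "$s2"
    # An md5sum line is 32 hex digits, two separator characters and
    # then the path, so the path starts at column 35.
    awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
         ($1 in seen)        { print substr($0, 35) }' "$s1" "$s2"
    rm -f "$s1" "$s2"
}
```

On the small test set described above this could be tried as: dupes_by_md5 /tmp/d1 /tmp/d2 > dups.txt. Every file is read in full, so expect it to take much longer than the name comparison.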
__________________
Vivek Gite Linux Evangelist
Last edited by nixcraft; 25-09-2009 at 08:45 PM.
I think md5sum and cmp are the two commands you should look into. Take a look at the following:
Dr. Dobb's | Finding Duplicate Files | December 1, 2003
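As a small illustration of the cmp half of that suggestion (this is not taken from the linked article), the sketch below confirms a candidate pair byte for byte. The function names same_content and report_if_same are hypothetical.

```shell
#!/bin/sh
# same_content FILE1 FILE2 (hypothetical helper): exit status 0 only
# when both files are byte-for-byte identical. cmp -s prints nothing
# and reports the result through its exit status.
same_content() {
    cmp -s -- "$1" "$2"
}

# report_if_same FILE1 FILE2 (hypothetical helper): print the pair
# only when the contents match.
report_if_same() {
    if same_content "$1" "$2"; then
        printf '%s == %s\n' "$1" "$2"
    fi
}
```

A checksum match or a name match gives you the candidates; running cmp on each candidate pair is a cheap final check before deleting anything.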
__________________
Vivek Gite Linux Evangelist
This did the trick. I have generated a wonderful list of duplicates. Thank you so, so much.
You are a genius.
Jaysunn
Hey, using your concept I came up with this, and I created another great list. Thanks again.
PHP Code:
Jaysunn