Python Duplicate File Remover with Source Code

October 21, 2021 by Admin

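This script removes duplicate files (files with identical contents) from the current working directory, that is, the directory you run it from. Only files directly inside that directory are checked; subdirectories are not scanned.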

Prerequisites:

No external libraries are needed. The script uses only these standard-library modules:
os
hashlib

Run the Script:

From inside the directory you want to clean, run:

python3 duplicatefileremover.py

Working:

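The script first lists all the regular files in the directory.
It then computes the MD5 hash of each file. If a file's hash matches the hash of a file it has already processed, the two files are treated as duplicates and the later one is deleted; the first file found with that content is kept.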
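The hash-and-compare step described above can be sketched as a function that only reports duplicates instead of deleting them, which is useful as a dry run before letting the script remove anything. This is a minimal sketch, not part of the original script: the names `md5_of` and `find_duplicates` are hypothetical, and it assumes the caller passes in the list of file paths to check.

```python
import hashlib


def md5_of(path, blocksize=65536):
    """Return the MD5 hex digest of a file, reading it in fixed-size chunks."""
    hasher = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF)
        for chunk in iter(lambda: fh.read(blocksize), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(paths):
    """Return (duplicate, original) pairs without deleting anything.

    The first path seen with a given hash counts as the original, so the
    result depends on the order of `paths`, just as in the script.
    """
    first_seen = {}  # hash -> first path found with that content
    duplicates = []
    for path in paths:
        digest = md5_of(path)
        if digest in first_seen:
            duplicates.append((path, first_seen[digest]))
        else:
            first_seen[digest] = path
    return duplicates
```

Called on the same file list the script builds, in the same order, `find_duplicates` reports exactly the files the script would delete, each paired with the file that would be kept.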

Source Code:

import hashlib
import os

# Returns the MD5 hash (as a hex string) of the contents of the given file


def hashFile(filename):
    # Reading a large file all at once could exhaust memory, so read it in fixed-size blocks instead
    BLOCKSIZE = 65536
    hasher = hashlib.md5()
    with open(filename, 'rb') as file:
        # Reads the particular blocksize from file
        buf = file.read(BLOCKSIZE)
        while buf:  # read() returns an empty bytes object at end of file
            hasher.update(buf)
            buf = file.read(BLOCKSIZE)
    return hasher.hexdigest()


if __name__ == "__main__":
    # Maps each hash to the first file found with that content
    hashMap = {}

    # List to store deleted files
    deletedFiles = []
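    # Regular files in the current working directory. Symlinks are skipped:
    # a link hashes the same as its target, so the target itself could be
    # deleted, leaving a broken link behind.
    filelist = [f for f in os.listdir() if os.path.isfile(f) and not os.path.islink(f)]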
    for f in filelist:
        key = hashFile(f)
        # If this hash was already seen, the file is a duplicate: delete it
        if key in hashMap:
            deletedFiles.append(f)
            os.remove(f)
        else:
            hashMap[key] = f
    if deletedFiles:
        print('Deleted files:')
        for i in deletedFiles:
            print(i)
    else:
        print('No duplicate files found')
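The script deletes a file as soon as its MD5 hash matches an earlier one, so it relies on MD5 never giving the same digest for different contents. Accidental matches are extremely unlikely, but MD5 is not collision-resistant, and deliberately crafted files can share a hash. A more cautious variant confirms each match with a full byte comparison before deleting, and defaults to only reporting. This is a sketch of such a variation, not the original script: `file_md5`, `remove_duplicates` and its `dry_run` parameter are hypothetical names, and the byte comparison uses the standard-library `filecmp.cmp` with `shallow=False`.

```python
import filecmp
import hashlib
import os


def file_md5(path, blocksize=65536):
    """Return the MD5 hex digest of a file, read in fixed-size blocks."""
    hasher = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(blocksize), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def remove_duplicates(directory=".", dry_run=True):
    """Delete (or, if dry_run is True, only report) duplicate files in `directory`.

    A file counts as a duplicate only if its MD5 matches an earlier file AND
    the two files are identical byte for byte. Returns the affected paths.
    """
    kept = {}  # hash -> kept paths with that hash
    removed = []
    # Sorted so that which copy is kept is predictable
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Only regular files are compared; directories and symlinks are skipped
        if not os.path.isfile(path) or os.path.islink(path):
            continue
        candidates = kept.setdefault(file_md5(path), [])
        # shallow=False makes filecmp compare the actual file contents
        if any(filecmp.cmp(path, other, shallow=False) for other in candidates):
            removed.append(path)
            if not dry_run:
                os.remove(path)
        else:
            candidates.append(path)
    return removed
```

Calling `remove_duplicates()` first shows what would be deleted, and `remove_duplicates(dry_run=False)` then deletes those files. Making the dry run the default is a deliberate choice for a destructive tool: nothing is removed unless you ask for it explicitly.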

