PCSalt

Find and Delete Duplicate Files — Kotlin CLI Tool

A command-line utility in Kotlin that scans directories for duplicate files using MD5 hashing and lets you review and delete them safely.


Over the years, files pile up — downloads, backups, project copies, photos synced from multiple devices. You end up with gigabytes of duplicates scattered across folders. This Kotlin CLI tool scans directories, finds duplicates using MD5 hashing, and lets you review before deleting.

How It Works

  1. Scan — recursively walk directories and compute MD5 hash of every file
  2. Group — files with the same hash are duplicates
  3. Output — write the list to a file for you to review
  4. Delete — after reviewing, run the delete command to remove the duplicates you’ve marked

Splitting the work into separate find and delete commands, with a manual review in between (find → review → delete), is intentional. You always get to see what will be deleted before anything is removed.
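The scan and group steps can be sketched in a few lines. This is a minimal illustration rather than the tool's actual code: md5Of and findDuplicateGroups are hypothetical names, and the sketch does not skip any folders.

```kotlin
import java.io.File
import java.security.MessageDigest

// Hypothetical helper: uppercase hex MD5 of a file's contents, read in chunks.
fun md5Of(file: File): String {
    val digest = MessageDigest.getInstance("MD5")
    file.inputStream().use { input ->
        val buffer = ByteArray(8192)
        while (true) {
            val read = input.read(buffer)
            if (read == -1) break          // end of file
            digest.update(buffer, 0, read) // hash only the bytes actually read
        }
    }
    // %02X prints each byte as two uppercase hex digits.
    return digest.digest().joinToString("") { "%02X".format(it) }
}

// Hypothetical helper: hash every regular file under root and keep only
// the hashes that are shared by two or more files.
fun findDuplicateGroups(root: File): Map<String, List<File>> =
    root.walkTopDown()
        .filter { it.isFile }
        .groupBy { md5Of(it) }
        .filterValues { it.size > 1 }
```

Calling findDuplicateGroups(File("some/dir")) returns a map from hash to the files sharing it, which is the same information the tool writes out for review.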

Usage

Find Duplicates

java -jar search-and-delete-duplicates.jar --find ~/Documents ~/Downloads

This scans both directories and writes the results to a file named toDelete in the current working directory:

D41D8CD98F00B204E9800998ECF8427E
/Users/nav/Documents/report.pdf
/Users/nav/Downloads/report.pdf
A1B2C3D4E5F6A7B8C9D0E1F2A3B4C5D6
/Users/nav/Documents/photos/IMG_001.jpg
/Users/nav/Downloads/backup/IMG_001.jpg
/Users/nav/Documents/old/IMG_001.jpg

Each group starts with the MD5 hash, followed by the file paths that share that hash.
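To make the layout concrete, here is a small sketch of how such a list could be rendered. The writer used by the tool is not shown in this article, so DuplicateGroup and formatDuplicateGroups are hypothetical names used only for illustration.

```kotlin
// Hypothetical stand-in for one group of duplicates: a hash and its paths.
data class DuplicateGroup(val md5: String, val paths: List<String>)

// Render each group as its hash line followed by one path per line,
// with groups placed directly after one another.
fun formatDuplicateGroups(groups: List<DuplicateGroup>): String =
    groups.joinToString(separator = "\n") { group ->
        (listOf(group.md5) + group.paths).joinToString(separator = "\n")
    }
```

Because the format is plain text with one entry per line, it can be edited in any text editor before the delete step.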

Review

Open the toDelete file and comment out (with #) any files you want to keep:

D41D8CD98F00B204E9800998ECF8427E
# /Users/nav/Documents/report.pdf    ← keep this one
/Users/nav/Downloads/report.pdf       ← delete this duplicate
A1B2C3D4E5F6A7B8C9D0E1F2A3B4C5D6
# /Users/nav/Documents/photos/IMG_001.jpg  ← keep
/Users/nav/Downloads/backup/IMG_001.jpg     ← delete
/Users/nav/Documents/old/IMG_001.jpg        ← delete

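Lines starting with # are skipped. MD5 hash lines are not filtered explicitly: each one is treated as a file path, and since no file with that name normally exists, nothing is deleted for it. The ← notes in the example are annotations for this article only. Leave them out of the real file, because a path line with extra text after it no longer matches an existing file and is silently ignored.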
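If you want to double-check a reviewed list before running the delete command, a dry-run helper along these lines could print what is still marked for deletion. It is not part of the tool: pathsMarkedForDeletion is a hypothetical name, and unlike the tool it filters hash lines explicitly with a regular expression.

```kotlin
// Matches a line that is exactly 32 hex digits, i.e. an MD5 group header.
val md5Line = Regex("^[0-9A-Fa-f]{32}$")

// Hypothetical dry-run helper: returns the paths that are still marked for
// deletion, without touching the file system.
fun pathsMarkedForDeletion(lines: List<String>): List<String> =
    lines.map { it.trim() }
        .filter { it.isNotEmpty() }         // ignore blank lines
        .filterNot { it.startsWith("#") }   // kept files are commented out
        .filterNot { md5Line.matches(it) }  // group headers are not paths
```

Feeding it the lines of the reviewed file, for example via java.io.File("toDelete").readLines(), lists exactly the entries that have not been commented out.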

Delete

java -jar search-and-delete-duplicates.jar --delete toDelete

Output:

Deleted: /Users/nav/Downloads/report.pdf | Size: 245 kb
Deleted: /Users/nav/Downloads/backup/IMG_001.jpg | Size: 3421 kb
Deleted: /Users/nav/Documents/old/IMG_001.jpg | Size: 3421 kb

The Code

The project has three files — clean separation of concerns.

Main.kt — Entry Point

package com.pcsalt.utility

fun main(args: Array<String>) {
  if (args.isEmpty()) {
    printUsage()
    return
  }

  when {
    args[0] == "--find" && args.size >= 2 -> {
      val directories = args.drop(1)
      println("Find command detected. Searching in directories: $directories")
      ListFiles().find(directories)
    }

    args[0] == "--delete" && args.size == 2 -> {
      val fileToDelete = args.drop(1)
      println("Delete command detected. Deleting files listed in: $fileToDelete")
      Deletion().delete(fileToDelete)
    }

    else -> {
      println("Invalid arguments.")
      printUsage()
    }
  }
}

fun printUsage() {
  println(
    """
    Usage:
      java -jar program.jar --find <directory1> <directory2> [...]
      java -jar program.jar --delete <file>

    Examples:
      java -jar program.jar --find ~/dir1 ~/dir2
      java -jar program.jar --delete toDelete.txt
    """.trimIndent()
  )
}

ListFiles.kt — Duplicate Detection

The core logic: walk directories, hash every file with MD5, group by hash, filter groups with more than one file.

package com.pcsalt.utility

import java.io.BufferedWriter
import java.io.File
import java.io.FileInputStream
import java.io.FileWriter
import java.security.MessageDigest
import java.util.Date
import java.util.Locale

data class Files(val md5: String, val filePath: String)
data class DuplicateDetails(val md5: String, val filePath: Set<String>)

class ListFiles {
  private var counter = 0
  private var totalFilesCount = 0

  fun find(dirToSearch: List<String>) {
    val mFiles = mutableListOf<Files>()
    val duplicates = mutableMapOf<String, MutableSet<String>>()
    val ignoreFolders = setOf(".git", "build", "node_modules", ".gradle", ".idea")
    val ignoreFiles = setOf(".DS_Store")

    println("\nSearching for duplicate files in directories: ${dirToSearch.joinToString(", ")}")
    println("Processing started at ${Date()}")

    for (rootPath in dirToSearch) {
      countFiles(File(rootPath), ignoreFolders, ignoreFiles)
    }

    println("Total files to scan: $totalFilesCount")

    for (rootPath in dirToSearch) {
      addFilesToList(File(rootPath), mFiles, ignoreFolders, ignoreFiles)
    }

    for (file in mFiles) {
      duplicates.computeIfAbsent(file.md5) { mutableSetOf() }.add(file.filePath)
    }

    val duplicateDetails = duplicates.filter { it.value.size > 1 }
      .map { DuplicateDetails(it.key, it.value) }

    println("\nDuplicate(s) of (${duplicateDetails.size}) file(s) found.")
    val logFileName = "./toDelete"
    writeDuplicatesToFile(logFileName, duplicateDetails)
    println("List of duplicate files is stored in $logFileName")
  }

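  // Streams the file through an MD5 digest and returns the hash as uppercase hex.
  private fun fileToMD5(filePath: String): String {
    FileInputStream(filePath).use { inputStream ->
      val buffer = ByteArray(1024)
      val digest = MessageDigest.getInstance("MD5")
      var numRead: Int
      // read() returns -1 at end of file; feed only the bytes actually read.
      while (inputStream.read(buffer).also { numRead = it } != -1) {
        if (numRead > 0) digest.update(buffer, 0, numRead)
      }
      val md5Bytes = digest.digest()
      // Adding 0x100 and dropping the first digit zero-pads each byte to two hex digits.
      return md5Bytes.joinToString("") {
        ((it.toInt() and 0xff) + 0x100).toString(16).substring(1)
      }.uppercase(Locale.getDefault())
    }
  }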

  // ... directory walking and file counting methods
}

Key design decisions:

  • MD5 hashing: fast enough for duplicate detection (not for security)
  • Ignored folders: .git, build, node_modules, .gradle, and .idea are skipped automatically, as are .DS_Store files
  • Progress reporting: shows file count and size as it scans
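The directory-walking methods are omitted from the listing above. As a rough idea of how ignored folders can be pruned during a walk, here is a sketch built on the standard library's File.walkTopDown. It is an assumption about one possible approach, not the author's implementation, and listCandidateFiles is a hypothetical name.

```kotlin
import java.io.File

// Hypothetical helper: list regular files under root, pruning ignored
// folders and dropping ignored file names.
fun listCandidateFiles(
    root: File,
    ignoreFolders: Set<String> = setOf(".git", "build", "node_modules", ".gradle", ".idea"),
    ignoreFiles: Set<String> = setOf(".DS_Store")
): List<File> =
    root.walkTopDown()
        // Returning false from onEnter skips the directory and everything below it.
        .onEnter { it.name !in ignoreFolders }
        .filter { it.isFile && it.name !in ignoreFiles }
        .toList()
```

Pruning in onEnter means an ignored folder is never descended into, so a large node_modules tree adds no scanning time. Note that in this sketch a root directory whose own name is in the ignore set yields no files at all.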

Deletion.kt — Safe File Removal

package com.pcsalt.utility

import java.io.BufferedReader
import java.io.File
import java.io.FileReader

class Deletion {
  fun delete(args: List<String>) {
    if (args.isEmpty()) {
      println("Provide the path to a file that lists the files to delete")
      return
    }

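    BufferedReader(FileReader(args[0])).use { reader ->
      var line: String?
      while (reader.readLine().also { line = it } != null) {
        // Trim once so the comment check and the path lookup see the same text.
        val path = line!!.trim()
        // Skip blank lines and lines commented out during review.
        if (path.isEmpty() || path.startsWith("#")) continue

        // Hash lines are not filtered explicitly; they normally name no existing file.
        val fileToDelete = File(path)
        if (fileToDelete.exists()) {
          // Read the size before deleting; afterwards length() would return 0.
          val size = fileToDelete.length() / 1024
          if (fileToDelete.delete()) {
            println("Deleted: ${fileToDelete.absolutePath} | Size: $size kb")
          } else {
            println("Failed to delete: ${fileToDelete.absolutePath}")
          }
        }
      }
    }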
  }
}

Lines starting with # are skipped — this is how you keep files from being deleted after review.

Building from Source

Prerequisites: JDK 17+

git clone https://github.com/krrishnaaaa/Search-and-Delete-duplicate-files.git
cd Search-and-Delete-duplicate-files
./gradlew shadowJar

The fat JAR is generated at build/libs/search-and-delete-duplicates-<version>.jar.

Download

Pre-built JARs are available on the GitHub Releases page.

Why Not Use Existing Tools?

Tools like fdupes and rdfind exist, but:

  • This tool gives you a review step — nothing is deleted without your approval
  • The output file format is simple and editable, with a hash line per group followed by file paths, one per line
  • It’s a single JAR with no dependencies — runs anywhere Java runs
  • The code is ~200 lines of Kotlin — easy to understand and modify for your needs

Source Code

View on GitHub: https://github.com/krrishnaaaa/Search-and-Delete-duplicate-files