Find the K Largest Files in a Linux Directory
When a disk fills up or you’re doing a cleanup pass, finding the biggest files quickly saves a lot of time. Here are the practical approaches on Linux.
Top 10 largest files in the current directory
du -ah --max-depth=1 | sort -rh | head -n 10
Breaking this down:
- du -ah shows sizes for all files and directories in human-readable format
- --max-depth=1 limits the listing to one level deep (du still walks subdirectories to total them, but prints each one as a single line)
- sort -rh sorts by human-readable size, largest first
- head -n 10 takes the top 10
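Change 10 to whatever K you need. The output mixes files with directories (each directory shown with the total size of its contents) and includes a line for . itself, so the first entry is usually the directory total. Files inside subdirectories are not listed individually.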
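If you want only regular files at the top level, with no directory totals mixed in, a find-based variant is one option. The sketch below assumes GNU find (for -printf); the function name top_files_here is made up for illustration.

```shell
# Hypothetical helper: list the K largest regular files directly inside DIR.
# Usage: top_files_here [DIR] [K]
top_files_here() {
    dir="${1:-.}"
    k="${2:-10}"
    # -maxdepth 1 stops find from descending; -type f drops directories.
    # Output is "size-in-bytes<TAB>path", largest first.
    find "$dir" -maxdepth 1 -type f -printf '%s\t%p\n' | sort -rn | head -n "$k"
}
```

Calling top_files_here . 10 prints byte counts rather than human-readable sizes, which keeps the sort purely numeric.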
Recursive search across the entire tree
To find the largest files anywhere under a directory:
find . -type f -printf '%s %p\n' | sort -rn | head -n 20 | awk '{s=$1; sub(/^[0-9]+ /, ""); printf "%.1fMB\t%s\n", s/1048576, $0}'
This approach is cleaner than piping through du because:
- find -printf '%s %p\n' outputs size in bytes and path, with no subprocess overhead
- sort -rn sorts numerically (more reliable with large files)
- awk formats the output in MB for readability
Output looks like:
2543.5MB ./backups/old-vm.iso
1024.2MB ./videos/recording.mp4
512.8MB ./logs/application.log
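File names can contain spaces or even newlines, which line-based pipelines handle poorly. A minimal sketch of a NUL-delimited variant follows, assuming GNU find, sort, and head (for -printf and the -z options); largest_files is a hypothetical name.

```shell
# Hypothetical helper: K largest files under DIR, robust to odd file names.
# Usage: largest_files [DIR] [K]
largest_files() {
    dir="${1:-.}"
    k="${2:-20}"
    # Emit NUL-terminated "size<TAB>path" records so a newline inside a
    # file name cannot split one record into two during sorting.
    find "$dir" -type f -printf '%s\t%p\0' |
        sort -z -rn |
        head -z -n "$k" |
        tr '\0' '\n'   # convert to newlines only at the very end, for display
}
```

The sizes are left in bytes here; formatting can be applied afterwards once the top K records have been selected.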
Excluding specific directories
Skip .git, node_modules, cache directories, etc.:
find . -type f \
-not -path './.git/*' \
-not -path './node_modules/*' \
-not -path './.cache/*' \
-printf '%s %p\n' | sort -rn | head -n 20 | \
awk '{s=$1; sub(/^[0-9]+ /, ""); printf "%.1fMB\t%s\n", s/1048576, $0}'
You can chain as many -not -path filters as needed. This is essential when scanning application directories where cache or dependency directories can dominate the results.
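One caveat with -not -path is that find still walks the excluded directories and only discards their entries afterwards. A possible alternative is -prune, which stops find from descending at all. The sketch below assumes GNU find; largest_files_pruned is a hypothetical name, and unlike the -path patterns above it matches the directory names at any depth, not just at the top level.

```shell
# Hypothetical helper: K largest files under DIR, never entering
# .git, node_modules, or .cache directories (at any depth).
# Usage: largest_files_pruned [DIR] [K]
largest_files_pruned() {
    dir="${1:-.}"
    k="${2:-20}"
    # Left of -o: matching directories are pruned and produce no output.
    # Right of -o: every other regular file is printed as "bytes<TAB>path".
    find "$dir" \
        \( -name .git -o -name node_modules -o -name .cache \) -type d -prune \
        -o -type f -printf '%s\t%p\n' |
        sort -rn | head -n "$k"
}
```

On trees with very large dependency directories this can save a noticeable amount of scanning, since the pruned subtrees are never read.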
Finding files larger than a specific size
Instead of top-K, sometimes you want all files over a threshold:
find . -type f -size +500M -printf '%s %p\n' | sort -rn | \
awk '{s=$1; sub(/^[0-9]+ /, ""); printf "%.1fMB\t%s\n", s/1048576, $0}'
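Useful size units: +1G (over 1 GiB), +100M, +10k; the suffixes are powers of 1024. Use - for smaller than, but note that GNU find rounds each file’s size up to the next whole unit before comparing, so -size -1M matches only empty files. For an exact threshold, give the size in bytes with the c suffix, for example -size -1048576c.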
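When the threshold needs to be exact, for instance in a script, specifying it in bytes avoids any unit rounding. A small sketch, assuming GNU find; files_over_bytes is a hypothetical name.

```shell
# Hypothetical helper: list files under DIR strictly larger than MIN_BYTES.
# Usage: files_over_bytes DIR MIN_BYTES
files_over_bytes() {
    dir="$1"
    min_bytes="$2"
    # The c suffix means bytes, so find compares exact sizes
    # instead of sizes rounded up to k, M, or G units.
    find "$dir" -type f -size +"${min_bytes}c" -printf '%s\t%p\n' | sort -rn
}
```

For example, files_over_bytes . 524288000 lists everything over 500 MiB, largest first.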
Whole filesystem scan
To find the largest files anywhere on the system (requires root):
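sudo find / -xdev -type f -printf '%s %p\n' 2>/dev/null | sort -rn | head -n 20 | \
awk '{s=$1; sub(/^[0-9]+ /, ""); printf "%.2fGB\t%s\n", s/1073741824, $0}'
The -xdev option keeps find on the filesystem it started from. That skips pseudo-filesystems such as /proc, where /proc/kcore can report an enormous apparent size and crowd out real files, but it also skips other mounted disks, so list any additional mount points explicitly (for example find / /home -xdev ...). The 2>/dev/null suppresses permission errors on directories you can’t read. Be patient: this can take minutes on large systems with many files. Consider restricting the scan to specific directories: find /home /var -type f ... instead.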
Interactive exploration with ncdu
For manual cleanup work, one-off pipelines get tedious. ncdu gives you an interactive tree view where you can drill down and delete directly:
# Install
sudo apt install ncdu # Debian/Ubuntu
sudo dnf install ncdu # Fedora/RHEL/AlmaLinux
sudo pacman -S ncdu # Arch
# Run
ncdu /path/to/directory
Navigate with the arrow keys, press d to delete the selected item (ncdu asks for confirmation first), and q to quit. This is much faster for ad-hoc cleanup than parsing output from find or du. By default ncdu does not follow symlinks, and it shows actual disk usage; press a to toggle to apparent size.
Performance considerations
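- For a single directory level, du -ah --max-depth=1 is the quickest to type, though du still walks every subdirectory to compute the totals
- For recursive scans, find -printf with sort -rn beats find -exec du: no extra process is spawned per file
- On very large directories (100k+ files), even sort can be slow, because it has to read all of its input before it prints anything. Pipe to head -n K right after sort so later stages only see K lines, and shrink the input where you can (for example with a -size filter)
- If you’re scripting, sort on byte sizes (sort -rn) rather than human-readable sizes: it’s predictable and faster
- On networked or slow storage, du can thrash; find -printf is more responsive for initial results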
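Sorting on bytes does not mean giving up readable output: the conversion can happen after the sort. The sketch below assumes numfmt from GNU coreutils is available; humanize_sizes is a hypothetical name, and it expects "bytes<TAB>path" lines on standard input.

```shell
# Hypothetical helper: rewrite the leading byte count of each
# "bytes<TAB>path" line using IEC units (K, M, G, ...).
humanize_sizes() {
    tab="$(printf '\t')"
    # --field=1 converts only the size column; the tab delimiter
    # leaves paths containing spaces untouched.
    numfmt --to=iec --field=1 --delimiter="$tab"
}
```

A typical use would be find . -type f -printf '%s\t%p\n' | sort -rn | head -n 20 | humanize_sizes, so that the sort stays numeric and only the final 20 lines are reformatted.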
