Understanding the deduplication process
With TreeSize, deduplication means removing redundant files and replacing them with NTFS hard links. This reduces the disk space occupied by your duplicate files.
Instead of letting each duplicate take up its own space on your hard disk, TreeSize keeps only one of the files and removes the others.
The removed files are replaced by hard links that point to the remaining data (See: Notes on NTFS).
The data is now shared by all the hardlinks for this file, as shown in the image below.
These hard links can be used like any normal file. The only difference is that they do not occupy their own disk space, because the data is shared with the other links to the same file.
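The replacement step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how TreeSize implements it: the function name `replace_with_hard_link` and the `.dedup-tmp` suffix are hypothetical, and the sketch assumes both files are on the same volume and have already been confirmed to be identical.

```python
import os
import tempfile


def replace_with_hard_link(master: str, duplicate: str) -> None:
    """Replace `duplicate` with a hard link to `master` (same volume only)."""
    if os.path.samefile(master, duplicate):
        return  # already two names for the same data, nothing to do
    # Hypothetical temporary name: the link is created first, so the
    # duplicate is only replaced once the new link exists.
    temp_name = duplicate + ".dedup-tmp"
    os.link(master, temp_name)        # add a second name for the master's data
    os.replace(temp_name, duplicate)  # swap that name in for the duplicate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        master = os.path.join(folder, "report.txt")
        duplicate = os.path.join(folder, "report_copy.txt")
        for path in (master, duplicate):
            with open(path, "w") as handle:
                handle.write("same content")
        replace_with_hard_link(master, duplicate)
        # Both names now refer to one set of data on disk.
        print(os.path.samefile(master, duplicate))
```

Creating the link before removing the duplicate means a failure (for example, missing permissions) leaves the duplicate untouched.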
Understanding hard links
A hard link is an additional name for an existing file. Every file you see in Windows Explorer already has one name — that name is itself a hard link. When you create a second hard link, you are giving the same file a second name, which can even live in a different folder. Behind the scenes, NTFS keeps a central index of all files called the Master File Table (MFT). Think of the MFT as a phone book: each hard link is like a different listing that dials the same number.
Hard links are not copies
A hard link does not create a copy of the file. There is still only one set of data stored on disk. All hard links that point to the same file share everything: the file’s contents, its timestamps, its attributes, and its access permissions. If you open the file through any of its hard links and make changes, you are editing the same data — every other hard link will reflect those changes immediately.
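The shared-data behavior can be observed with a short Python sketch. The file names and the helper `write_via_second_name` are made up for this illustration; `os.link` is the standard library call that creates a hard link.

```python
import os
import tempfile


def write_via_second_name(folder: str) -> str:
    """Edit a file through one hard link and read it back through another."""
    first = os.path.join(folder, "notes.txt")
    second = os.path.join(folder, "notes_link.txt")
    with open(first, "w") as handle:
        handle.write("draft")
    os.link(first, second)            # second name for the same data
    with open(second, "a") as handle:
        handle.write(" + edit")       # change made through the second name
    with open(first) as handle:
        return handle.read()          # read through the first name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(write_via_second_name(folder))  # prints "draft + edit"
```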
How deletion works
NTFS keeps track of how many hard links point to each file. Deleting a hard link removes only that particular name. It does not erase the underlying data. The actual file data is freed only when the very last hard link is deleted and no name remains.
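As a rough illustration, the link count can be read from Python through the `st_nlink` field of `os.stat`. The helper name `link_counts` and the file names are hypothetical; the sketch only shows that the data survives until the last name is gone.

```python
import os
import tempfile


def link_counts(folder: str):
    """Return the link counts seen while adding and removing a hard link."""
    first = os.path.join(folder, "invoice.txt")
    second = os.path.join(folder, "invoice_link.txt")
    with open(first, "w") as handle:
        handle.write("total: 42")

    counts = [os.stat(first).st_nlink]       # one name so far
    os.link(first, second)
    counts.append(os.stat(first).st_nlink)   # two names for the same data
    os.remove(first)                         # deletes only this name
    counts.append(os.stat(second).st_nlink)  # one name remains

    with open(second) as handle:
        content = handle.read()              # data is still there
    return counts, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(link_counts(folder))
```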
Limitations
Note
Hard links only work within the same drive or partition. You cannot create a hard link that spans two different volumes.
Hard links can only point to files, not to folders.
A single file can have at most 1023 hard links.
All hard links to the same file share the same Security Descriptor (access permissions). Changing permissions on one hard link changes them for all.
To create a hard link, the user needs permission to write file attributes on the affected folder branch and, if the drive is not a local drive, on the share as well.
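Some of these limitations can be checked before a link is attempted. The following Python sketch is an assumption-laden illustration: the helper `try_hard_link` is hypothetical, and it uses the `st_dev` value reported by `os.stat` as a stand-in for "same volume". Permission problems and the per-file link limit are not checked up front; they would surface as an `OSError` from `os.link`.

```python
import os
import tempfile


def try_hard_link(source: str, link_name: str):
    """Create a hard link; return None on success or a reason string."""
    if os.path.isdir(source):
        return "hard links cannot point to folders"
    target_folder = os.path.dirname(os.path.abspath(link_name))
    # Assumption: differing st_dev values mean different volumes.
    if os.stat(source).st_dev != os.stat(target_folder).st_dev:
        return "source and link are on different volumes"
    try:
        os.link(source, link_name)
    except OSError as error:
        # Covers missing permissions, link limits, unsupported file systems.
        return f"the operating system refused: {error}"
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        source = os.path.join(folder, "photo.jpg")
        with open(source, "wb") as handle:
            handle.write(b"\x00" * 16)
        print(try_hard_link(source, os.path.join(folder, "photo_link.jpg")))
        print(try_hard_link(folder, os.path.join(folder, "folder_link")))
```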
Hard links vs. symbolic links vs. shortcuts
Hard links are often confused with symbolic links (symlinks) and Windows shortcuts. Here is how they differ:
A hard link is a direct reference to the file’s data. It is indistinguishable from the “original” file name — both are equal entries pointing to the same data. Hard links survive if the original name is renamed or moved (within the same volume), because they do not depend on a file path.
A symbolic link (symlink) is a special file that contains a path pointing to another file or folder. If the target is moved, renamed, or deleted, the symlink becomes broken (“dangling”). Unlike hard links, symlinks can point across different drives and can also point to folders. Symlinks are resolved transparently by the operating system, so most applications treat them like normal files or folders.
A Windows shortcut (.lnk file) is an ordinary file that the Windows Shell interprets as a pointer to a target. Shortcuts are not resolved at the file-system level — they only work inside Explorer and applications that understand the .lnk format. A shortcut always has its own file size (typically a few hundred bytes) and its own security descriptor, independent of the target.
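The difference between a hard link and a symlink becomes visible when the original name is renamed. The Python sketch below uses hypothetical file names and a made-up helper, `compare_links_after_rename`. Note that creating symlinks on Windows may require elevated rights or Developer Mode, so `os.symlink` can fail there. Shortcuts are not covered, because they are handled by the Windows Shell rather than the file system.

```python
import os
import tempfile


def compare_links_after_rename(folder: str) -> dict:
    """Rename the original name and report which links still resolve."""
    original = os.path.join(folder, "data.txt")
    hard = os.path.join(folder, "data_hard.txt")
    soft = os.path.join(folder, "data_sym.txt")
    with open(original, "w") as handle:
        handle.write("payload")

    os.link(original, hard)      # direct reference to the data
    os.symlink(original, soft)   # stores the path of the original name
    os.rename(original, os.path.join(folder, "data_renamed.txt"))

    return {
        # exists() follows symlinks, so a dangling symlink reports False.
        "hard_link_resolves": os.path.exists(hard),
        "symlink_resolves": os.path.exists(soft),
        "symlink_entry_present": os.path.islink(soft),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(compare_links_after_rename(folder))
```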
Which of the duplicate files will be replaced?
If you check all files of a duplicates group, TreeSize picks the file with the newest “Last modified” date and uses it as the “master” for this group. All other files are removed and replaced by hard links that point to the master file.
If you want to manually select a master file, you can leave one of the files in a duplicates group unchecked. This file will then not be replaced, but used as master instead.
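The selection rule described above can be written down as a small Python sketch. It is a model of the documented behavior only, not TreeSize's actual code: the `DuplicateEntry` type, its fields, and `choose_master` are hypothetical names. The case of several unchecked files is not described here, so the sketch simply assumes the first unchecked file is used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DuplicateEntry:
    path: str
    modified: float   # "Last modified" as a timestamp
    checked: bool     # True if the file is checked for replacement


def choose_master(group: List[DuplicateEntry]) -> DuplicateEntry:
    """Pick the file that stays in place and serves as the link target."""
    unchecked = [entry for entry in group if not entry.checked]
    if unchecked:
        # Manual selection: an unchecked file is kept and used as master.
        # Assumption: with several unchecked files, take the first one.
        return unchecked[0]
    # All files checked: the newest "Last modified" date wins.
    return max(group, key=lambda entry: entry.modified)


if __name__ == "__main__":
    group = [
        DuplicateEntry("a/report.docx", modified=1000.0, checked=True),
        DuplicateEntry("b/report.docx", modified=3000.0, checked=True),
        DuplicateEntry("c/report.docx", modified=2000.0, checked=True),
    ]
    print(choose_master(group).path)  # prints "b/report.docx"
```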
Note
Unfortunately, Windows Explorer does not show the space saved by deduplication, neither for a deduplicated file nor for the folder that contains it. Read our knowledge base for more information.
Note
You cannot use hard links to replace files that are located on different drives or partitions. All files of a duplicates group must reside on the same volume.
Note
All hard links pointing to the same file share the same security descriptor (access permissions). After deduplication, one unified set of permissions applies to the single remaining physical file.