Fingerprinting, also known as hashing, is the process of transforming any given string of characters, such as the ones and zeros of a file, into another value. This other value is usually represented by a shorter, fixed-length sequence of characters.
What matters here is that the fingerprint of a file remains the same as long as the file's contents remain the same: fingerprinting the same file repeatedly always yields the same fingerprint. This consistency allows duplicates to be detected on shared storage. If two files have the exact same sequence of ones and zeros, they also have the same fingerprint, meaning they are duplicates of one another. When a file is copied, the copy has the same fingerprint as the source file, regardless of whether a user renamed the copy.
To be able to deduplicate files on shared storage, Strawberry first needs to create at least one fingerprint for each file that it manages. Fingerprinting is not limited to media files but includes all file types within the Strawberry-managed structure on the shared storage.
In More Detail
Strawberry’s fingerprints are generated using SHA-512, a cryptographic hashing algorithm. SHA-512 produces a 512-bit value, which is written out as a fingerprint of 128 hexadecimal characters. The goal of using a strong hashing algorithm is to generate an “Essence Fingerprint” that is, for all practical purposes, unique to the content of each Strawberry-managed file. These fingerprints can then be used to reliably identify file duplicates.
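The 128-character length follows directly from SHA-512's 512-bit output, since each hexadecimal character encodes 4 bits. A minimal Python sketch using the standard library's hashlib illustrates this; the input bytes are an arbitrary placeholder and not Strawberry data:

```python
import hashlib

# Arbitrary placeholder input; any sequence of bytes works.
data = b"example file contents"

digest = hashlib.sha512(data)
fingerprint = digest.hexdigest()  # lowercase hexadecimal string

print(digest.digest_size * 8)  # digest size in bits: 512
print(len(fingerprint))        # 512 bits / 4 bits per hex character: 128
```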
In addition to the “Essence Fingerprint”, Strawberry generates a “Metadata Fingerprint”. Together, these two fingerprints serve as unique identifiers for a file.
The essence fingerprint is generated from the actual bits of a file. Because every file on a computer is ultimately just data that can be represented in binary form, SHA-512 can take that data, run a calculation on it, and output a fixed-length string. That string is the file’s essence fingerprint.
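As a rough illustration of hashing a file's bits, the sketch below streams a file through SHA-512 with Python's hashlib. The function name essence_fingerprint and the chunk size are assumptions made for this example; this is not Strawberry's actual implementation.

```python
import hashlib

def essence_fingerprint(path, chunk_size=1024 * 1024):
    """Return the SHA-512 hex digest of a file's bytes (hypothetical helper)."""
    hasher = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read in chunks so large media files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Only the file's bytes are fed into the hash, so a renamed or relocated copy with identical contents produces the same result, while changing a single byte produces a different one.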
To add a further level of file uniqueness, Strawberry can also generate “Metadata Fingerprints”. For video files, the metadata fingerprint is generated from properties of the input video such as its format, codec, aspect ratio, width, height, duration, colourspace, and more. For audio files, it includes the audio codec, sample rate, number of audio tracks and channels, and, if available, any ID3 tags such as the title, album, artist, and genre.
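How Strawberry combines these properties into a single fingerprint is not described here. One plausible approach, shown purely as an assumption, is to serialize the metadata in a canonical form and hash the result. In the Python sketch below, the function name metadata_fingerprint, the field names, the sample values, and the use of JSON with SHA-512 are all hypothetical:

```python
import hashlib
import json

def metadata_fingerprint(metadata):
    """Hash a metadata dict into a hex digest (hypothetical helper)."""
    # Sorting keys and fixing separators yields one canonical string per
    # set of values, so the order of the keys cannot change the result.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha512(canonical.encode("utf-8")).hexdigest()

# Illustrative values only; the field names are assumptions.
video_a = {"format": "MXF", "codec": "XDCAM HD422", "width": 1920,
           "height": 1080, "duration": 12.48}
video_b = {"duration": 12.48, "height": 1080, "width": 1920,
           "codec": "XDCAM HD422", "format": "MXF"}
video_c = dict(video_a, width=1280, height=720)

print(metadata_fingerprint(video_a) == metadata_fingerprint(video_b))  # True
print(metadata_fingerprint(video_a) == metadata_fingerprint(video_c))  # False
```

The point of the canonical form is that two files with the same property values always map to the same fingerprint, while any differing value yields a different one.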
If two files have the same metadata and essence fingerprints, they are identical. In other words, one of the files is an exact copy of the other.
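The comparison described above can be sketched as grouping files by their pair of fingerprints. In this Python example the function name find_duplicates, the record layout, and the paths are hypothetical, and the short strings "e1", "m1" and so on stand in for real 128-character fingerprints:

```python
from collections import defaultdict

def find_duplicates(records):
    """Group file paths whose fingerprint pairs match (hypothetical helper).

    `records` is an iterable of (path, essence_fp, metadata_fp) tuples.
    """
    groups = defaultdict(list)
    for path, essence_fp, metadata_fp in records:
        groups[(essence_fp, metadata_fp)].append(path)
    # Only groups with more than one path contain duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]

# Placeholder records; real fingerprints would be full hex digests.
records = [
    ("/projects/a/interview.mov", "e1", "m1"),
    ("/projects/b/interview_copy.mov", "e1", "m1"),
    ("/projects/b/broll.mov", "e2", "m2"),
]
print(find_duplicates(records))
# [['/projects/a/interview.mov', '/projects/b/interview_copy.mov']]
```

Using the fingerprint pair as a dictionary key means each file is examined once, rather than being compared against every other file.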
If you would like to learn more about SHA-512 and hashing in general, we recommend having a look at this article: https://medium.com/@zaid960928/cryptography-explaining-sha-512-ad896365a0c1