The Basics

Fingerprinting, also known as hashing, is the process of transforming any given sequence of data, such as the ones and zeros of a file, into another value. This other value is usually a much shorter, fixed-length sequence of characters that can look like this:

2fef5741e9111e5b4f2411eabeb8620fec9134cb118b6ca6e22cd3f2045d0ff4b03809ad54c3a7877699cd3f3e1c4814990754c69ea030356633197ee5a1baa5

What’s important here is that the fingerprint of a file remains the same as long as the file’s content remains the same. Fingerprinting the same file over and over again will always result in the exact same fingerprint. This consistency is what makes it possible to detect duplicates on shared storage. Two files with the exact same sequence of ones and zeros always have the same fingerprint, so matching fingerprints identify the files as duplicates of one another. When a file gets copied, the copy has the same fingerprint as the source file, regardless of whether the copy was renamed by a user.
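The behaviour described above can be illustrated with a minimal Python sketch. It assumes SHA-512 from Python's standard hashlib module as the hash function; the file names and byte contents are made up for the example and are not taken from Strawberry.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-512 hex digest of a byte string."""
    return hashlib.sha512(data).hexdigest()


# Hypothetical files: only the bytes are hashed, the name is never looked at.
original = {"name": "interview_v1.mov", "data": b"\x00\x01example essence bytes"}
renamed_copy = {"name": "interview_FINAL.mov", "data": original["data"]}
edited = {"name": "interview_v1.mov", "data": original["data"] + b"\x02"}

# A renamed copy keeps the fingerprint; a single extra byte changes it.
same = fingerprint(original["data"]) == fingerprint(renamed_copy["data"])
changed = fingerprint(original["data"]) != fingerprint(edited["data"])
print(same, changed)  # prints: True True
```

The name never enters the calculation, which is why renaming a copy does not hide it from duplicate detection.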

To be able to deduplicate files on shared storage, Strawberry first needs to create at least one fingerprint for each file that it manages. Fingerprinting is not limited to media files but includes all file formats within the Strawberry-managed structure on the shared storage.
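As a rough illustration of how fingerprints enable deduplication, the sketch below walks a folder, fingerprints every file, and groups files that share a fingerprint. The function name find_duplicates and the demo folder layout are assumptions made for this example; this is not how Strawberry itself is implemented.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group all files below root by SHA-512 digest; keep groups of 2 or more."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha512(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


# Demo in a throwaway folder with made-up file names and contents.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.mov").write_bytes(b"same bytes")
    (root / "sub").mkdir()
    (root / "sub" / "a_copy.mov").write_bytes(b"same bytes")
    (root / "notes.txt").write_bytes(b"different bytes")
    duplicates = [[p.name for p in group] for group in find_duplicates(root)]

print(duplicates)
```

Every file type is treated alike here, in line with the point that fingerprinting is not limited to media files. For brevity the sketch reads each file fully into memory, which would not be suitable for large media files.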

In More Detail

Strawberry’s fingerprints are generated using SHA-512, a cryptographic hashing algorithm. SHA-512 produces a 512-bit value, which is written as a fingerprint of 128 hexadecimal characters. The goal of using a cryptographic hashing algorithm is to generate an “Essence Fingerprint” that is, for all practical purposes, unique to the content of each Strawberry-managed file. These fingerprints can then be used to reliably identify file duplicates.
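The length of 128 characters follows from the digest size: SHA-512 outputs 512 bits, and each hexadecimal character encodes 4 bits. A short check with Python's standard hashlib module (the input bytes are arbitrary):

```python
import hashlib

digest = hashlib.sha512(b"any input, short or long").hexdigest()

bits = hashlib.sha512().digest_size * 8  # digest_size is given in bytes
hex_chars = bits // 4                    # one hex character encodes 4 bits

print(bits, hex_chars, len(digest))  # prints: 512 128 128
```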

In addition to the “Essence Fingerprint”, Strawberry generates a “Metadata Fingerprint”. Together, these two fingerprints serve as unique identifiers for a file.

Essence Fingerprint

The essence fingerprint is generated from the raw bits of a file. As every file on a computer is, ultimately, just data that can be represented in binary form, SHA-512 can take that data, run a complex calculation on it, and output a fixed-length string. That string is the file’s essence fingerprint.
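A minimal sketch of such a calculation in Python might look like the following. The function name essence_fingerprint and the 1 MiB chunk size are assumptions chosen for illustration; the source does not describe how Strawberry reads files internally.

```python
import hashlib
from pathlib import Path


def essence_fingerprint(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file's raw bytes with SHA-512, reading it in fixed-size chunks."""
    hasher = hashlib.sha512()
    with path.open("rb") as handle:
        # Feed the file to the hasher piece by piece until read() returns b"".
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Reading in chunks keeps memory use constant, which matters for large media files. The result is identical to hashing the whole file in one go, because SHA-512 processes its input as a stream.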

Metadata Fingerprint

To add a further level of file uniqueness, Strawberry can also generate “Metadata Fingerprints”. For video files, the metadata fingerprint is generated from properties of the input video such as the format, codec, aspect ratio, width, height, duration, colorspace, and more. For audio files, it includes the audio codec, sample rate, number of audio tracks and channels, and, if available, ID3 tags such as title, album, artist, and genre.
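The source does not say how these properties are combined into one fingerprint. One plausible approach, shown here purely as an assumption, is to serialise the metadata in a canonical form and hash the result with SHA-512. The function name metadata_fingerprint, the field names, and the values are all hypothetical.

```python
import hashlib
import json


def metadata_fingerprint(metadata: dict) -> str:
    """Hash a metadata dict via canonical JSON (sorted keys, no extra spaces)."""
    canonical = json.dumps(
        metadata, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha512(canonical.encode("utf-8")).hexdigest()


# Hypothetical video metadata; not Strawberry's actual field set.
video_a = {"format": "mp4", "codec": "h264", "width": 1920, "height": 1080,
           "duration": 12.48}
video_b = dict(reversed(list(video_a.items())))  # same fields, other order
video_c = {**video_a, "duration": 12.52}         # one property differs

print(metadata_fingerprint(video_a) == metadata_fingerprint(video_b))
print(metadata_fingerprint(video_a) == metadata_fingerprint(video_c))
```

Sorting the keys makes the fingerprint independent of the order in which properties were read, so only a change in an actual value produces a different result.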

If two files have the same essence fingerprint and the same metadata fingerprint, they are identical. In other words, one of the files is an exact copy of the other.

If you would like to learn more about SHA-512 and hashing in general, we recommend having a look at this article: https://medium.com/@zaid960928/cryptography-explaining-sha-512-ad896365a0c1

Last modified: Jun 24, 2022
