Digital fingerprinting analyses the unique features of an audio or video asset and compares them against ‘reference’ fingerprints stored in a database. A key characteristic of fingerprinting is that it does not modify the content. Just as a human fingerprint uniquely identifies a person, a digital fingerprint uniquely identifies a piece of video or audio content. The analogy extends to the process of fingerprint matching: first, known (‘reference’) fingerprints must be stored in a database; then, a ‘candidate’ fingerprint is queried against that database for a match.
Fingerprinting technology is sometimes referred to as robust video hashing. Conventional cryptographic hashing (e.g. MD5), by contrast, is fragile: an error in a single bit is enough to change the hash completely. Such fragile hashing technologies are not considered content-based identification technologies, because they operate only on the bits and not on the content understood as information.
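The fragility of cryptographic hashing can be illustrated with a minimal sketch using Python's standard hashlib module. The sample payload and the helper name bit_difference are assumptions made for illustration; they stand in for the bytes of a real media file.

```python
import hashlib


def bit_difference(hex_a: str, hex_b: str) -> int:
    """Count the differing bits between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical stand-in for the bytes of a media file.
original = bytes(range(256)) * 4

# Flip a single bit in the first byte.
corrupted = bytearray(original)
corrupted[0] ^= 0x01

digest_original = hashlib.md5(original).hexdigest()
digest_corrupted = hashlib.md5(bytes(corrupted)).hexdigest()

differing_bits = bit_difference(digest_original, digest_corrupted)
print(digest_original)
print(digest_corrupted)
print(f"{differing_bits} of 128 digest bits differ")
```

The two digests are unrelated even though the inputs differ in one bit, so comparing them says nothing about how similar the underlying content is.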
What is a fingerprint?
A fingerprint is a set of features that uniquely identifies a segment of audio or video content. Creating a video or audio fingerprint involves specialized software that decodes the video/audio data and then applies several feature extraction algorithms. Digital fingerprints are very compact compared to the original source file (one fingerprint can be as small as 1 KB) and can therefore be easily stored in databases for later comparison. Fingerprints cannot be used to reconstruct the original video or audio content.
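As a rough illustration of feature extraction, the sketch below derives one bit per pair of consecutive frames from the change in mean luminance. This toy feature, the synthetic frames and the names extract_fingerprint and pack_bits are all assumptions chosen for brevity; production systems combine several, far more discriminative features.

```python
def extract_fingerprint(frames):
    """Toy feature extraction: one bit per pair of consecutive frames.

    Each frame is a flat list of luminance values (0-255). A bit is 1 when the
    mean luminance rises from one frame to the next, and 0 otherwise.
    """
    means = [sum(frame) / len(frame) for frame in frames]
    return [1 if later > earlier else 0 for earlier, later in zip(means, means[1:])]


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, most significant bit first."""
    packed = bytearray((len(bits) + 7) // 8)
    for index, bit in enumerate(bits):
        if bit:
            packed[index // 8] |= 0x80 >> (index % 8)
    return bytes(packed)


# Synthetic stand-in for decoded video: 65 frames of 256 pixels each, with a
# brightness level that changes from frame to frame.
frames = [[(t * 37) % 101 + 50 + (x % 16) for x in range(256)] for t in range(65)]

# Simulated degradation: darken every pixel, then quantize coarsely.
degraded = [[(p * 3 // 4) // 8 * 8 for p in frame] for frame in frames]

reference_bits = extract_fingerprint(frames)
degraded_bits = extract_fingerprint(degraded)

raw_size = sum(len(frame) for frame in frames)  # one byte per pixel
fingerprint_size = len(pack_bits(reference_bits))

print(f"raw: {raw_size} bytes, fingerprint: {fingerprint_size} bytes")
print("survives degradation:", reference_bits == degraded_bits)
```

In this example 16,640 bytes of pixel data reduce to an 8-byte fingerprint that is unchanged by the simulated darkening and quantization, and the 64 bits clearly carry too little information to rebuild the frames.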
An ideal fingerprinting system should fulfill several requirements:
- Accurately identify a media asset, regardless of the level of compression, distortion or interference in a transmission channel.
- Identify the title from excerpts as short as a few seconds (a property known as granularity); this requires dealing with shifting, that is, the lack of synchronization between the extracted fingerprint and those stored in the database.
- Deal with other sources of degradation, such as:
  - Audio: pitching (playing audio faster or slower), equalization, background noise, D/A-A/D conversion, speech and audio coders (such as GSM or MP3).
  - Video: heavy compression (e.g. ‘YouTube’ quality and much lower), insertion or removal of subtitles or logos, scaling, aspect ratio change (e.g. 16:9 to 4:3), speed change, camcorder capture, black bars, conversion to black and white, flipping, etc.
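The granularity and robustness requirements above can be sketched as a sliding-window search: a short, degraded excerpt is compared against every alignment of every reference fingerprint, and the alignment with the lowest bit error rate wins if it falls under a threshold. The random bit sequences, the titles, the 0.2 threshold and the function names are assumptions for illustration only; a real system would use an index rather than this brute-force scan.

```python
import random


def bit_error_rate(a, b):
    """Fraction of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def best_match(query, database, threshold=0.2):
    """Slide the query over each reference; return (title, offset, ber) or None."""
    best = None
    for title, reference in database.items():
        for offset in range(len(reference) - len(query) + 1):
            window = reference[offset:offset + len(query)]
            ber = bit_error_rate(query, window)
            if best is None or ber < best[2]:
                best = (title, offset, ber)
    if best is not None and best[2] <= threshold:
        return best
    return None


rng = random.Random(42)

# Hypothetical reference database: two titles, 600 fingerprint bits each.
database = {
    "title_a": [rng.randint(0, 1) for _ in range(600)],
    "title_b": [rng.randint(0, 1) for _ in range(600)],
}

# A 128-bit excerpt of title_b starting at bit 250, with every 20th bit
# flipped to imitate degradation.
excerpt = database["title_b"][250:250 + 128]
noisy_excerpt = [bit ^ 1 if i % 20 == 0 else bit for i, bit in enumerate(excerpt)]

# Content that is not in the database at all.
unknown = [rng.randint(0, 1) for _ in range(128)]

match = best_match(noisy_excerpt, database)
no_match = best_match(unknown, database)
print(match)
print(no_match)
```

The excerpt is located in title_b at offset 250 despite the flipped bits, while the unknown content is rejected because none of its alignments comes close to the threshold.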
Finally, a fingerprinting system should be computationally efficient and fully scalable. Both properties depend on the size of the fingerprints, the complexity of the fingerprint extraction and the complexity of the search algorithm.
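A back-of-the-envelope calculation shows how fingerprint size and search strategy drive scalability. Every figure below (catalogue size, asset duration, fingerprint bit rate, one candidate alignment per second) is a hypothetical assumption, not a measurement of any real system.

```python
def storage_bytes(num_assets, seconds_per_asset, bits_per_second):
    """Total database size when each asset is stored as a bit-rate fingerprint."""
    return num_assets * seconds_per_asset * bits_per_second // 8


def brute_force_comparisons(num_assets, seconds_per_asset, alignments_per_second):
    """Window comparisons needed to test one query against every alignment."""
    return num_assets * seconds_per_asset * alignments_per_second


# Hypothetical catalogue: 100,000 one-hour assets, 64 fingerprint bits per second.
NUM_ASSETS = 100_000
SECONDS_PER_ASSET = 3_600
BITS_PER_SECOND = 64

total_storage = storage_bytes(NUM_ASSETS, SECONDS_PER_ASSET, BITS_PER_SECOND)
per_query = brute_force_comparisons(NUM_ASSETS, SECONDS_PER_ASSET, 1)

print(f"database size: {total_storage / 1e9:.2f} GB")
print(f"comparisons per query (exhaustive scan): {per_query:,}")
```

Under these assumptions the database stays below 3 GB, but an exhaustive scan needs hundreds of millions of comparisons per query, which is why the search algorithm matters as much as the fingerprint size.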