Technical guide to FLAC

From SqueezeboxWiki

Revision as of 11:26, 5 July 2010 by Soulkeeper

This page is not designed to be too technical. It is meant to give ordinary computer-using people who listen to or encode FLAC an idea of why MD5s are used, what they tell you, and how they could later be used to protect the integrity of a music collection.

If you spot typos, don't understand certain bits or have any comments, feel free to update this page or ask on the forums. Remember, though, that my aim was to keep things as short and easy to understand as possible; there is better technical information, which I have linked to, if you are inclined to read it.

Fingerprint=MD5

Firstly, FLAC fingerprints and FLAC MD5s are THE SAME THING. "Fingerprint" is just the word FLAC's author uses, because most people grasp what a fingerprint is more easily than a technical term like MD5. Still, as you are reading this to get a better technical understanding, I'm going to call them MD5s!

An MD5 is a mathematical routine from a family called hash routines. Other famous hashes are CRCs (you should know these from EAC), SHA-1 and Dutch Super-Skunk :D

You need to understand how and why MD5s are typically used in computing to appreciate how they work in FLAC, and why generating an MD5 from an entire WAV file, or from the entire FLAC file itself, gives a different result from the 'fingerprint' that FLAC's front-end gave you.

MD5 Basics

Not that it's important for your understanding of them, but an MD5 is always written as 32 characters, each one a digit (0-9) or a letter (a-f). Of course geeks don't like to speak in such normal ways, so MD5s get referred to as being 16 bytes or 128 bits long, alongside words like hexadecimal and ASCII - don't worry about that. This is what an MD5 looks like: d33464171302f94053b39976620ef876

An MD5 by itself doesn't mean or guarantee anything, which is why an MD5 is always tied to some data somewhere. You can always generate an MD5 knowing only the data, but you can never work out from an MD5 what data made it. The word "Slim" generates a 128-bit MD5 and so does all of this page. It is not some encryption and decryption game: it is ONE WAY ONLY. The formula is complicated but never random, so the same data always produces the same result. That is how an MD5 tells you that some data differs from what you were supposed to have: generate the MD5 from the data you hold and compare it to the MD5 you were told it would make.
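To make the points above concrete, here is a minimal sketch using Python's standard hashlib module. The helper name md5_hex is made up for this example; the sketch only illustrates that the result is always 32 hexadecimal characters, that the same data always gives the same MD5, and that changed data gives a different one.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 of `data` as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()

short = md5_hex(b"Slim")
longer = md5_hex(b"Slim" * 100_000)   # far more data, same-size result

print(short, len(short))
print(longer, len(longer))

# Same data in, same MD5 out, every time.
assert md5_hex(b"Slim") == short

# Change a single character and the MD5 no longer matches.
assert md5_hex(b"Slin") != short
```

Both printed MD5s are 32 characters long, even though the second input is 100,000 times bigger than the first.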

To quote the MD5 author: "The algorithm takes as input a message of arbitrary length and produces as output a 128-bit 'fingerprint' or 'message digest' of the input."

Spend hours of family fun generating your own MD5 hash codes here: http://www.phpbbsupport.co.uk/md5.php or if you want to get really technical go here: http://www.ietf.org/rfc/rfc1321.txt And don't worry too much about the scary things written here: http://en.wikipedia.org/wiki/MD5

A WAV file

A WAV file generated from a CD with EAC looks like the following (open it in Notepad or, ideally, a hex editor).

  • Header Block: A little bit of boring-looking stuff that is almost the same for every CD rip. It holds a description of the audio, such as the sample rate (44,100 Hz), the number of channels (2, i.e. stereo), etc.
  • Uncompressed Data Block: A lot of boring-looking stuff that, if you're lucky, sounds good.

People using MD5/checksum tools on a whole WAV file are generating an MD5 from the whole file (header and data). FLAC's MD5 is based only on the data part of the file, which is why your results are always different. FLAC does use the WAV file's header when it encodes, but that information is stored elsewhere in the FLAC file's header and is therefore irrelevant to the fingerprint.

If we generate the MD5 hash from just the Data Block of the WAV file we might get: d33464171302f94053b39976620ef876
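The difference between hashing the whole file and hashing just the Data Block can be sketched with Python's standard wave and hashlib modules. The helper names make_wav and md5_of_data_block are invented for this example, and the "audio" is just stand-in bytes. The sketch also assumes 16-bit stereo CD audio, where the bytes FLAC fingerprints are taken to be the same bytes as the WAV Data Block.

```python
import hashlib
import io
import wave

def make_wav(pcm: bytes) -> bytes:
    """Wrap raw 16-bit stereo 44,100 Hz samples in a WAV file, in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)       # stereo
        w.setsampwidth(2)       # 2 bytes = 16 bits per sample
        w.setframerate(44100)
        w.writeframes(pcm)
    return buf.getvalue()

def md5_of_data_block(wav_bytes: bytes) -> str:
    """MD5 of the audio samples only, ignoring the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return hashlib.md5(w.readframes(w.getnframes())).hexdigest()

pcm = bytes(range(256)) * 4     # stand-in audio, 1024 bytes
wav_bytes = make_wav(pcm)

whole_file = hashlib.md5(wav_bytes).hexdigest()   # header + data
data_only = md5_of_data_block(wav_bytes)          # data only

print("whole file:", whole_file)
print("data block:", data_only)
```

The two printed values differ because the first includes the header bytes. Only the second is the kind of value you would expect to compare against a FLAC fingerprint.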

A FLAC file

We then encode the WAV above into a FLAC file which looks like the following:

  • A 4 character header so we know what it is: "fLaC"
  • Header Block: Header Stuff similar to the WAV file's header, just as boring (from this we can generate the WAV's Header Block).
  • Fingerprint: d33464171302f94053b39976620ef876 (should match the MD5 we'd get from performing a hash on the WAV's Data Block).
  • Vorbis Comments (tags): The only thing you'll be able to read easily
  • Compressed Data Block: When FLAC does its magic this will be magically uncompressed into a WAV data block. FLAC will get really annoyed if it does this and the MD5 it generates from the result doesn't match the fingerprint.
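As a rough illustration of the layout above, the sketch below pulls the stored fingerprint out of the start of a FLAC stream. It relies on the published FLAC format: after the "fLaC" marker comes a 4-byte block header, then the 34-byte STREAMINFO block, whose last 16 bytes are the MD5. The function name stored_fingerprint is made up, and the input is a hand-built stand-in rather than a real FLAC file.

```python
import hashlib

def stored_fingerprint(flac_bytes: bytes) -> str:
    """Return the MD5 stored in the STREAMINFO block as 32 hex characters."""
    if flac_bytes[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    block_type = flac_bytes[4] & 0x7F                   # low 7 bits: block type
    block_len = int.from_bytes(flac_bytes[5:8], "big")  # 24-bit block length
    if block_type != 0 or block_len != 34:
        raise ValueError("first metadata block is not a 34-byte STREAMINFO")
    streaminfo = flac_bytes[8:8 + block_len]
    if len(streaminfo) != 34:
        raise ValueError("file is too short to hold STREAMINFO")
    return streaminfo[18:34].hex()                      # last 16 bytes = MD5

# Hand-built stand-in for the first 42 bytes of a FLAC file (assumption:
# the 18 bytes of block sizes, sample rate etc. are simply zeroed here).
digest = hashlib.md5(b"pretend audio samples").digest()
fake_flac = b"fLaC" + bytes([0x80]) + (34).to_bytes(3, "big") + bytes(18) + digest

print(stored_fingerprint(fake_flac))
```

With a real file, the same function could be fed the first 42 bytes read from disk. Files that have other data prepended before the "fLaC" marker would be rejected by this simple check.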

So when you decompress a FLAC into a WAV, FLAC generates the WAV file and then makes an MD5 of the WAV file's data block. If that doesn't match the pre-stored MD5 (the fingerprint) that was put there at compression time, it reports an error.

When you run a verify option it does the same as above, except that the decoded audio is thrown away rather than kept as a WAV file.
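The check described above boils down to "hash what came out and compare it with what was stored". Here is a minimal sketch of that idea in Python. It is not FLAC's own code: the names FingerprintMismatch and check_decoded_audio are invented, and plain bytes stand in for decoded audio.

```python
import hashlib

class FingerprintMismatch(Exception):
    """Raised when decoded audio does not match the stored fingerprint."""

def check_decoded_audio(decoded_pcm: bytes, stored_md5: str) -> None:
    """Compare the MD5 of the decoded samples with the stored fingerprint."""
    actual = hashlib.md5(decoded_pcm).hexdigest()
    if actual != stored_md5.lower():
        raise FingerprintMismatch(f"expected {stored_md5}, got {actual}")

original = b"\x00\x01\x02\x03" * 1000
fingerprint = hashlib.md5(original).hexdigest()   # stored at compression time

check_decoded_audio(original, fingerprint)        # matches, so nothing happens

damaged = b"\xff" + original[1:]                  # a single byte has changed
try:
    check_decoded_audio(damaged, fingerprint)
except FingerprintMismatch as err:
    print("decode error:", err)
```

One changed byte anywhere in the audio is enough to trigger the error, which is what makes the fingerprint useful for checking a music collection.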

When you ask FLAC for the fingerprint of a file (e.g. metaflac --show-md5sum Track01.flac) it just shows you the fingerprint that was stored in the file when it was compressed.

For more technical info on the internals of a WAV file look here: http://www.borg.com/~jglatt/tech/wave.htm And likewise, more tech info on the internals of a FLAC file: http://flac.sourceforge.net/format.html#def_STREAMINFO

If you need to submit reduced-size, tags-only versions of your FLAC files to Logitech, you can use a FLAC File Stripper.


/Contributors: Jim, .../