(Last Mod: 27 November 2010 21:39:38 )
The Wave File Format (WAV extension) is Microsoft's audio file format and, as such, is extremely stable and widely used. Most commercially viable audio programs, regardless of platform or operating system, support WAV files.
The WAV file supports numerous options that are seldom used and therefore will not be covered here. The only type of WAV file discussed on this page is the uncompressed type. Since virtually all WAV files are uncompressed, this is not a significant sacrifice. Several online resources cover the more arcane options, with Sonic Spot perhaps being the most readable. However, be aware that the Sonic Spot description has a few errors (as of May 2007), so don't accept what is said there blindly. Most of the information on this page was obtained from Sonic Spot, but it has been supplemented both with material from other sources and with experience gained by working with uncompressed WAV files directly.
A WAV file is a particular type of RIFF file. We will only discuss the RIFF basics and only as far as necessary to make understanding WAV files easier. A more complete discussion can be found on Wikipedia.
RIFF stands for Resource Interchange File Format. It was developed by Microsoft and IBM, has been around since 1991, and is patterned after IFF (Interchange File Format) (see the Wikipedia article on IFF). Since Microsoft's operating system ran on Intel processors, multi-byte values were defined to be Little Endian. Little Endian merely means that the LSB (least significant byte, hence "little end") is stored at the base address, which is the lowest-numbered address of all the bytes that make up the value. This means that, in a file, the LSB comes first and the MSB (most significant byte) comes last when reading Little Endian values.
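Reading a multi-byte value one byte at a time makes the Little Endian order explicit and works the same no matter which processor runs the program. A minimal sketch in C, assuming the bytes are already in memory; read_le16 and read_le32 are hypothetical helper names, not standard library functions:

```c
#include <stdint.h>

/* Assemble a 16-bit Little Endian value: the byte at the lowest
   address (p[0]) is the least significant byte. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Same idea for 32 bits: p[0] is the LSB and p[3] is the MSB.
   Each byte is widened to uint32_t before shifting. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For example, the bytes 0x24 0x08 0x00 0x00 decode to 0x00000824 (2,084).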
A RIFF file is organized as a set of "chunks". A chunk is a data structure that is very simple and very flexible. Chunks consist of an eight byte header and data. That's it. Nothing else. The header consists of a four-byte ASCII string that indicates what the "chunk type" is followed by a 32-bit unsigned integer that indicates how many bytes are contained in the data that follows the header. That's it. Nothing else. Don't make the mistake of making things harder than they are.
The power of RIFF files comes from the fact that the contents of a particular chunk are defined by the chunk type and not by the person or organization (Microsoft) that invented RIFF files. Thus other people and organizations are free to develop and define other RIFF file types, and if one gains sufficient acceptance, it may even become a registered RIFF type. Because of this extensibility, programs only have to know how to process the chunk types they need in order to carry out their task, and they can simply skip over any chunks they don't recognize. For instance, developers of a particular audio playback program may want to take standard WAV files and embed equalizer settings in them. They might define a chunk type called 'eqzr' for this purpose. Programs that see this chunk and don't recognize it simply ignore it, while programs that do recognize it can make use of its contents. Having said that, there is no requirement that a program behave this way in general; it merely should. But it can be made a requirement for a specific type of RIFF file by the developers of that type.
Finally, a RIFF file contains exactly one chunk, namely a 'RIFF' chunk. You might be a bit confused at this point by the apparent contradiction between saying that a RIFF file has only a single chunk and then saying that programs can process the chunks they need and skip the rest. The key is that chunks can contain chunks. So the top level 'RIFF' chunk might be of type 'JUNK' and the definition of a 'JUNK' 'RIFF' chunk says that it consists of a 'jnk1' chunk followed by two 'jnk2' chunks followed by a two-byte unsigned integer that indicates how many 'jnk3' chunks follow that and, at the end, there is an optional 'jnk4' chunk. Just as the 'JUNK' chunk had a definition that described what it must and may contain, each of the subsidiary chunks would also have a definition that says what they must/may contain, which may include yet more chunks. It may sound complicated, but if you take it step-by-step you'll discover that it is quite simple.
| Offset | Bytes | Description | Legal Values | Comments | 
| 0x00 | 4 | Chunk ID | 4 ASCII characters | |
| 0x04 | 4 | Chunk Size | 0 - 4,294,967,295 | The size of everything that comes after the header, not including any final pad byte. | 
| 0x08 | Chunk Size | Chunk Data | The content and format are defined by the Chunk ID | |
All chunks in a RIFF file follow the same basic format shown above.
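In code, the header can be decoded into a small structure. This is a sketch only; the struct chunk_header type and parse_chunk_header function are hypothetical, and the size is assembled byte by byte using the Little Endian rule described earlier:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of the eight-byte chunk header. */
struct chunk_header {
    char     id[5];   /* four ASCII characters plus a terminating NUL */
    uint32_t size;    /* bytes of Chunk Data, excluding header and pad byte */
};

/* Decode the header at p; at least 8 bytes must be available. */
void parse_chunk_header(const unsigned char *p, struct chunk_header *h)
{
    memcpy(h->id, p, 4);
    h->id[4] = '\0';
    h->size = (uint32_t)p[4]
            | ((uint32_t)p[5] << 8)
            | ((uint32_t)p[6] << 16)
            | ((uint32_t)p[7] << 24);   /* Little Endian */
}
```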
There is no requirement that the Chunk ID be read and compared character by character. After all, a bit's a bit. These four bytes can be read and compared to the expected values using any method that is convenient. For instance, they can be read all at once into a 32-bit integer and then compared to a 32-bit integer having that same bit pattern. But if you read this integer the same way that you read all of the many other 32-bit integers in the file, then you must take into account that RIFF files store multi-byte values using the Little Endian convention, so the first character ends up in the least significant byte. This is one of the points where Sonic Spot has an error (or at least does not explain what they mean very carefully), as they did not take this into account in the various values they cite throughout their page.
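The following sketch builds the integer that four characters produce when they are read as one Little Endian 32-bit value and defines the IDs used on this page. The FOURCC_LE macro and the ID_ names are illustrative assumptions; comparing the raw bytes with memcmp(p, "RIFF", 4) sidesteps the byte-order question entirely and is an equally valid choice.

```c
#include <stdint.h>

/* Integer produced by reading four ASCII characters as one Little
   Endian 32-bit value: the first character lands in the LSB. */
#define FOURCC_LE(a, b, c, d)                                   \
    ((uint32_t)(unsigned char)(a)                               \
     | ((uint32_t)(unsigned char)(b) << 8)                      \
     | ((uint32_t)(unsigned char)(c) << 16)                     \
     | ((uint32_t)(unsigned char)(d) << 24))

/* Chunk IDs and RIFF Type as they appear when read as integers. */
#define ID_RIFF 0x46464952u   /* 'RIFF' */
#define ID_WAVE 0x45564157u   /* 'WAVE' */
#define ID_FMT  0x20746D66u   /* 'fmt ' */
#define ID_DATA 0x61746164u   /* 'data' */
```

So FOURCC_LE('R','I','F','F') equals ID_RIFF, whereas the "obvious" constant 0x52494646 would never match a value read on a Little Endian basis.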
This value indicates the total number of bytes in the chunk data and does not include the eight bytes in the chunk header.
There is one subtle requirement about RIFF files and chunks of which many people, including people who write commercial software, seem to be blissfully ignorant. All chunks in a RIFF file must be an even number of bytes in length and, if they aren't, a final zero byte must be appended. The reason is that most computer systems are much more efficient at transferring data in multi-byte parcels if those parcels start on an even-byte (also known as a word) boundary. Therefore, all chunks must start on a word boundary, meaning that if the prior chunk contains an odd number of bytes, an additional pad byte must be added. The value of this pad byte is supposed to be zero.
So, is this pad byte part of the chunk or is it a byte that happens to be sitting between one chunk and the next? You can think of it either way, but it is perhaps most accurate to think of it as a byte sitting between two chunks. The reason for this claim is that the Chunk Size of the preceding chunk does not include the pad byte in its count and, therefore, arguably the pad byte is not part of the prior chunk's data. You can detect whether a pad byte exists between two chunks by checking whether the prior chunk's Chunk Size is odd. But it is just as accurate to claim that the byte must be part of the prior chunk; otherwise we would have stray bytes here and there that are not part of any chunk, which, on the surface, violates the rule that RIFF files, aside from the RIFF Type ID, contain nothing but chunks. Under this interpretation, the Chunk Size only indicates the number of bytes of meaningful data in the Chunk Data, and the existence of a final pad byte is inferred if the amount of meaningful data is odd. It's really a matter of semantics, and both interpretations lead to exactly the same conclusion: a pad byte exists just before the next chunk (whether it is considered part of the present chunk or not) if the Chunk Size is odd.
In practice, few chunks will need pad bytes (but some will, so don't ignore them). Good developers try to predict all of the ways that the people who use their product will find to use it incorrectly, and hence most well-designed chunks are defined so that there is always an even number of bytes in the data block. As a result, even sloppily designed software will probably work as expected, but don't rely on that and don't use it as an excuse for designing sloppy software.
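Whichever interpretation you prefer, a program that steps from one chunk to the next has to account for the pad byte itself. A minimal sketch, assuming a hypothetical chunk_span helper that returns the total number of bytes a chunk occupies in the file:

```c
#include <stdint.h>

/* Bytes a chunk occupies in the file: the 8-byte header, the Chunk
   Data, and one pad byte when Chunk Size is odd. A 64-bit result
   avoids overflow when Chunk Size is near 4,294,967,295. */
uint64_t chunk_span(uint32_t chunk_size)
{
    return 8u + (uint64_t)chunk_size + (chunk_size & 1u);
}
```

For instance, chunk_span(17) is 26: eight header bytes, 17 data bytes and one pad byte.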
Another subtle point to watch out for is that many people that write software don't seem to comprehend the difference between signed and unsigned data types. As a result, they use a signed data type to read integer values from a binary file and, if the value is more than half of the unsigned range for that type, end up with a negative number. For this reason, some programs will only correctly read values up to half of their theoretical range. Because of this, some good developers anticipate the problem and restrict their own programs to only generating files whose values can be properly read as signed integers. You may never encounter such a program, but it is good to be aware of the issue so that you might be able to figure out why something is not working as expected.
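The difference is easy to demonstrate. In this sketch (function names hypothetical), the same four bytes are decoded once as an unsigned value and once forced into a signed 32-bit type; the bytes 0x00 0x00 0x00 0x80 represent 2,147,483,648, which the signed version reports as a negative number. Converting an out-of-range value to a signed type is implementation-defined in C, so the negative result reflects the usual two's-complement behaviour rather than a language guarantee.

```c
#include <stdint.h>

/* Correct: assemble a 32-bit Little Endian size as an unsigned value. */
uint32_t size_as_unsigned(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The mistake described above: storing the same bit pattern in a
   signed type. Values of 2,147,483,648 and larger come out negative
   on common two's-complement compilers (implementation-defined). */
int32_t size_as_signed(const unsigned char *p)
{
    return (int32_t)size_as_unsigned(p);
}
```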
This is where the data goes and it can be in any format defined by the developer of the chunk type.
As mentioned previously, if the size is an odd number of bytes, then a pad byte (equal to zero) must be added after the end of the chunk data. That byte is not included in the Chunk Size.
| Offset | Bytes | Description | Legal Values | Comments | 
| 0x00 | 4 | Chunk ID | 'RIFF' | If read as an integer: 0x46464952 (remember, Little Endian!) | 
| 0x04 | 4 | Chunk Size | 4 - 4,294,967,295 | The size of everything that comes after the header, including the RIFF Type. |
| 0x08 | 4 | RIFF Type | 4 ASCII characters | For WAV files, this is 'WAVE' |
| 0x0C | ? | First Subchunk | Type and order of subchunks are defined by the RIFF Type | |
The RIFF chunk is extremely simple. The data consists of a four-byte ASCII string indicating what type of RIFF file it is. This is then followed by an arbitrary number of chunks (called "RIFF subchunks"). That's it. Nothing else. A RIFF file consists of a single chunk. That's it. Nothing else. It really is that simple - don't make the mistake of making it harder than it is.
The Chunk Size includes the four characters that indicate the RIFF Type (they are, after all, part of this chunk's data) as well as all of the pad bytes contained in all of the subchunks.
The ASCII string specifying the RIFF Type for a wave RIFF file is, not surprisingly, 'WAVE'.
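A reader can validate the start of a WAV file with a few comparisons. A minimal sketch, assuming the whole file has been loaded into a buffer; check_riff_wave is a hypothetical name, and rejecting a file that is shorter than its RIFF Chunk Size claims is a policy choice, not a requirement stated here:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Check the first 12 bytes of a file held in buf: 'RIFF', a Chunk
   Size, then the RIFF Type 'WAVE'. Returns the RIFF Chunk Size on
   success or 0 on failure (0 is never valid, because the size always
   counts the four RIFF Type bytes). */
uint32_t check_riff_wave(const unsigned char *buf, size_t len)
{
    uint32_t size;
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return 0;
    size = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8)
         | ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);
    /* The file should hold the 8-byte header plus size bytes;
       treat a shorter buffer as truncated (assumed policy). */
    if (size < 4 || (uint64_t)size + 8 > len)
        return 0;
    return size;
}
```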
There are many (at least eleven) defined RIFF subchunks for a 'WAVE' RIFF chunk. See Sonic Spot for a description of the eleven referred to. However, there are relatively few actual files that use anything other than the two basic (and required) ones, namely the format ('fmt ') and the data ('data') subchunks.
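Because readers should skip chunks they do not recognize, finding the format and data subchunks comes down to walking the subchunk list, honoring pad bytes, until the wanted Chunk ID turns up. A sketch under the same assumption (whole file in memory); find_subchunk is a hypothetical helper, and the caller still has to check that the returned chunk's data actually fits in the buffer:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walk the subchunks of a 'WAVE' RIFF chunk held in buf and return
   the offset of the Chunk Data of the first subchunk whose ID matches
   id (storing its Chunk Size in *size_out), or 0 if none is found.
   Unknown chunks are skipped, including any pad byte. */
size_t find_subchunk(const unsigned char *buf, size_t len,
                     const char id[4], uint32_t *size_out)
{
    size_t pos = 12;                         /* after 'RIFF', size, 'WAVE' */
    while (pos + 8 <= len) {
        uint32_t size = (uint32_t)buf[pos + 4] | ((uint32_t)buf[pos + 5] << 8)
                      | ((uint32_t)buf[pos + 6] << 16) | ((uint32_t)buf[pos + 7] << 24);
        if (memcmp(buf + pos, id, 4) == 0) {
            *size_out = size;
            return pos + 8;
        }
        /* Skip header, data and the pad byte that follows odd-sized data. */
        if ((uint64_t)size + 8 + (size & 1u) > len - pos)
            break;                           /* truncated or corrupt */
        pos += 8 + (size_t)size + (size & 1u);
    }
    return 0;
}
```

Calling it once with "fmt " and once with "data" finds both subchunks regardless of their order or of any unrecognized chunks (such as the 'eqzr' example above) in between.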
Although there is no requirement for the format chunk to precede the data chunk, not doing so can create huge problems for some applications, particularly streaming audio, since a player cannot interpret the audio data until it has read the format. Furthermore, although all programs should be designed to handle the chunks in either order, many assume that the format chunk comes first and will misbehave if that is not the case. As mentioned previously, sound software design dictates that you read and write your files so that they have a good chance of working even with someone else's sloppily designed software.
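When writing, the simplest way to follow that advice is to emit the format chunk before the data chunk. The sketch below (make_wav_header and its helpers are hypothetical) fills in the common 44-byte header for uncompressed PCM: a 'RIFF' chunk, a 16-byte 'fmt ' subchunk and the 'data' subchunk header, computing the redundant fields from Channels, Slice Rate and Sample Depth. It assumes the data length is known in advance and small enough that the RIFF Chunk Size does not overflow:

```c
#include <stdint.h>
#include <string.h>

static void put_le16(unsigned char *p, uint16_t v) { p[0] = v & 0xFF; p[1] = v >> 8; }

static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = v & 0xFF; p[1] = (v >> 8) & 0xFF; p[2] = (v >> 16) & 0xFF; p[3] = v >> 24;
}

/* Fill hdr with 'RIFF' + 'fmt ' + 'data' headers, in that order, for
   data_bytes of uncompressed PCM audio that will follow. */
void make_wav_header(unsigned char hdr[44], uint16_t channels,
                     uint32_t slice_rate, uint16_t bits, uint32_t data_bytes)
{
    uint16_t block_align = (uint16_t)(channels * ((bits + 7) / 8));

    memcpy(hdr, "RIFF", 4);
    /* 'WAVE' (4) + fmt chunk (8 + 16) + data header (8) + data + pad */
    put_le32(hdr + 4, 36 + data_bytes + (data_bytes & 1u));
    memcpy(hdr + 8, "WAVE", 4);

    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16);                      /* no extra format data */
    put_le16(hdr + 20, 1);                       /* 1 = PCM */
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, slice_rate);
    put_le32(hdr + 28, slice_rate * block_align); /* Data Rate */
    put_le16(hdr + 32, block_align);
    put_le16(hdr + 34, bits);

    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);
}
```

The audio data, plus a zero pad byte if data_bytes is odd, would then be written immediately after these 44 bytes.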
| Offset | Bytes | Description | Legal Values | Comments | 
| 0x00 | 4 | Chunk ID | 'fmt ' | If read as an integer: 0x20746D66 (remember, Little Endian!) | 
| 0x04 | 4 | Chunk Size | 16 + (2 + ED) | The (2 + ED) term is present only when the chunk includes the Extra Format Size field. |
| 0x08 | 2 | Compression Type | 1 - 65,535 | 1 = PCM (uncompressed) is by far the most common. | 
| 0x0A | 2 | Channels | 1 - 65,535 | Number of channels of data. | 
| 0x0C | 4 | Slice Rate | 1 - 4,294,967,295 | Samples per second per channel. | 
| 0x10 | 4 | Data Rate | 1 - 4,294,967,295 | Bytes per second that need to be read from the file to keep up. |
| 0x14 | 2 | Block Alignment | 1 - 65,535 | Number of bytes in a complete slice of data. | 
| 0x16 | 2 | Sample Depth | 2 - 65,535 | Number of significant bits per sample. | 
| 0x18 | 2 | Extra Format Size | 0 - 65,535 | Number of bytes of additional data that follows. | 
| 0x1A | ED | Extra Format Data | Contents and format defined by the Compression Type | |
This is the brains of the WAV file. This chunk contains all of the information that the reading program needs in order to know how to read and play back the data. In fact, this chunk is what is known as "overdefined," meaning that it contains values that are not really needed because they can be computed from other values in the file. In particular, the Data Rate and the Block Alignment are redundant values. In general, this is a bad practice because there is no guarantee that the values stored in the file will agree with the values obtained by computing them from the other values in the file. Depending on the program, the results can be disastrous if the values are not consistent, but since the most likely result for a WAV file is that playback does not occur correctly, or at all, the risk is low. While it could have been sloppiness on the developers' part (Microsoft sloppy? Never!), it may also have been a deliberate attempt to make developing small, brain-dead, embedded WAV file readers easier. Remember, the processing power available when the format was defined was rather minuscule compared to today, so having the program that was writing the file (which needed the values anyway) go ahead and record them in the file for the convenience of the reading program was not completely unreasonable. What might have been useful would have been for those developers to declare which values are primary, meaning the minimum set of non-overdefined parameters, and which are subsidiary, so that the primary values could be used to check and, if necessary, replace the redundant ones. Having said that, most programmers would have ignored this information.
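One way to act on that idea is to treat Channels, Slice Rate and Sample Depth as the primary values and recompute the two redundant ones from them. This is a sketch under that assumption (the file format itself designates no primary values); struct wav_format and repair_redundant_fields are hypothetical, the bytes-per-sample rule is the one described for Block Alignment further down this page, and field overflow for extreme channel counts or depths is ignored:

```c
#include <stdint.h>

/* Hypothetical in-memory copy of the 'fmt ' fields. */
struct wav_format {
    uint16_t compression;   /* 1 = PCM */
    uint16_t channels;
    uint32_t slice_rate;
    uint32_t data_rate;     /* redundant */
    uint16_t block_align;   /* redundant */
    uint16_t sample_depth;
};

/* Overwrite the redundant fields if they disagree with the values
   computed from the assumed primary ones. Returns the number of
   fields corrected. */
int repair_redundant_fields(struct wav_format *f)
{
    int fixed = 0;
    uint16_t align = (uint16_t)(f->channels * ((f->sample_depth + 7) / 8));
    uint32_t rate  = f->slice_rate * align;

    if (f->block_align != align) { f->block_align = align; fixed++; }
    if (f->data_rate != rate)    { f->data_rate = rate;    fixed++; }
    return fixed;
}
```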
Let's look at each of these values in turn.
This MUST be 'fmt '. Notice that it is all lower case and that it includes a space.
There is no requirement that the Chunk ID be read and compared character by character. A bit's a bit. So these four bytes can be read as a single four-byte integer and compared to the four-byte integer 0x20746D66. This is one of the points where Sonic Spot has an error: they did not take into account that if you read the value as a four-byte integer, it will be read using the Little Endian convention.
Like all chunks, this indicates the total number of bytes in the chunk data (not including any final pad byte).
In nearly all WAV files, this value will be 16, meaning that there is no additional format data.
If there is additional format data, then this value will be 18 plus the amount of Extra Format Data. Note that the additional two (18 vs. 16) comes from the fact that the two bytes that hold the Extra Format Size are not considered part of the Extra Format Data (to be consistent with how chunk sizes are treated).
According to Sonic Spot, uncompressed files never have extra format data, since the purpose of the extra format data is to provide information specific to the type of compression used. This may be true, but there are two ways to indicate that there is no extra format data: (1) by having a chunk size of 16, or (2) by having a chunk size of 18 and then having an Extra Format Size equal to zero. This second method is weird and possibly not even "proper" by Microsoft's definition, but at least one commercial program, WavePad, uses it, so you should be tolerant of it.
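A tolerant reader can accept both forms. In this sketch, extra_format_bytes is a hypothetical helper that takes the fmt Chunk Size and a pointer to the chunk data and returns the number of Extra Format Data bytes, or -1 for a chunk it considers malformed. Rejecting a size that disagrees with the Extra Format Size field is an assumption; a more forgiving reader might trust the Chunk Size instead.

```c
#include <stdint.h>

/* Return the number of Extra Format Data bytes in a 'fmt ' chunk, or
   -1 if malformed. Both encodings of "no extra data" are accepted:
   Chunk Size 16, or Chunk Size 18 with an Extra Format Size of 0. */
long extra_format_bytes(uint32_t chunk_size, const unsigned char *data)
{
    uint16_t extra;
    if (chunk_size == 16)
        return 0;                       /* no Extra Format Size field */
    if (chunk_size < 18)
        return -1;                      /* too short to hold the field */
    extra = (uint16_t)(data[16] | (data[17] << 8));   /* Little Endian */
    if ((uint32_t)extra + 18 != chunk_size)
        return -1;                      /* fields disagree (assumed policy) */
    return extra;
}
```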
This value is equal to 1 in nearly all WAV files indicating an uncompressed format. Since that is the only format of interest here, we will not even list the other codes and formats. For those interested, Sonic Spot has many (if not all) of them tabulated.
This is simply the number of channels of data contained in the file. A monaural (mono) file would have one channel, a stereo file two, and so on. Something like Dolby 5.1 would have six. In the simplest case, the number of channels is equal to the number of microphones that were being recorded simultaneously.
This is also known as the "sampling rate", but that can be a bit ambiguous when you are recording more than one channel. For instance, if you have five channels and each one is being sampled one thousand times a second, is the sampling rate 1000 samples/sec or 5000 samples/sec? By defining a "slice" as each moment in time at which one sample is taken per channel, we avoid this ambiguity. In the example just mentioned, the slice rate is 1000 slices per second. It should be obvious that the "slice rate" is simply the sampling rate for each channel.
It may seem ridiculous to allow slice rates in excess of 4 gigasamples/second, but that is simply an artifact of using a 4-byte value instead of a 2-byte value. If 2 bytes had been used instead, the limit would have been 65,535 samples/second, and CD quality audio, at 44,100 samples per second, is approaching that limit. Studio master recordings routinely exceed that limit, so the next larger integer size had to be used.
This is the average number of bytes that must be read from the file in order to keep up with playback. It is one of the values that makes the file format overdefined, in that it can be calculated from other values in the file. The data rate is simply the number of slices that must be read every second multiplied by the number of bytes required to store each slice. The latter is known as the "block alignment". Hence:
Data Rate = (Slice Rate) * (Block Alignment)
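As a worked example, take CD quality stereo: two channels, 16 significant bits (two bytes) per sample and 44,100 slices per second:

```latex
\begin{aligned}
\text{Block Alignment} &= \text{Channels} \times \text{BytesPerSample} = 2 \times 2 = 4\ \text{bytes} \\
\text{Data Rate} &= \text{Slice Rate} \times \text{Block Alignment} = 44100 \times 4 = 176{,}400\ \text{bytes/s}
\end{aligned}
```

so a player must read 176,400 bytes every second to keep up.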
This is the number of bytes in the file per slice. It is another one of the values that makes the file format overdefined in that it can be calculated from other values in the file. This is also the place where Sonic Spot has made the most glaring error - the formula they give for computing this parameter is simply wrong.
The rules for determining the block alignment follow directly from the rules for storing data of various depths. Each sound sample is required to occupy an integer number of 8-bit bytes. A slice is composed of one sample per channel and is required to be an integral number of bytes - but since each sample is already an integral number of bytes this requirement is satisfied automatically.
Block Alignment = (Channels) * (BytesPerSample)
The number of bytes per sample can be determined by first dividing the number of significant bits per sample by 8 (using integer division). This gives the number of complete bytes of data. If there is any remainder from this operation, then an additional byte is needed to store the extra bits. So, for instance, if a sample uses only 4 bits, each sample occupies one byte. Similarly, 10-bit samples require two bytes per sample.
Most programming languages support the notion of integer and modulo division. In the C programming language, the formula for Block Alignment can be written as:
BytesPerSample = (SignificantBitsPerSample / 8) + (0 != (SignificantBitsPerSample % 8));
yielding
BlockAlignment = Channels * ( (SignificantBitsPerSample / 8) + (0 != (SignificantBitsPerSample % 8)) );
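For the above to work properly, SignificantBitsPerSample and BytesPerSample must be of integer, as opposed to floating point, data type, so that / performs integer division and % is defined.
Another way to compute it in C, using ceil() from <math.h>, is the following (the divisor must be 8.0 rather than 8; otherwise integer division truncates the result before ceil() ever sees it):
BlockAlignment = Channels * (int) ceil(SignificantBitsPerSample / 8.0);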
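Floating point can be avoided altogether by adding 7 before the integer division, which rounds up exactly as (SignificantBitsPerSample / 8) + (0 != (SignificantBitsPerSample % 8)) does. A minimal sketch; bytes_per_sample and block_alignment are hypothetical names:

```c
/* Fewest whole bytes that hold `bits` significant bits: adding 7
   before the integer division rounds any remainder up. */
unsigned bytes_per_sample(unsigned bits)
{
    return (bits + 7u) / 8u;
}

/* Bytes per slice: one sample for every channel. */
unsigned block_alignment(unsigned channels, unsigned bits)
{
    return channels * bytes_per_sample(bits);
}
```

For example, block_alignment(2, 16) is 4 and block_alignment(6, 24) is 18.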
The sample depth is also known as the number of significant bits per sample. It is usually a multiple of two between 8 and 16, as these are the most common ADC (analog to digital converter) and DAC (digital to analog converter) resolutions. But other depths can and do exist.
The sample depth determines how many bytes are required to store each sample. In short, the sample is stored in the fewest number of whole bytes that can accommodate all of the significant bits. In the C programming language, two possible expressions to calculate this are:
BytesPerSample = (SignificantBitsPerSample / 8) + (0 != (SignificantBitsPerSample % 8));
or
BytesPerSample = (int) ceil(SignificantBitsPerSample / 8.0);
The sample depth not only determines how many bytes are needed per sample, but the format in which that sample will be stored. That will be discussed in the Data Chunk section.
This value, if it exists, tells how many extra bytes of format data follow it. If the data is followed by a pad byte, that byte is not counted in this total.
Extra format data is only needed for compressed file types, which are not of interest here. However, some programs automatically append this section even if the data is empty, meaning that this value will be zero and that the Chunk Size value above will be 18 instead of 16.
This section contains additional data that might be needed for a particular compression type. The length and format of the data are defined by the particular compression type in use. Since very few WAV files use compression, it is unlikely that extra format data will ever be encountered.
| Offset | Bytes | Description | Legal Values | Comments | 
| 0x00 | 4 | Chunk ID | 'data' | If read as an integer: 0x61746164 (remember, Little Endian!) | 
| 0x04 | 4 | Chunk Size | 0 - 4,294,967,295 | Does not include any pad byte that follows the data. | 
| 0x08 | Chunk Size | Audio Data | Format is dictated by the number of significant bits per sample | |
If the format chunk is the brains of a WAV file, then the data chunk is its heart, soul, and guts.
This MUST be 'data'.
There is no requirement that the Chunk ID be read and compared character by character. A bit's a bit. So these four bytes can be read as a single four-byte integer and compared to the four-byte integer 0x61746164. This is one of the points where Sonic Spot has an error: they did not take into account that if you read the value as a four-byte integer, it will be read using the Little Endian convention.
Like all chunks, this indicates the total number of bytes in the chunk data (not including any final pad byte).
All audio data is stored as integer values. If the number of significant bits is eight or less (in other words, if each sample requires only a single byte of storage), then the values are stored as unsigned integers. If there are more than eight significant bits (or, equivalently, more than one byte per sample), then they are stored as signed integers.
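Converting stored samples to a common signed scale therefore needs two cases. A sketch for depths of up to 16 bits only (wider samples are not handled); decode_sample is a hypothetical helper. For one-byte samples the midpoint, 128, represents silence, so subtracting 128 centres the value on zero, while two-byte samples are read as Little Endian two's-complement values:

```c
#include <stdint.h>

/* Convert one stored sample to a signed value centred on zero.
   Assumes 1 or 2 bytes per sample (depths up to 16 bits). */
int32_t decode_sample(const unsigned char *p, unsigned bytes_per_sample)
{
    if (bytes_per_sample == 1)
        return (int32_t)p[0] - 128;     /* unsigned; 128 is silence */
    /* Signed, Little Endian. The uint16_t -> int16_t conversion is
       implementation-defined in C but wraps on common compilers. */
    return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
}
```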
Note that while it might appear that a huge amount of data can be stored in a WAV file, the limit is actually quite mundane, because the 32-bit Chunk Size caps the audio data at 4,294,967,295 bytes (about 4 GB). A file consisting of CD quality Dolby 5.1 data would be limited to about 135 minutes, or about the length of a feature film's soundtrack.
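The arithmetic behind the 135-minute figure is straightforward. This sketch (max_duration_seconds is a hypothetical name) divides the largest possible Chunk Size by the data rate; it ignores the handful of header bytes that also count against the RIFF Chunk Size:

```c
#include <stdint.h>

/* Longest recording, in seconds, whose audio data fits in a chunk
   whose 32-bit Chunk Size tops out at 4,294,967,295 bytes. */
double max_duration_seconds(uint32_t slice_rate, unsigned channels,
                            unsigned bytes_per_sample)
{
    double data_rate = (double)slice_rate * channels * bytes_per_sample;
    return 4294967295.0 / data_rate;
}
```

For CD quality Dolby 5.1 (44,100 slices per second, six channels, two bytes per sample) the data rate is 529,200 bytes per second, giving about 8,116 seconds, or roughly 135 minutes.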
The program linked below allows the user to generate WAV files that produce a sinusoid that ramps linearly from an initial frequency and amplitude to a center-point frequency and amplitude, and then linearly from there to a final frequency and amplitude. It is a very simple program and generates a mono file at a fixed 44,100 Sa/sec.
"Wave File Format", Sonic Spot, http://www.sonicspot.com/guide/wavefiles.html, last accessed on 27 May 2007.
"WAV", Wikipedia, http://en.wikipedia.org/wiki/WAV, last accessed on 27 May 2007.
"Interchange File Format", Wikipedia, http://en.wikipedia.org/wiki/Interchange_File_Format, last accessed on 27 May 2007.
"EA IFF 85 Standard for Interchange Format Files", Jerry Morrison (Electronic Arts), http://www.szonye.com/bradd/iff.html, last accessed on 27 May 2007.