Ever watch and listen to an interactive computer presentation and wonder why the words and music seem so realistic, yet the video on the screen is quite imperfect? The major reason is the large difference in the number of bytes needed to store each form of multimedia information (voice, music and video). As an introduction to the new Multimedia Project articles, I'll discuss the math behind the multimedia.
Sounds cause the air to move and this movement can be recorded. This recording appears as a curved wave pattern which can be seen on an oscilloscope or with sound sampling software. Conventional cassette recorders store this sound using an analog process, while computers store this sound in a digital manner.
In order to explain this digital waveform storage, picture a waveform drawn on a piece of paper, with the horizontal (X) axis denoting time and the vertical (Y) axis denoting amplitude ("loudness"). In this example, I am considering only the "perfect case" and am ignoring problem factors such as signal-to-noise ratio, aliasing, distortion and sampling error.
If a grid were placed on top of this waveform drawing, each point where the waveform crosses a vertical line would be a sample. This sample value would be rounded to the nearest horizontal line. The more vertical lines there are in a given time period, the higher the sampling frequency. In digital recordings, samples can be taken from approximately 5,000 (5K) to over 40,000 (40K) times per second. Waveform files (with the extension .WAV) are usually described with a number, generally 22K or 44K, which stands for the sampling frequency used when making the recording.
Since most human ears can't hear sounds past 22,000 hertz (cycles per second), high quality sound is sampled approximately 44,000 (44K) times per second. This doubling comes from the Nyquist theorem, which shows that a sampling rate of twice a given frequency is enough to reproduce every frequency below that point (strictly speaking, the rate must be slightly more than twice the highest frequency you want to keep). A 22K sampling rate would only be able to reproduce frequencies below 11,000 hertz.
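The doubling rule is easy to check with a few lines of Python. This is only a sketch of the arithmetic in the paragraph above; the function names are my own and are not part of any sound library.

```python
def nyquist_limit(sampling_rate_hz):
    """Upper bound (Hz) on the frequencies a sampling rate can reproduce."""
    return sampling_rate_hz / 2


def minimum_sampling_rate(highest_frequency_hz):
    """Sampling rate (samples per second) needed for a given top frequency."""
    return 2 * highest_frequency_hz


# The two common rates mentioned for .WAV files.
for rate in (22_000, 44_000):
    print(f"{rate:,} samples/s reproduces frequencies below {nyquist_limit(rate):,.0f} Hz")

print(f"Hearing up to 22,000 Hz needs {minimum_sampling_rate(22_000):,} samples/s")
```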
The gap between the horizontal lines is called the sampling resolution. The more lines, the better the resolution. If we used 256 horizontal lines for our grid, this waveform could be stored with 8 bit samples, since only 8 bits (binary digits) are needed to store any of the values from 0 to 255. If 65,536 horizontal lines were used instead, the waveform could be stored with 16 bit samples, since only 16 bits are needed to store any of the values from 0 to 65,535.
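Here is a minimal Python sketch of the grid idea, assuming a pure sine wave as the "perfect case" input. The vertical lines become sample times and the horizontal lines become integer levels. The function name and the test tone (1,000 hertz sampled 8,000 times per second) are my own choices for illustration.

```python
import math


def sample_and_quantize(frequency_hz, sampling_rate_hz, bits, num_samples):
    """Sample a sine wave and round each sample to the nearest grid line."""
    levels = 2 ** bits  # 256 lines for 8 bits, 65,536 lines for 16 bits
    samples = []
    for n in range(num_samples):
        t = n / sampling_rate_hz  # time of the n-th vertical line
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # between -1 and 1
        # Map -1..1 onto 0..levels-1, then round to the nearest horizontal line.
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples


eight_bit = sample_and_quantize(1_000, 8_000, bits=8, num_samples=8)
sixteen_bit = sample_and_quantize(1_000, 8_000, bits=16, num_samples=8)
print("8 bit samples: ", eight_bit)
print("16 bit samples:", sixteen_bit)
```

Both lists describe the same wave. The 16 bit version simply has far more horizontal lines to round to, so each value sits closer to the true amplitude.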
When sound is digitally recorded with sixteen bits (2 bytes, since 8 bits equal one byte) of information and sampled about 44,100 times per second, the result is high quality sound equivalent to that of an audio CD. Approximately 88,200 bytes of storage would be required for each second of single-channel (mono) recording; stereo, as on an audio CD, doubles that figure. A typical 3.5" high density floppy disk (1,440,000 bytes) could store a little over 16 seconds of this high quality mono sound.
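The storage arithmetic can be worked out in Python as a quick check. This sketch uses the figures from the paragraph above; the `channels` parameter is my own addition (defaulting to one channel, which is what the 88,200 figure assumes).

```python
FLOPPY_BYTES = 1_440_000  # high density 3.5" floppy, as used in this article


def audio_bytes_per_second(sampling_rate_hz, bits_per_sample, channels=1):
    """Bytes needed to store one second of uncompressed sampled sound."""
    bytes_per_sample = bits_per_sample // 8  # 8 bits equal one byte
    return sampling_rate_hz * bytes_per_sample * channels


cd_quality = audio_bytes_per_second(44_100, 16)
floppy_seconds = FLOPPY_BYTES / cd_quality

print(f"{cd_quality:,} bytes per second")
print(f"{floppy_seconds:.1f} seconds of sound per floppy")
```

Calling `audio_bytes_per_second(22_050, 8)` instead shows how much a lower rate and resolution save: 22,050 bytes per second, one quarter of the high quality figure.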
Another way to store musical material is to use MIDI, which stands for Musical Instrument Digital Interface. MIDI was developed in the early eighties in order to allow different musical instruments to "talk" to each other. This method of recording stores musical information in a specialized "note-like" format, instead of recording the actual sounds as in a "waveform" file. The MIDI recording method cannot store speech or other "sounds".
Since the complete details about MIDI are complex and several books have been written about this topic, I'll give only a brief overview. MIDI stores musical information in a message format. A typical message consists of a single status byte followed by data bytes. Items such as Note on, Note off, the number of the note to be played and the key pressure are part of the information that is stored as MIDI data. A MIDI synthesizer takes this digital data and "plays" it back as music. A MIDI file sounds only as good as the synthesizer it is being played back on. Each brand of synthesizer has a slightly different idea of what each instrument sounds like, so the playback of a MIDI file can sound quite different from what you intended when you created it. Even with these problems, the best part of MIDI is in the math.
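To make the "status byte followed by data bytes" idea concrete, here is a small Python sketch that builds the two most common messages. The status values (0x90 for Note on, 0x80 for Note off, with the channel in the low four bits) come from the MIDI standard; the helper names are my own, and the sketch leaves out the timing information that a real MIDI file stores between messages.

```python
def note_on(channel, note, velocity):
    """Build a 3 byte Note on message: status byte, note number, key velocity."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note, velocity=0):
    """Build a 3 byte Note off message for the same note."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")
    return bytes([0x80 | channel, note, velocity])


# Press and release middle C (note number 60) on the first channel.
press = note_on(0, 60, 100)
release = note_off(0, 60)
print(press.hex(" "), "/", release.hex(" "))
print(f"{len(press) + len(release)} bytes for one complete note")
```

A whole note, however long it is held, costs only a handful of bytes, while a waveform recording of the same note keeps growing for every second it sounds.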
Since items such as musical complexity and the number of instruments used can greatly change the number of bytes needed to store MIDI data, I will give a few concrete examples in order to make a reasonable estimate. The Star Spangled Banner, played in one minute on a piano, is a file of 3,236 bytes. A more complex blues piece of one minute twenty-four seconds is 11,934 bytes in length. An even more complex rock piece of four minutes and eighteen seconds is 41,351 bytes.
The number of bytes needed to store one second of MIDI data is calculated by dividing the total number of bytes in the file by the length of the piece in seconds. This yields values of 53.93, 142.07 and 160.28 respectively. From this we can see that the more complex the music, the more bytes are required. In case the music I used was not complex enough, I will double the highest value and use 320 bytes per second for the estimate. This means that our high density floppy would be able to store about 75 minutes of music. Since that's about the length of a CD, you can see why storing music as MIDI data is so storage efficient!
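The same division can be done in Python for all three pieces at once. The file sizes and playing times are the ones quoted above, and 320 bytes per second is the padded estimate; the variable names are my own.

```python
FLOPPY_BYTES = 1_440_000

# (description, file size in bytes, playing time in seconds)
pieces = [
    ("Star Spangled Banner, piano", 3_236, 60),
    ("Blues piece", 11_934, 84),    # 1 minute 24 seconds
    ("Rock piece", 41_351, 258),    # 4 minutes 18 seconds
]

midi_rates = [size / seconds for _, size, seconds in pieces]
for (name, _, _), rate in zip(pieces, midi_rates):
    print(f"{name:<30}{rate:6.1f} bytes per second")

MIDI_ESTIMATE = 320  # roughly double the highest measured rate
floppy_minutes = FLOPPY_BYTES / MIDI_ESTIMATE / 60
print(f"About {floppy_minutes:.0f} minutes of MIDI music per floppy")
```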
For the video side of this "equation", let's use the common resolution of 640 horizontally by 480 vertically at 24 bits (3 bytes) of color. This "frame" of video needs to be stored 30 times per second. One second of this video would require 640x480x3x30 bytes, which is a whopping 27,648,000 bytes! Only a little over one twentieth (1/20) of a second could fit on the same floppy disk that stores about 16 seconds of digitized sound or 75 minutes of complex MIDI music!
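Here is the video multiplication as a short Python sketch, assuming uncompressed frames exactly as described above. The function name is my own.

```python
FLOPPY_BYTES = 1_440_000


def video_bytes_per_second(width, height, bytes_per_pixel, frames_per_second):
    """Bytes needed to store one second of uncompressed video."""
    return width * height * bytes_per_pixel * frames_per_second


frame_bytes = 640 * 480 * 3
video_rate = video_bytes_per_second(640, 480, 3, 30)

print(f"One frame:  {frame_bytes:,} bytes")
print(f"One second: {video_rate:,} bytes")
print(f"A floppy holds {FLOPPY_BYTES / video_rate:.3f} seconds, "
      f"or about {FLOPPY_BYTES / frame_bytes:.1f} frames")
```

Put another way, the floppy has room for roughly one and a half frames of this video.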
Like a floppy disk, a computer processor has a fixed amount of resources. A processor that can easily handle MIDI at 320 bytes per second or sound at 88 thousand bytes per second has a much more difficult time handling the whopping 27 million bytes per second that video demands! This is the major reason that sound is great and video is poor. This video problem can be solved through a variety of techniques such as faster processors, secondary processors and data compression methods.
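To see just how lopsided the workload is, this small Python sketch compares the three data rates used in this article. The rates are the article's own estimates; the ratios are simple division.

```python
# Bytes per second for each form of multimedia, as estimated in this article.
rates = {"MIDI": 320, "Sound": 88_200, "Video": 27_648_000}

video = rates["Video"]
ratios = {name: video / rate for name, rate in rates.items()}

for name, rate in rates.items():
    print(f"{name:<6}{rate:>12,} bytes/s   video needs {ratios[name]:,.0f} times as much")
```

Uncompressed video asks the processor to move over 300 times as much data as high quality sound, and tens of thousands of times as much as MIDI.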
© 1993 Rick Smith All rights reserved.