Eng-Tips is the largest engineering community on the Internet

Audio File Question 3

Status
Not open for further replies.

MikeMM

Automotive
Feb 1, 2005
17
I just started writing a program that will graph different features of sound waves. At this point I've only written code to graph the raw data from an uncompressed 16-bit WAV file. The graph looks perfect and exactly how I would expect a sound wave to look, but for some reason the zero point of the waves is above zero.

I had sort of assumed that when a computer records a 16-bit WAV file, zero would represent no pressure change, positive numbers would represent when the pressure increased above normal on the cresting side of the sound wave, and negative numbers would represent when the pressure decreased below normal pressure. So I sort of expected the total of all the numbers (except for the header) in a WAV file to be roughly zero.

So my question is, did I make an incorrect assumption or did I possibly program something wrong?

thank you for your help.
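Mike's expectation can be tested directly. Here is a minimal Python sketch, using only the standard library (the file name and 440 Hz test tone are invented for the demo): it writes a sine to a 16-bit mono WAV, reads it back as signed samples, and averages them. A clean AC signal should average very close to zero; a large positive mean would point at a decoding bug, such as reading the samples as unsigned instead of signed 16-bit.

```python
import math, struct, wave

RATE = 8000
N = 8000  # one second of audio

# write a 440 Hz sine, full scale about +/-30000
samples = [int(30000 * math.sin(2 * math.pi * 440 * t / RATE))
           for t in range(N)]
with wave.open("test_tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(RATE)
    w.writeframes(struct.pack("<%dh" % N, *samples))

# read it back as signed little-endian 16-bit and average
with wave.open("test_tone.wav", "rb") as w:
    data = w.readframes(w.getnframes())
values = struct.unpack("<%dh" % (len(data) // 2), data)
mean = sum(values) / len(values)
print("mean sample value:", mean)  # near 0 for a well-formed file
```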
 

Hi Mike-

Say, wouldn't it be pretty easy to record a small WAV file without an input, then run it through your program and see what pops out on the graph?

At first blush, and I could be absolutely dead wrong, I see no reason an "AC coupled" source wouldn't be represented with zero at mid-scale (1/2 full scale). However, and here's another area where I could be dead wrong, the modulation scheme used might be something like "delta" modulation, where indeed one would expect a "positive" differential of the waveform to be represented with an increasing or "positive" value, and a "negative" differential with a negative value.

Since it's in a "digital" form, there are all kinds of modulation schemes and data compression techniques that might make the data representation unclear. O.K., a quick looksee via Google says that .wav files use PCM, which means Pulse Code Modulation.

Here's a pretty good little link that I found during the google search that might help explain it. Search for the data string "pcm" and you should run across it in the section titled:
"Sample Points and Sample Frames"

The link is:


Hope that this helps.

Cheers,

Rich S.
 
Maybe you picked up on the effects of a biased microphone.
 
Hi Mike,

I have been dabbling at writing some programs to analyze WAV files too, so it's gratifying to see a post from someone with the same interest. Here is a quote from a site I used as a reference:
"8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767."

The site URL is
So it would appear your assumption is correct: in 16-bit WAV files the samples are positive and negative. You don't mention if the files you are graphing are mono or stereo. Could this be a factor? I can't speak authoritatively about different sound cards, but my impression is that most use 10 or 12 bit A/D converters, so the samples are not really 16 bits. The sound card driver would determine whether the samples are sign-extended into 16 bits, so this could have a bearing on the numbers. I found it helpful to use a program to dump the contents of files to the display in hexadecimal so I can see the actual numbers in the DATA chunks of WAV files. Hope this helps.

Good Luck,
Greg Hansen
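The quoted ranges can be seen directly with Python's struct module. This small sketch (the sample values are invented) shows the two encodings side by side, and why misreading 16-bit signed data as unsigned shifts a graph's zero line:

```python
import struct

# 8-bit WAV samples are unsigned: silence sits at 128, not 0.
eight_bit = bytes([128, 178, 128, 78])       # centred on 128
print([b - 128 for b in eight_bit])          # recentre on zero

# 16-bit WAV samples are little-endian two's-complement: silence is 0.
sixteen_bit = struct.pack("<4h", 0, 12800, 0, -12800)
print(struct.unpack("<4h", sixteen_bit))

# Reading the same 16-bit bytes as *unsigned* lifts negatives by 65536,
# which is one way a waveform's zero line ends up in the wrong place.
print(struct.unpack("<4H", sixteen_bit))
```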
 
It turns out I made a small mistake in how I wrote the program. It was too small to produce an obviously wrong result, but enough to knock my zero point off. I appreciate everybody's help on this.

Greg, it is nice to see you're interested in the same thing. What type of application are you writing a sound-analyzing program for?

thanks

Mike
 
Hi Mike,

Thanks for the reply post, and your interest. Besides enjoying(?) the challenge of Windows programming, I am an amateur musician. I also have an interest in digital signal processing. Combining these interests (programming, music and DSP) has led me to try to develop an application that would help me figure out the notes being played in a given piece of music, using the raw WAV file data. This is not an especially original objective, and the project has been dragging on for a long time. I have read some material on this type of analysis suggesting that the naive approach I am using, which is simply a discrete Fourier transform of the sampled sound, is useless. Nonetheless, I figured I would start with that objective and see if it leads anywhere. And it is a useful learning project, with modest enough goals that I can maintain (sort of) my motivation to keep at it.

If you don't mind me asking, what is your application?

Good Luck,
Greg Hansen
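The naive approach Greg describes might be sketched like this in Python (standard library only; the 440 Hz test tone, the sample rate, and the note-naming via MIDI numbers are illustrative assumptions, not Greg's actual code): take a DFT, find the peak bin, and map its frequency to the nearest equal-tempered note.

```python
import cmath, math

RATE = 4000
N = 512

# a synthetic A4 (440 Hz) tone stands in for a WAV data chunk
signal = [math.sin(2 * math.pi * 440 * t / RATE) for t in range(N)]

def dft_magnitudes(x):
    """Direct DFT magnitudes, bins 0 .. N/2-1 (slow but simple)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * RATE / N

# nearest note via MIDI number: 69 = A4 = 440 Hz
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
midi = round(69 + 12 * math.log2(peak_hz / 440.0))
name = NOTES[midi % 12] + str(midi // 12 - 1)
print(peak_hz, "->", name)
```

For a single pure tone this works; as the later posts note, the hard part is polyphony, where several peaks of comparable strength appear at once.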
 
To MikeMM and Gregha04756,
here is another amateur musician who has been trying to convert a WAV file into sheet music for a long time. I abandoned Fourier transforms too, because a simple simulation showed me that it would not work in the case of polyphonic tracks. At the moment I do not have any idea what to do next.
m777182
 
Hi m777182,

I'm afraid, unfortunately, I don't have much insight to share that would point you in the right direction. My main thought was that I might be able to decode some blinding-fast Eddie Van Halen solo or such, based on the idea that the loudest instrument masks the others and would contain most of the signal power. From what I have read, this is one of the principles employed in audio file compression. I thought it might work in this application too, but perhaps not.

The site I referenced in my previous post, from the Stanford U. Center for Computer Research in Music and Acoustics, seems to be a fairly comprehensive resource on the subject. Have you looked there?

Good Luck,
Greg Hansen
 
You're much better off finding someone with perfect pitch AND who can translate what they hear into sheet music. The human brain and ear are simply much better at this than any algorithm you can come up with.

TTFN



 
Hi Greg and m777182,

I'm writing a sound analyzing program because I have some ideas about how to write some artificial intelligence programs and I want to see if I can write a speech recognition program that actually works well. But before I do that I want to graph certain features of sound files to make certain I'm using the best approach for this. I know there are spectrogram programs out there, but there are certain things I want to look at that they aren't good at.

m777182, I wish I had a good suggestion on how to recognize musical notes, but I haven't even looked at what makes a sound sound on-key or off-key or any of that stuff. I've only looked at human speech. The only thing I would suggest is to download a spectrogram program, if you haven't already, and try to find patterns in musical notes that your program can analyze. Also, if you are trying to analyze singing, it might help to look at these human speech pages:



It might be a while but when I'm done with the graphing program I could give you guys a copy. I don't think I'm going to try to sell it so be warned it probably won't look like a nice polished program.


good luck,
Mike
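A bare-bones version of the kind of time-frequency picture Mike describes can be sketched in Python. The two-tone test signal and frame sizes below are made up, and a real spectrogram program would window each frame and use an FFT rather than this slow direct DFT; this just shows the frame-by-frame idea:

```python
import cmath, math

RATE = 2000
# first half 200 Hz, second half 400 Hz: a signal whose spectrum
# changes over time, which is what a spectrogram displays
signal = [math.sin(2 * math.pi * (200 if t < 1000 else 400) * t / RATE)
          for t in range(2000)]

FRAME, HOP = 128, 64

def frame_spectrum(frame):
    """Direct DFT magnitudes of one frame (no window, for brevity)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

# one magnitude spectrum per overlapping frame = the spectrogram
spectrogram = [frame_spectrum(signal[i:i + FRAME])
               for i in range(0, len(signal) - FRAME + 1, HOP)]

# report the dominant frequency of every 8th frame
for i, spec in enumerate(spectrogram[::8]):
    k = max(range(len(spec)), key=spec.__getitem__)
    print("frame %2d: ~%d Hz" % (i * 8, k * RATE / FRAME))
```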
 
This is notice to Gregha04756 and MikeMM:
I have made a small step in my endeavours to transcribe WAV files into sheet music. The clue is perhaps the wavelet transformation of my wave file. I do not need to look for ALL frequencies, because we normally do not hear beyond 16 kHz, and the shortest time interval is about one quarter, or perhaps one eighth, of the measure. So time slices of one-quarter length in the time domain would be the samples that are statistically stationary, and over this time range you perform FFTs. Again, you are not interested in ALL frequencies, only in those that are near the "in tune" frequencies. So we look for spectral lines that correspond to elements of the tonal system. I am making further experiments.
regards
m777182
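The idea of probing only the "in tune" frequencies fits the Goertzel algorithm well, since it evaluates the spectrum at one chosen frequency per pass instead of computing every FFT bin. A rough Python sketch, assuming equal-tempered note frequencies (with c1 taken as middle C, 261.63 Hz) and an invented chord as the test slice:

```python
import math

RATE = 8000

def goertzel_power(block, freq_hz):
    """Signal power at a single probe frequency (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq_hz / RATE)
    s1 = s2 = 0.0
    for x in block:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

# probe only equal-tempered note frequencies (c1 taken as middle C)
probes = {"c1": 261.63, "d1": 293.66, "e1": 329.63,
          "g1": 392.00, "a1": 440.00, "c2": 523.25}

# one invented time slice: a C major chord c1 + e1 + g1 + c2
chord = ["c1", "e1", "g1", "c2"]
N = 2000  # 0.25 s at 8 kHz, roughly an eighth note at 120 bpm
slice_ = [sum(math.sin(2 * math.pi * probes[n] * t / RATE) for n in chord)
          for t in range(N)]

powers = {n: goertzel_power(slice_, f) for n, f in probes.items()}
for n in sorted(powers, key=powers.get, reverse=True):
    print(n, round(powers[n]))
```

The four chord notes come out orders of magnitude stronger than the absent notes, which is the "spectral lines near the tonal system" picture described above.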
 
I do not see how wavelets (at least in themselves) are going to help solve your problem. From a pure frequency/time standpoint, wavelets and FFTs are similar in that they tell you what frequencies exist in the material and at what time. The difference is in the way they accomplish this goal, and in their accuracies. FFTs provide a constant resolution at all times/frequencies. Wavelets provide good time/poor frequency resolution at high frequencies, and the opposite at low frequencies.

Dan - Owner
 
Thanks, Macgyvwers2000, for your comment. The point is that, as you said, FFTs provide a constant resolution over the entire time interval and for all frequencies, but I am not interested in all frequencies, only in those that correspond to particular tones of the tonal ladder. Secondly, I am looking for dominant frequencies in a small time interval equal to the shortest measure (or bar) element I would like to discriminate. Here is the place open to discussion: is it one quarter, one eighth, or maybe one twelfth (in blues, e.g.)?

I think this is the way to extract four voices of a chorus in a time slice corresponding to the smallest time interval that interests me. If I succeed in extracting the 4 most dominant spectral lines in the first time slice (let it be one quarter), I can, instead of frequencies, write down the names of particular tones, like c1, e1, g1 and c2. If in the next time slice the spectral lines are c1, e1, a1 and e2, then I conclude that in the first half of the bar there was a half note on c1, a half note on e1, a quarter on g1 that moved to a1, and a quarter on c2 that moved to e2.

You could argue that two consecutive spectral pictures do not tell me that voice 1 kept on c1, and that it was not voice 2 that moved from e1 to c1 in the second time slice. I do not know yet how to manage this; my ear and music experience will intervene at this stage, but I do not exclude the possibility that, as MikeMM pointed out, some AI program will recognise the specific colour of a particular voice, maybe through its higher harmonics. In that case a step of voice 1 would be estimated by a simultaneous step of all its specific harmonics. But this will take me some time; after all, I am a specialist neither in programming nor in signal processing. These activities of mine are intended to support my hobby efforts to make good sheet music.
m777182
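The slice-to-slice bookkeeping described above might start out like this sketch. The magnitude tables are invented stand-ins for real FFT or Goertzel output, and, as noted in the post, sets of note names alone cannot say which voice moved where:

```python
# per-slice magnitudes for candidate notes; values are made-up
# stand-ins for spectral-line strengths (c1,e1,g1,c2 -> c1,e1,a1,e2)
slices = [
    {"c1": 950, "e1": 900, "g1": 700, "c2": 650, "d1": 40, "b1": 25},
    {"c1": 940, "e1": 880, "a1": 720, "e2": 610, "g1": 35, "c2": 30},
]

def top_lines(mags, n=4):
    """The n most dominant spectral lines of one time slice."""
    return set(sorted(mags, key=mags.get, reverse=True)[:n])

prev = None
for i, mags in enumerate(slices):
    cur = top_lines(mags)
    if prev is not None:
        held = sorted(cur & prev)        # notes sustained across slices
        moved_out = sorted(prev - cur)   # notes that disappeared
        moved_in = sorted(cur - prev)    # notes that appeared
        print("slice %d: held %s, %s -> %s" % (i, held, moved_out, moved_in))
    prev = cur
```

Assigning the appeared notes to specific voices is exactly the open problem in the post; this only records what changed, not who changed it.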
 
The wavelet transform could possibly be of use to you. The kernel of the wavelet transform is generic, and changing it changes the type of wavelet you form (Haar, Morlet, Meyer, Shannon, etc.). There has been work on forming kernels that produce results similar to 1/3- or 1/12-octave filters and so forth.

So finding the right wavelet to apply might help in finding the particular tones you're interested in.
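A minimal illustration of that point in Python: a complex Morlet wavelet whose envelope spans a fixed number of cycles has a bandwidth proportional to its centre frequency, so centres spaced a semitone (2**(1/12)) apart behave like a 1/12-octave filter bank. The 440 Hz test tone, the sample rate, and the 30-cycle envelope are illustrative assumptions:

```python
import cmath, math

RATE = 4000

def morlet_response(signal, freq_hz, cycles=30):
    """Correlate the signal (at its midpoint) with one complex Morlet
    wavelet; an envelope of ~`cycles` cycles gives a bandwidth that is
    a constant fraction of freq_hz, i.e. constant-Q behaviour."""
    sigma = cycles / (2 * math.pi * freq_hz)   # envelope width, seconds
    half = int(3 * sigma * RATE)
    kernel = [cmath.exp(-0.5 * ((i - half) / (sigma * RATE)) ** 2) *
              cmath.exp(2j * math.pi * freq_hz * (i - half) / RATE)
              for i in range(2 * half + 1)]
    norm = sum(abs(k) for k in kernel)
    mid = len(signal) // 2 - half
    acc = sum(signal[mid + i] * kernel[i].conjugate()
              for i in range(len(kernel)))
    return abs(acc) / norm

# a 440 Hz test tone, probed at centres one semitone (2**(1/12)) apart
signal = [math.sin(2 * math.pi * 440 * t / RATE) for t in range(4000)]
centres = [440 * 2 ** (k / 12) for k in range(-3, 4)]
responses = {round(f, 1): round(morlet_response(signal, f), 3)
             for f in centres}
print(responses)
```

The response peaks at the centre matching the tone and falls off at neighbouring semitones, which is the note-selective behaviour discussed above.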
 
