SNR estimation

Status
Not open for further replies.

erida

Computer
Aug 12, 2006
4
Hi all,

I would really appreciate your ideas on something...

Ok, I am building a speech recognition system which uses a mic array to capture the sound. The signal coming in is not only speech but it's also affected by noise and room reverberation.
I need to see what's the effect of reverberation alone so I mixed the speech signal from a close-talking microphone (so no noise and reverberation there) with some noise I cut out from the mic array recording.
So, if there is a difference in how the system deals with the mic array signal and the close-talking+noise signal, it's got to be due to the reverberation.

My question is simpler than that.

I need to have the same signal-to-noise ratio in both signals. Even though the added noise in the close-talking+noise signal is as loud as in the mic array signal, the speech in the mic array signal is quieter. So, just adding the noise is not quite right.

Do you know what software to use to calculate the signal-to-noise ratio in both signals? I was thinking I could just take the average dB value of the whole mic array signal, then the average level for the noise, and then subtract it from the whole in order to get the level of the speech signal alone.
Does that make sense? What software do you suggest using?

Thank you in advance,

Best regards,

Erida
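
One caveat on the subtraction idea above: levels in dB cannot be subtracted directly to separate speech from noise; the subtraction has to be done on powers. A minimal sketch, assuming speech and noise are uncorrelated so that their powers add. The function name and the example levels are hypothetical, not taken from the recordings discussed here.

```python
import math

def snr_from_levels(total_db, noise_db):
    """Estimate SNR in dB from the level of speech+noise and of noise alone.

    Assumes speech and noise are uncorrelated, so their powers add.
    """
    total_power = 10 ** (total_db / 10)
    noise_power = 10 ** (noise_db / 10)
    speech_power = total_power - noise_power  # subtract powers, not dB values
    if speech_power <= 0:
        raise ValueError("total level must exceed the noise level")
    return 10 * math.log10(speech_power / noise_power)

# Illustrative levels only: whole signal at -20 dB, noise alone at -30 dB.
print(round(snr_from_levels(-20.0, -30.0), 2))  # 9.54
```

Note that the result is a little under the naive 10 dB difference, because the "whole signal" level already includes the noise.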
 

I didn't understand your question (but it might be just me).

Are you planning to pre-process the audio to remove or reduce the reverberations? New Digital TV tuners do something similar to eliminate multipath before it gets into the demodulator, as do long distance telephone Echo Cancellers. Both are the same sort of concept as removing reverberations.

With all the processing power required for speech recognition, it would seem that removing or reducing reverberations would be trivial in comparison.
 
Hi,

First, I need to thank you for replying. Helping out somebody you don't know is a really big thing.

I think I was not very clear in my first post.

Here's a bit more about my thing...
A speech recognition system does fine when the speaker uses a close-talking microphone. But it does terribly when the mics are far away from the speaker, because noise is added to the original signal and the room response (the reverberation) causes something like a smearing in its spectrum.

In my experiment, speech was picked up by the mic array and a close-talking microphone at the same time in the same realistic room conditions. The c-t mic used was one of those noise-cancelling head-mounted mics, so it didn't pick up any significant amount of reverberation or noise. What I am testing is exactly the effect of reverberation alone on the system's performance, and the close-talking mic recordings are used as reference.
So what I was expecting to get was something like 90% accuracy in the close-talking condition (no noise, no reverb), and say 30% accuracy in the mic-array condition (reverb+noise). What I was trying to do was to test what the performance is when you have a recording with just the noise and the speech, without the reverberation.
So, if I got 70% accuracy in the speech+noise only condition, I'd say that reverberation screws it up by 40 percentage points (70% down to 30%).

So I need to make a new set of wav files that contain the speech from the c-t mic and the noise from the mic array. But it has to have exactly the same signal-to-noise ratio as the mic array recording; in other words, the noise has to be as loud (which it is) and the speech has to be as loud too.

My question is how I measure the exact level in the signal (noise or speech) from a wav file. Is there some software I can use?

Thanks again!


Regards,


Lela
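
Measuring the level of a region of a wav file does not need a dedicated tool. A minimal sketch using only the Python standard library, assuming 16-bit mono PCM files; the function name `rms_dbfs` is hypothetical.

```python
import array
import math
import sys
import wave

def rms_dbfs(path, start_s=0.0, end_s=None):
    """RMS level, in dB relative to full scale, of a region of a 16-bit mono WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch handles 16-bit mono PCM only")
        rate = wav.getframerate()
        first = int(start_s * rate)
        last = wav.getnframes() if end_s is None else int(end_s * rate)
        wav.setpos(first)
        samples = array.array("h", wav.readframes(last - first))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        raise ValueError("empty region")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    # 32768 is full scale for 16-bit samples
    return 10 * math.log10(mean_square / 32768.0 ** 2)
```

For example, `rms_dbfs("array.wav", 1.0, 1.2)` would give the level of a 0.2 s region; the file name and times are placeholders. Measuring a noise-only region and a speech region this way gives the two levels needed for a rough SNR figure.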



 
There's tons of freeware audio tools available for the PC. You should be able to find tools to mix two wave files into one. You should be able to measure the relative levels. What you're less likely to find is a program that can look at a wave file containing noisy speech and tell you the SNR.

So you should probably take the approach of making three files: clean speech, only the reverberations, and just noise. Then using the mixer SW you should be able to prepare wave files with whatever SNRs you want.

Freeware sites:
Nonags
Pricelessware
...there's tons...

To answer your question directly: I think that you can easily BUILD wave files with the required SNR. But I don't know of any way to easily MEASURE the SNR of recorded files.
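
Building a file with a chosen SNR comes down to one gain on the noise track. A minimal sketch, assuming both signals are already lists of float samples at the same sample rate; the function names are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech RMS / noise RMS equals snr_db, then add."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # Gain that brings the noise to the level the target SNR requires.
    # (A silent noise track would make rms(noise) zero and fail here.)
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]
```

The speech RMS here is taken over the whole file, pauses included. If the pauses are long, measuring it over speech-active regions only is probably closer to what is wanted.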


Some other common sense advice:

Humans use stereo (the 'cocktail party effect') to pull speech out of very noisy environments. A billion years of evolution isn't likely to be wrong.

The importance of stereo can be seen with people with hearing aids, especially ONE hearing aid. Hopeless - might as well use email instead of trying to talk with them in a noisy environment.

If you (the human) can't make out the speech in the (stereo) recording, then don't expect the software to do any better. I'll bet that in a hundred years, the humans will still have the advantage over software by several dB.


 
Thank you very much for the tips!
I'll try one of those tools and see what I can do.
However, there are some factors that I haven't thought about- e.g. that the noise I'll be cutting out from the mic array recordings will be reverberated too...


I too was very interested in human audition, because, as you said, while humans can easily cope with signal distortion like noise and reverberation, machines cannot.
Binaural hearing is one thing that makes the difference, but there are dozens of other compensation techniques that humans use which remain unknown. Even when researchers identify them, model them and implement them for machines, they don't necessarily lead to better performance.
Even if you have the luxury of stereo recordings or two hearing aids, there are other issues besides better sound localization or amplification. It all comes down to, I guess, finding a way to increase speech intelligibility for either people or machines.

Anyway, thank you again for taking up your time to reply.

I'd also appreciate it if you had any other suggestions.



Erida

 
There's probably some freeware or paid software to make reverberations. Normally they're used gently for making music.

Thus, you could record clean speech, make a reverberation version in the PC, and record some noise. Then you'd have all three elements to mix together.
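
Making a reverberated version in the PC amounts to convolving the clean speech with a room impulse response. A minimal sketch with a toy impulse response made of a direct path plus a few discrete echoes; the function names, delays and gains are hypothetical, and a real room response is far denser and longer than this.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution (adequate for short illustrative inputs)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def toy_impulse_response(rate, delays_s, gains):
    """Direct path (gain 1) plus discrete echoes; a stand-in for a measured room response."""
    length = round(max(delays_s) * rate) + 1
    ir = [0.0] * length
    ir[0] = 1.0
    for delay, gain in zip(delays_s, gains):
        ir[round(delay * rate)] += gain
    return ir

# Assumed values for illustration: echoes at 2 ms and 5 ms, 1 kHz sample rate.
ir = toy_impulse_response(1000, [0.002, 0.005], [0.5, 0.25])
reverberated = convolve([1.0, 0.0, 0.0], ir)
```

For real recordings the double loop is too slow; an FFT-based routine such as SciPy's `scipy.signal.fftconvolve` is the usual choice.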

As I had mentioned, removing reverberations (a.k.a. multipath, long distance line echo) is a solved problem.

 
This is more information than you provided on your post in another forum.

You said the c-t has no reverberation. That is an incorrect statement. You said it was a noise cancelling type. That could mean that it has at least two elements, one pointed away to pick up mostly noise. This mostly noise will have a good sample of the reverb. The direct portion will have the reverb also but the SNR (for the reverb not the signal!) will be worse.

SNR for speech is a poor indicator of anything. Could you explain why you would want to know? Metrics for speech quality are published and are only loosely associated with SNR (a power ratio).

What gives?


jsolar
 

Hi,


Thank you for your responses. I really appreciate it.

To sum up, the idea was to cut out some noise from the mic array recordings, mix it with the c-t signal and put it through the system. But of course I needed the same SNR that the mic array had in order to make a valid comparison.

Eventually, what I did was to apply to a number of my wav files a window of 0.2 sec and get the spectral envelope. Then I took the RMS in dB for some vowels and for some noise regions in each file. I did that for both mic array and c-t recordings. Then I calculated the SNR from these samples, and I got a rough idea of how much noise I should add to the c-t files in order to match the SNR of the distant mics.
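
The windowed measurement described above can be sketched roughly as follows. This is an approximation of the general idea, not necessarily the exact procedure used: it works on time-domain RMS per 0.2 s window rather than on a spectral envelope, and the speech (vowel) and noise windows are assumed to be picked by hand. Function names are hypothetical.

```python
import math

def frame_levels_db(samples, rate, window_s=0.2):
    """RMS level in dB of consecutive non-overlapping windows."""
    n = int(round(window_s * rate))
    levels = []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        power = sum(s * s for s in frame) / n
        levels.append(10 * math.log10(power) if power > 0 else float("-inf"))
    return levels

def snr_from_frames(levels_db, speech_idx, noise_idx):
    """SNR in dB from hand-picked speech (e.g. vowel) and noise-only windows.

    Powers are averaged in the linear domain before converting back to dB.
    """
    def mean_power(indices):
        return sum(10 ** (levels_db[i] / 10) for i in indices) / len(indices)
    return 10 * math.log10(mean_power(speech_idx) / mean_power(noise_idx))
```

Since vowels are the loudest parts of speech, an SNR estimated from vowel windows alone will tend to come out higher than one measured over all of the speech.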

It seemed quite a reasonable approach to me, but, again, there's a whole lot I don't know about signal processing.
To my distress, this was very clear, when I put the new (c-t speech+added noise) wav files through my system.
The performance was even worse than that of the mic-array recordings.

Obviously, I'm doing something terribly wrong here.

I was also thinking that, as the added noise also contains reverb, this may be a bigger issue than I had assumed before. So, a second idea is to add Gaussian noise to the c-t mic signal, matching the mic array SNR, instead of adding the mic array noise.

Have you got any thoughts on that?

"You said the c-t has no reverberation. That is an incorrect statement. You said it was a noise cancelling type. That could mean that it has at least two elements, one pointed away to pick up mostly noise. This mostly noise will have a good sample of the reverb. The direct portion will have the reverb also but the SNR (for the reverb not the signal!) will be worse."

I made the assumption that the reverb and the noise of the c-t mic is not significant. How catastrophic is that?

Many thanks!
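
The Gaussian-noise idea above can be sketched directly: the noise standard deviation follows from the speech RMS and the target SNR, so no noise recording is needed. A minimal sketch, assuming the speech is a list of float samples; the function name and the `seed` parameter are hypothetical.

```python
import math
import random

def add_gaussian_noise(speech, snr_db, seed=None):
    """Add white Gaussian noise so that speech RMS / noise RMS is about snr_db."""
    rng = random.Random(seed)  # seeded generator for repeatable test files
    speech_rms = math.sqrt(sum(s * s for s in speech) / len(speech))
    # For zero-mean Gaussian noise the RMS equals the standard deviation.
    sigma = speech_rms / 10 ** (snr_db / 20)
    return [s + rng.gauss(0.0, sigma) for s in speech]
```

White Gaussian noise has a flat spectrum, unlike typical room noise, so matching the overall SNR this way does not match the noise colour; recognition results may differ for that reason alone.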
 