Sample Rate Conversion

GroupDIY Audio Forum

Perhaps because you have excess information... more than one input sample per output sample. The fewer output samples are a weighted average of the extra input data.

JR

PS: Probably not the "best" explanation ;)
 
I'm looking for an accessible way to understand it. I am not a coder or a DSP guy. Someone I know is hung up on the whole 96 not being an even multiple of 44.1 thing, and says "Mix two frequencies you get side bands. That is basic FFT 101, and there will be audible artifacts."

A DSP friend of mine said this:

"With modern sample rate conversion you put the audio into a virtual sample rate that is evenly divisible by both 96 and 44.1.

It's like multiplying fractions with different denominators. The actual numbers don't matter. They're just numbers and we have mathematics to take care of this. It's not an issue when you have enough orders of magnitude of precision, which we do."

I'm just looking for someone to amplify these ideas.
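
To put numbers on the "virtual sample rate" idea quoted above, here is a minimal Python sketch. The function name rational_ratio is made up for illustration, and real converters do not necessarily ever run at the common rate explicitly; this only shows the arithmetic.

```python
from math import gcd

def rational_ratio(fs_in, fs_out):
    """Return (up, down, common_rate) so that fs_in * up / down == fs_out."""
    g = gcd(fs_in, fs_out)
    up = fs_out // g          # interpolation factor
    down = fs_in // g         # decimation factor
    common = fs_in * up       # lowest rate divisible by both fs_in and fs_out
    return up, down, common

print(rational_ratio(96000, 44100))  # (147, 320, 14112000)
print(rational_ratio(96000, 48000))  # (1, 2, 96000)
```

So 96k to 44.1k is "up by 147, down by 320" through a common rate of 14.112 MHz, while 96k to 48k is "up by 1, down by 2". Same kind of fraction, just different integers.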
 
You will get artifacts whether it's even or not. A lot will come down to the filtering and implementation. If it's offline then, even or not, both will get upsampled as an intermediate step. Theoretically it shouldn't matter, but theory and practice are not always the same, and not all algorithms are good ones.

In the end the best sample rate conversion is often a D/A/D trip through another converter.
 
But a modern converter does just that on the inside: oversampling (upsampling -> filtering -> downsampling). A legacy converter using analog filtering only would be a different matter, but it will be less transparent in terms of phase shift and frequency response.

The quality of SRC relies on the sophistication of the filter used (more specifically, the number of taps) and the time window available for processing. That makes it a challenge for real-time applications. An offline process in software can always be made superior to a real-time implementation.

By far the best SRC IMO is finalcd, a freeware command line program using 1.5 million taps:

http://www.sonicillusions.co.uk/finalcd.htm
They recently added upsampling capabilities, and I ran a test comparing a round trip from 44.1 kHz up to 88.2 kHz and back against one from 44.1 kHz up to 96 kHz and back. Interestingly, the file sampled up to 96 kHz and back down to 44.1 kHz was closer to the source in terms of cancellation (less residual in the phase-inverted sum), due to a higher filter cutoff being possible at the higher sample rate.
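
For anyone wanting to try the same kind of cancellation test, here is a rough Python sketch of the measurement. It assumes both files are already loaded as equal-length, time-aligned lists of floats; the function name null_residual_db and the 0.999 gain stand-in are made up for illustration and are not taken from any actual round-trip result.

```python
import math

def null_residual_db(reference, processed):
    """Level of (reference - processed) relative to reference, in dB."""
    diff = [r - p for r, p in zip(reference, processed)]
    rms_ref = math.sqrt(sum(r * r for r in reference) / len(reference))
    rms_diff = math.sqrt(sum(d * d for d in diff) / len(diff))
    if rms_diff == 0:
        return float("-inf")   # perfect null
    return 20 * math.log10(rms_diff / rms_ref)

# Stand-in data: a 1 kHz tone and a copy that is 0.1% quieter.
ref = [math.sin(2 * math.pi * 1000 * n / 44100) for n in range(4410)]
almost = [0.999 * v for v in ref]
print(round(null_residual_db(ref, almost), 1))   # -60.0
```

The more negative the number, the closer the processed file is to the source. Any time offset between the two files has to be removed first, or the residual mostly measures the misalignment.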

And yes, even multiples don't matter; John's explanation gives a good idea why.
 
It's subjective whether artifacts are audible or not.
This guy is claiming he could hear 'metallic grinding noises' in the examples he heard.

I told him that he was either working near an auto body shop with the windows open, or something was very wrong because I've resampled 1000 things over the years and haven't heard metal grinding.

In a double blind test, if you can't identify which is which, then you can't hear it.

My guess is he knew which was which when he was listening, and like in all of these "audiophile" listening tests, his brain just lied to him to confirm his bias.
 
You have to take people's descriptions of what they hear (real or otherwise) with a good dose of scepticism.
I was once part of a development team putting a digital audio box out for feedback.
From one studio or somewhere we got the feedback that they could hear "Glass in the Room" IIRC.
 
I make a point of not arguing on the WWW with people about what they say they hear. It is almost impossible to prove anything with statistical significance without a serious time investment in controlled (double blind) listening tests.

I do listen to comments (a little) and sometimes if lucky you can figure out what they are hearing.

[tmi] One time several years ago on another forum a respected member claimed that he could hear a sonic improvement from using a voodoo speaker cable. Digging a little deeper it turned out that his voodoo cable was smaller gauge than even typical zip cord used for DIY speaker wires. His smaller gauge funny wire had more resistance that was interacting with his speaker's load impedance that changed with changing frequency. We calculated that his combination was responsible for a major fraction of a dB top octave boost. This could be just barely audible in close listening comparisons. His conclusion that the sound difference was an improvement was based on the price of the funny wire vs zip cord. [/tmi]

===

Frequency aliases are a function of mixing different frequency signals together. I have seen (heard) aliasing from mixing signals with different but close sample rates, due to clock frequency residuals beating with other clock frequency residuals.

Sample rate conversion is pretty much a math calculation... resampling 96k down to 44.1k means we have 2+ samples of 96k input data for every one sample of 44.1k output data. There are probably multiple approaches to do this. The quality of the result from a simple calculation depends on the resolution of parsing the odd slivers of extra samples to combine with the two whole samples of input, with proper weighting.
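
One way to picture the weighted sum, as a rough Python sketch and not how any particular converter is implemented: each output sample is a weighted sum of the input samples around its instant in time, with the weights read off a windowed-sinc low-pass curve. The Blackman window, the 32-sample half width, and the function names are all assumptions picked for illustration.

```python
import math

def kernel(t, fc, half_width):
    """Blackman-windowed sinc low-pass; t is measured in input samples."""
    if abs(t) >= half_width:
        return 0.0
    u = 2.0 * fc * t
    sinc = 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)
    a = math.pi * t / half_width
    window = 0.42 + 0.5 * math.cos(a) + 0.08 * math.cos(2 * a)
    return 2.0 * fc * sinc * window

def resample(x, fs_in, fs_out, half_width=32):
    """Compute every output sample as a weighted sum of nearby input samples."""
    fc = 0.5 * min(fs_in, fs_out) / fs_in    # cutoff in cycles per input sample
    n_out = len(x) * fs_out // fs_in
    y = []
    for n in range(n_out):
        t = n * fs_in / fs_out               # output instant on the input time axis
        k0 = math.floor(t)
        acc = 0.0
        for k in range(k0 - half_width + 1, k0 + half_width + 1):
            if 0 <= k < len(x):              # samples beyond the ends count as silence
                acc += x[k] * kernel(t - k, fc, half_width)
        y.append(acc)
    return y

# 10 ms of a 1 kHz tone at 96k, converted to 44.1k and compared with an ideal tone.
x = [math.sin(2 * math.pi * 1000 * k / 96000) for k in range(960)]
y = resample(x, 96000, 44100)
ideal = [math.sin(2 * math.pi * 1000 * n / 44100) for n in range(len(y))]
worst = max(abs(y[n] - ideal[n]) for n in range(100, 341))   # skip the edges
print(len(y), worst)
```

Nothing in the loop cares whether the ratio is a whole number. The fractional part of t just selects which point on the weighting curve each input sample gets.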

Probably still not a very accessible explanation.

JR
 
When you downsample from 96k to 44.1k (96 not being an even multiple of 44.1), why are there no audible artifacts?

I guess you could approach that two ways.
The first would be that there are no artifacts for the same reason there are (usually) no audible artifacts when you sample analog audio at 44.1k: the system properly bandlimits the input to below the Nyquist frequency of the output rate before generating the output samples.

The second would be to follow up on the implicit assumption about downsampling by 2 being mathematically simpler. It is, but I suspect the questioner doesn't really understand the math involved, so leaps to the conclusion that difficult math must mean errors large enough to hear. With modern processors, the only things stopping all artifacts from being more than 140 dB below full scale are a need for super low latency or a need to restrict memory usage for some reason.

Have you (or the originator of the question) seen the SRC comparison page?
SRC comparison
The sweep charts can be a little confusing to read the first time you see them. The default chart uses a sine wave sweep with frequency increasing bottom to top, and amplitude indicated by color. The input signal is the bright yellow curve up the middle, and any aliasing shows up as darker colors. From a drop-down menu you can also choose a 1 kHz single-frequency tone at low and high amplitude, the frequency response, the phase response, or the impulse response. It looks like the default converters shown right now are essentially faultless, so it is instructive to change one of the views to an older, lesser quality converter, or a converter that has to work at low latency in real time, to see what artifacts look like in the sweep chart or the 1 kHz chart.

Mix two frequencies you get side bands.

So he(?) must have some exposure to signal processing terminology, since mixing in the audio terminology sense obviously does not produce sidebands. Mixing in the RF sense would probably be thought of as multiplication by a lay person (as opposed to the addition you would get with an audio mixer).
I guess you could ask why he thinks that SRC involves a mixing operation, although that might go down a rabbit trail of sampling as convolution that you don't really want to get into.
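
The difference between the two senses of "mixing" is easy to show with a small Python sketch. The tone frequencies (1 kHz and 3 kHz), the 48 kHz rate, and the helper name level are arbitrary choices for illustration.

```python
import cmath
import math

FS = 48000
N = 480   # 10 ms, so every multiple of 100 Hz lands on an exact DFT bin

def level(x, freq):
    """Amplitude of the component of x at freq, from a single-bin DFT."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq * n / FS) for n, v in enumerate(x))
    return 2 * abs(acc) / len(x)

a = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(N)]
b = [math.sin(2 * math.pi * 3000 * n / FS) for n in range(N)]
added = [p + q for p, q in zip(a, b)]        # audio-style mixing
multiplied = [p * q for p, q in zip(a, b)]   # RF-style mixing

for f in (1000, 2000, 3000, 4000):
    print(f, round(level(added, f), 3), round(level(multiplied, f), 3))
```

The added signal only has components at 1 kHz and 3 kHz, the two tones that went in. The multiplied signal has none at 1 kHz or 3 kHz and instead shows half-amplitude components at 2 kHz and 4 kHz, the difference and sum frequencies, which are the "side bands".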

Really, that SRC comparison page, or doing the equivalent yourself, should be enough to settle it. Maybe just start there, suggest he make some test tones at 96k and convert them to 44.1k himself, and look at an FFT of the result. There are some caveats around setting up the FFT properly, but it should be obvious that you don't get sidebands with a decent quality SRC.
 
Thank you! That contribution was SUPER helpful.
 
With a properly band-limited signal sampled at more than twice the bandwidth, you can reconstruct EXACTLY the band-limited signal again. We'll ignore amplitude quantisation here but remind people that quantising in time is not exactly analogous to quantising in amplitude.

So there is no reason that you can't change sample rates all you like, so long as the Nyquist Criterion remains unviolated.

Yes, real-time algorithms may have to take short-cuts due to latency, processing speed, or storage requirements, but a properly designed and implemented off-line algorithm should be able to do this re-sampling, whether the ratio is an integer or not, with degradation well below the noise floor.

One way to call your friend out would be to point out that the same upsampling, filtering and decimating algorithm will be in play for the re-sampling, whether it is an integer ratio or not. Properly written software won't simply discard alternating samples for 96 to 48 conversion. It will do the same thing as 96 to 44.1 conversion, just with different coefficients. So, if they claim to hear a difference, they are imagining it.
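
A minimal Python sketch of why simply discarding alternate samples is not enough, even for the "easy" 96 to 48 case. The 30 kHz test tone is an arbitrary choice, picked only because it sits above the 24 kHz limit of a 48 kHz output.

```python
import math

FS_IN = 96000
TONE = 30000   # Hz, above the 24 kHz Nyquist limit of a 48 kHz output

x = [math.sin(2 * math.pi * TONE * k / FS_IN) for k in range(960)]
naive = x[::2]   # 48 kHz output made by dropping every other sample, no filter

# The result is sample-for-sample the same as a phase-inverted 18 kHz tone.
alias = [-math.sin(2 * math.pi * 18000 * n / 48000) for n in range(len(naive))]
max_diff = max(abs(p - q) for p, q in zip(naive, alias))
print(max_diff)
```

The 30 kHz tone folds down to 48 - 30 = 18 kHz, right in the audio band. That is why the low-pass filter has to come before the samples are thrown away, whatever the ratio.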

As an aside / anecdote: I used to work in Digital TV (the algorithm design and implementation). Tests showed viewers noting improved picture quality when the picture quality remained the same but the sound was improved. Our senses are massively subjective and also subliminally synaesthetic.
 
Great info!


Oh yes. Our brains are easily fooled.
 
Seems the placebo effect we see in medicine, which is very real and scientifically proven, has a close relative in the auditory department: our preconceptions colour what our ears and our brains tell us.
 
RE the side bands not happening in sample rate conversion. Could you expand on that a bit? Something I could present him with to help him understand better?

Well, the basic point is that SRC does not involve mixing the sampling frequencies.
I'll decline to expand further. Not to be unhelpful but it's not really feasible in a forum context, something that tbh I would want payment for, and it's already a well documented topic.
I will suggest the reference below for a fuller explanation of the concept, theory and implementation.

https://www.kmraudio.com/art-of-digital-audio-john-watkinson.php
 
