To stop corrupting the thread about sum bus technology, I'm moving this veer, triggered by some recent comments, over here.
Back in the 70s, working on splicing together discontinuous samples of pitch-shifted speech, I had an epiphany about how audio can "sound" dramatically different from what you would expect from looking at scope traces. When pitch shifting audio, we grab X ms of audio, then time-compress it to fit into less time to shift the pitch up, or expand it over a longer time to shift the pitch down. Clearly, when trying to reassemble these samples in real time, there will be overlap (too much audio to fit into the original time space) or gaps (when the pitch-shifted samples take up less time than is available).
In the case of pitch shifting up, how we splice the valid audio across the dead gaps between samples can cause audible perturbations. Pitch shifting down gives us too much information to fit neatly into the time window, and you can't just let the samples randomly overlap and sum. There are different strategies for dealing with both. For waveforms with redundant information, the excess data can be discarded, but the starts and stops must be carefully managed for minimal artifacts. For pitch shifting up, some data can be repeated to fill the excessive gaps, but again this must be done carefully. I am just scratching the surface here, and there are numerous other factors, like optimal sample size, etc.
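The discard/repeat/splice bookkeeping above can be sketched in a few lines. This is a hypothetical Python/NumPy toy, not how any actual product did it; the grain size, fade length, and linear-interpolation resampling are my own assumptions. Reading each grain faster than real time and wrapping when the data runs out handles the shift-up case, reading slower simply never touches the excess data, and a short crossfade at the grain boundaries softens the splices.

```python
import numpy as np

def pitch_shift_splice(x, ratio, grain=1024, fade=64):
    """Toy splicing pitch shifter.

    Each grain of input is read back at `ratio` x speed.  For
    ratio > 1 (shift up) the read pointer runs out of data, so it
    wraps and repeats part of the grain to fill the gap; for
    ratio < 1 (shift down) the excess data is simply never read,
    i.e. discarded.  Adjacent grains overlap by `fade` samples and
    are crossfaded to soften the splice clicks.
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros(len(x))
    hop = grain - fade
    # crossfade window: linear ramps at both ends, flat in the middle
    w = np.ones(grain)
    w[:fade] = np.linspace(0.0, 1.0, fade)
    w[-fade:] = np.linspace(1.0, 0.0, fade)
    for start in range(0, len(x) - grain, hop):
        g = x[start:start + grain]
        # read positions through the grain at `ratio` speed,
        # wrapping (repeating data) past the end of the grain
        pos = (np.arange(grain) * ratio) % (grain - 1)
        i = pos.astype(int)
        frac = pos - i
        y = g[i] * (1.0 - frac) + g[i + 1] * frac  # linear interpolation
        out[start:start + grain] += y * w
    return out
```

Feed it a low-frequency sine at ratio 1.5 and the zero crossings come out roughly 1.5x as often, complete with a small glitch at each wrap point, which is exactly the artifact budget being discussed above.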
Back in the day, the gold standard for pitch shifting was a "rotating head" tape machine. Somewhat like the technology used in VCRs, the rotating head allows the relative tape speed seen by the head to differ from the true tape speed. These audio rotating-head machines used four heads spaced around the head assembly, seamlessly splicing together audio samples as the heads approached or withdrew from contact with the tape. In a bit of serendipity, thanks to the way tape heads decode magnetic signals, the LF content was picked up before the HF content as a head approached the tape. This delivered a smoother blending of samples.
The small company I was working for invented a pitch shifter based on a BBD delay line. Clocking an audio sample in at one rate and clocking it out slower or faster changes the apparent pitch. BBD delay chips are too short for a simple bi-frequency pitch shift (fixed input clock, different fixed output clock), but clocking the BBD with a constantly ramping clock frequency (faster or slower) delivered the desired pitch shifts with more useful sample sizes. This was too crude for music, but surprisingly effective for speech, the target market being speeding up talking-book recordings for blind people.
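The ramping-clock idea can be checked numerically. Model the BBD as an N-stage register clocked at f(t) = f0·(1 + a·t): a sample entering on clock tick n exits on tick n+N, and the instantaneous pitch-shift ratio works out to the ratio of the clock frequency at exit to the clock frequency at entry. The clock rate, ramp slope, and stage count below are made-up values for illustration, not the actual product's:

```python
import numpy as np

def bbd_tick_times(n_ticks, f0, a):
    """Times of the first n_ticks edges of a linearly ramping clock
    f(t) = f0*(1 + a*t), found by inverting the integrated tick
    count f0*(t + a*t**2/2) = k (a != 0)."""
    k = np.arange(n_ticks)
    return (np.sqrt(1.0 + 2.0 * a * k / f0) - 1.0) / a

f0 = 100_000.0   # Hz, base BBD clock (assumed value)
a = 20.0         # 1/s, clock ramp rate (assumed value)
N = 1024         # BBD stages (assumed value)

t = bbd_tick_times(N + 2, f0, a)
dt_in = t[1] - t[0]        # spacing of samples entering the BBD
dt_out = t[N + 1] - t[N]   # spacing of the same samples exiting N stages later
ratio = dt_in / dt_out     # instantaneous pitch-shift ratio
clock_ratio = (1 + a * t[N]) / (1 + a * t[0])
# ratio ~= clock_ratio: the shift equals f_clock(exit) / f_clock(entry)
```

With these numbers the ramp raises the pitch by roughly 19 percent over the ~10 ms transit of the delay line; when the ramp resets, you get the splice problem described above all over again.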
====
For TMI about managing clicks: another old design trick is to use HF pre/de-emphasis. Boosting the HF content before the switch, then restoring the response to flat after the switch, effectively rolls off just the HF content of the "click". Of course, there is no free lunch: the pre-emphasis comes out of headroom, but the HF content of complex waveforms is generally much lower in amplitude than the LF content, leaving room for pre/de-emphasis.
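A quick numerical check of the trick, using assumed first-order filters and an assumed coefficient rather than any specific design: pre-emphasize with y[n] = x[n] - a*x[n-1], perform the gain switch in the emphasized domain, then de-emphasize with the exact inverse x[n] = y[n] + a*x[n-1]. The program signal round-trips untouched, while the step injected by the switch passes through only the de-emphasis, so its HF is rolled off and the discontinuity at the splice comes out much smaller.

```python
import numpy as np

def pre_emphasis(x, a=0.95):
    # first-order HF boost: y[n] = x[n] - a*x[n-1]
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - a * x[:-1]
    return y

def de_emphasis(y, a=0.95):
    # exact inverse of pre_emphasis: x[n] = y[n] + a*x[n-1]
    x = np.empty_like(y)
    acc = 0.0
    for n, v in enumerate(y):
        acc = v + a * acc
        x[n] = acc
    return x

fs = 8000
n = np.arange(4000)
x = np.sin(2 * np.pi * 50 * n / fs)   # LF-dominated test signal
m = 840                                # switch lands near a waveform peak

# naive gain switch: hard step right in the audio
naive = x.copy()
naive[m:] *= 0.5

# same switch done between pre- and de-emphasis
gain = np.where(n < m, 1.0, 0.5)
emph = de_emphasis(gain * pre_emphasis(x))

# size of the worst sample-to-sample jump at the splice
jump = lambda s: float(np.max(np.abs(np.diff(s[m - 5:m + 5]))))
```

With these numbers the emphasized chain's splice jump is more than an order of magnitude smaller than the naive one, while the signal before and well after the switch is reproduced essentially exactly.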
Fourier analysis of the gain steps would identify the HF spectral content, but I am still not a mathematician.
Let the veer continue..
JR