This is an often-discussed subject. Advocates of the "no-difference" camp trot out the same old argument: "digital audio is continuous, without steps; the only thing bit depth affects is the level of the background noise."
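For reference, the "background noise" part of that argument comes from the theoretical SNR of an ideal quantizer driven by a full-scale sine, roughly 6.02·N + 1.76 dB for N bits. A quick sketch of what that means for 16 vs 24 bits:

```python
def quantization_snr_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantizer for a
    full-scale sine: SNR ~= 6.02*N + 1.76 dB."""
    return 6.02 * bits + 1.76

for n in (16, 24):
    # 16-bit: ~98 dB; 24-bit: ~146 dB
    print(f"{n}-bit noise floor: about -{quantization_snr_db(n):.1f} dBFS")
```

So on paper the only difference between the two formats is roughly 48 dB of noise floor, which is the claim being examined here.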
This is not simply true. It is not very wrong either, but it has to be put into context, which is the mathematical concept of a continuous signal, for which the affirmation does hold. There is a formula that gives the amplitude error of a sample as a function of the ratio of signal frequency to sampling frequency: as the ratio increases, the error increases too, reaching 100% error when the signal frequency equals the sampling rate.
Put differently, for less than 1% error, the sampling rate must be more than 22 times the signal frequency. But this formula applies to a single occurrence of the signal; the error decreases significantly as the number of samples grows. Not too dissimilar from the constraints of an FFT.
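One way to derive the 22x figure (an assumption about which formula the author means, but it reproduces the number): if you read a sine's peak straight off the raw samples, the nearest sample can miss the true peak by up to half a sample period, so it reads cos(pi*f/fs) of the true amplitude:

```python
import math

def worst_case_peak_error(f_signal: float, f_sample: float) -> float:
    """Largest single-sample amplitude error when reading a sine's peak
    directly off the raw samples (no reconstruction filter): the nearest
    sample can miss the peak by half a sample period, so it reads
    cos(pi * f/fs) of the true peak value."""
    return 1.0 - math.cos(math.pi * f_signal / f_sample)

# ~1% error requires fs of roughly 22x the signal frequency
print(worst_case_peak_error(1_000, 22_000))  # about 0.0102
```

Note this is the single-shot case: with many periods of a band-limited signal, proper reconstruction (sinc interpolation) recovers the waveform far more accurately, which is exactly the point about the error shrinking with the number of samples.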
Let's not forget that audio is not a continuous signal.
The subject is made a little more complex by the fact that most converters are delta-sigma designs, which again involve strings of data that are interrelated.
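To illustrate that interrelation, here is a toy first-order delta-sigma modulator (a sketch for intuition only, nothing like a production converter): the integrator carries the accumulated quantization error forward, so each output bit depends on the entire history of the stream rather than on one sample in isolation.

```python
def delta_sigma_1bit(samples):
    """Toy first-order delta-sigma modulator: integrate the error
    between the input and the fed-back 1-bit output, then quantize.
    Each output bit depends on all previous bits via the integrator."""
    acc, fb, bits = 0.0, 0.0, []
    for x in samples:            # inputs assumed in [-1.0, +1.0]
        acc += x - fb            # accumulate error vs. the fed-back bit
        bit = 1 if acc >= 0 else 0
        fb = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

# A constant input of 0.5 settles to ~75% ones: the *average* of the
# bitstream, not any single bit, encodes the amplitude.
bits = delta_sigma_1bit([0.5] * 1000)
print(sum(bits) / len(bits))
```

This is why reasoning about one sample at a time maps poorly onto how real converters behave: the information lives in the correlated stream, and the decimation filter that follows it.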
Many people have conducted their own experiments on the subject, and most reasonable reviewers (I exclude audiophools) come to the same conclusion: there is no perceptible difference between 16 and 24 bits as a release format, except when dealing with poorly recorded material or when extensive processing is required. Single speed (44.1 and 48 kHz) is inadequate, mostly because of the constraints it places on the anti-alias and reconstruction filters; double speed is more than adequate (in fact, given a time machine, many designers would choose 60-64 kHz). Quad speed has no operational advantage, though it is of some use to those who do measurements.