> What is the absolute reference
I'm a practical man, descended from pig-farmers. If some lab-guy says "This is the reference", that's good enough for me. Especially because a lot of that work was done around Bell Labs, back in the days when they were on their toes.
But without being rigorous:
The "absolute reference" can be anything you want. Furlongs (particle displacement) per fortnight, if you like those units. Watts of acoustic power per seat is natural in PA design. Pressure (variation) is expressed in non-acoustic units like pounds/grams per square-yard/meter, which ties you to fundamental SI standards (the weight and stick in Paris) but has no natural acoustic meaning.
Being earthlings, we have a "constant" air pressure all around us: Atmospheric Pressure. It is a very good reference because it is the same air that our sounds usually flow through. It also sets a limit on sound pressure: it is possible for a sound to have positive peaks higher than air pressure, but impossible for a sound to have negative peaks below zero pressure. For linearity, our sounds must peak far lower than air pressure. In fact loud music peaks more like 1/1000 of air pressure. Air pressure is usually quoted in "Bar" (same root as barometer), so loud music swings from 1.001 Bar to 0.999 Bar. And yes, barometric pressure changes day to day, often by over 1/100 Bar. That hardly affects our 0.001 Bar peak sounds, and the Bar itself has been standardized as a round number of pascals within about 1% of average sea-level pressure. So while the reference Bar is an arbitrary number of Paris sticks and weights, it is close enough to everyday air pressure to be convenient.
Being humans, we have another reference: the softest sound an "average" human can hear. We can argue about who is average, and it is certainly true that the threshold varies with frequency. But many tests suggest a nice convenient round number which has been standardized as 0 dB SPL. There may be rare cases where you must deal with softer sounds, but mostly our real world starts at +10 or +20 dB SPL.
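To put numbers on that: the standardized 0 dB SPL reference is 20 micropascals RMS, and a Bar is 100,000 pascals. Here is a minimal Python sketch that converts the "0.001 Bar peak" loud-music figure into dB SPL. It assumes a sine wave (so RMS is peak over root-two), and the function names are just made up for illustration.

```python
import math

P_REF_PA = 20e-6    # 0 dB SPL reference: 20 micropascals RMS
PA_PER_BAR = 1e5    # 1 bar is defined as 100,000 pascals

def spl_db(p_rms_pa: float) -> float:
    """Sound pressure level in dB re 20 uPa for an RMS pressure in pascals."""
    return 20.0 * math.log10(p_rms_pa / P_REF_PA)

def sine_peak_bar_to_spl(peak_bar: float) -> float:
    """dB SPL of a sine wave whose peak pressure swing is given in bar.

    Assumption: sinusoidal waveform, so RMS = peak / sqrt(2).
    """
    p_rms_pa = peak_bar * PA_PER_BAR / math.sqrt(2.0)
    return spl_db(p_rms_pa)

if __name__ == "__main__":
    # Loud music swinging between 0.999 and 1.001 bar: about 131 dB SPL.
    print(f"{sine_peak_bar_to_spl(0.001):.1f} dB SPL")
```

So the whole useful range, from threshold to painfully loud, sits inside a swing of a tenth of a percent of the air pressure around us.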
We could just as well specify a Particle Velocity instead of Pressure. As long as we work in just plain air, it does not matter which we use: the ratio of velocity to pressure is constant within easily defined limits.
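A rough sketch of that constant ratio, in Python. For a plain progressive wave well away from the source, pressure divided by particle velocity is the characteristic impedance of air (density times speed of sound). The density and sound-speed values below are assumed room-temperature figures, not anything from a standard I am quoting, and the ratio does not hold close to a source or in a standing wave.

```python
RHO_AIR = 1.204   # kg/m^3, air density near 20 C (assumed value)
C_AIR = 343.0     # m/s, speed of sound near 20 C (assumed value)
Z0 = RHO_AIR * C_AIR   # characteristic impedance, roughly 413 Pa*s/m

def particle_velocity(p_pa: float) -> float:
    """Particle velocity (m/s) for a given sound pressure (Pa).

    Assumption: plane progressive wave in free air.
    """
    return p_pa / Z0

if __name__ == "__main__":
    # Velocity at the 0 dB SPL reference pressure of 20 micropascals.
    print(f"{particle_velocity(20e-6):.3e} m/s")
```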
But what do you use to measure sound?
There is the Rayleigh Disk, a bit of cardboard hung on a thread. If hung at an angle to the flow of sound, it wants to twist in line with the sound. The torque is proportional to properties of air that can be measured with non-sound techniques (mass, density, viscosity). The torsional stiffness of the thread can be measured mechanically. So by measuring how much the disk turns, you know the sound field strength. However, this is very hard to do.
Many transducers are reversible: can be used either as speaker or microphone. Wire headphones or loudspeakers into a mike-amp: they work as mikes. A dynamic mike is obviously a little loudspeaker, and one particular model has an input power rating. Drive audio into the electrodes of a condenser mike, it speaks.
It may sound a little like the game with one pea under three cups, but you can use three reversible transducers, not all identical, to get an absolute calibration by using them alternately as speaker or mike and swapping them around A-B, A-C, B-C, etc. This works even if you do not know the sensitivity or frequency response of any of the transducers. However, because all wide-range transducers are inefficient (like 1% or less) and you are measuring through two of them (0.01% total efficiency) and the input power must be well below any hint of overload effect, it is a pretty fussy measurement.
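The pea-and-cups arithmetic can be sketched in a few lines of Python. This is a simplified model, not a lab procedure. It assumes each transducer is perfectly reciprocal, so its strength as a speaker is its mike sensitivity M divided by a "reciprocity parameter" J that depends only on the geometry and the air, and that J is known. Under that assumption each pair measurement (volts out of one per amp into the other) equals M_x * M_y / J, and three pairings give three equations in three unknowns. The names and all numbers are made up for illustration.

```python
import math

def transfer(m_x: float, m_y: float, j: float) -> float:
    """Simulated pair measurement: output volts per input amp.

    Assumed model: reciprocal transducers, so the reading is M_x * M_y / J.
    """
    return m_x * m_y / j

def solve_sensitivities(z_ab: float, z_ac: float, z_bc: float, j: float):
    """Recover the three mike sensitivities from the three pair readings."""
    m_a = math.sqrt(j * z_ab * z_ac / z_bc)
    m_b = math.sqrt(j * z_ab * z_bc / z_ac)
    m_c = math.sqrt(j * z_ac * z_bc / z_ab)
    return m_a, m_b, m_c

if __name__ == "__main__":
    true_m = (0.050, 0.012, 0.0031)   # V/Pa, hypothetical and "unknown"
    j = 2.5e-3                        # hypothetical reciprocity parameter
    z_ab = transfer(true_m[0], true_m[1], j)
    z_ac = transfer(true_m[0], true_m[2], j)
    z_bc = transfer(true_m[1], true_m[2], j)
    # The solver sees only the three readings and J, never true_m.
    print(solve_sensitivities(z_ab, z_ac, z_bc, j))
```

The sums are the easy part. The fussiness is in getting clean pair readings through two inefficient transducers in series.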
A variation of this is scaled microphones. Olson made a set of ribbon mikes that were as identical as possible, except all dimensions were scaled 1X, 2X, 0.5X. The exact calibration of each one is not precisely known. But it is easy to show that they all have the same shape of frequency response, only shifted up or down 0.5X or 2X in frequency. By comparing their outputs in an uncalibrated sound field, you can derive the exact shape of the frequency response.
There is a crude-looking but very predictable way to generate large (easily measured) sound pressures. Take the spark plug out of an engine, seal a mike in the hole, and crank the engine. The piston going up and down creates variation in pressure. Ignoring details like leaks, the change in pressure is a function of the compression ratio, which you can measure with a ruler. The acoustic pressure inside an engine cylinder is about 10 Bar, far too loud. But a Pistonphone uses a small chamber with a very small piston to generate exactly calculable acoustic pressure variations in the chamber. It has several problems at high frequencies, but there is nothing to drift out of calibration, and it is still used as an absolute calibrator around 100Hz.
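Here is roughly what "exactly calculable" means, as a Python sketch. It assumes small-amplitude adiabatic compression of a sealed chamber much smaller than a wavelength, so the peak pressure swing is gamma times the static pressure times the fractional volume change. The piston and chamber dimensions are made-up numbers, not those of any real instrument, and leaks and heat conduction to the walls are ignored.

```python
import math

GAMMA = 1.4          # ratio of specific heats for air
P_STATIC = 101325.0  # Pa, assumed static pressure (one standard atmosphere)
P_REF_PA = 20e-6     # 0 dB SPL reference, 20 micropascals RMS

def pistonphone_spl(piston_diam_m: float, peak_disp_m: float,
                    chamber_vol_m3: float) -> float:
    """dB SPL in a small sealed chamber driven by a sinusoidal piston.

    Assumptions: adiabatic, small amplitude, no leaks, chamber much
    smaller than the wavelength.
    """
    area = math.pi * (piston_diam_m / 2.0) ** 2
    dv_peak = area * peak_disp_m                     # peak volume change
    p_peak = GAMMA * P_STATIC * dv_peak / chamber_vol_m3
    p_rms = p_peak / math.sqrt(2.0)
    return 20.0 * math.log10(p_rms / P_REF_PA)

if __name__ == "__main__":
    # Hypothetical: 4 mm piston, 0.5 mm peak travel, 20 cm^3 chamber.
    print(f"{pistonphone_spl(0.004, 0.0005, 20e-6):.1f} dB SPL")
```

Everything in the formula is a ruler measurement or a barometer reading, which is why there is nothing to drift. Note that the barometer reading does matter: the output scales with the static pressure.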
One of the books I mentioned in the old place is all about this problem. The theory is clear, and some theorists have gone over every bit to make sure all the "i"s are dotted. Practice is always another matter, and some acoustic labs have worked hard on that end. It is quite possible to calibrate the mid-band sensitivity to 0.01dB absolute precision (about 0.1% error relative to Paris weight and stick) and get the frequency calibration of a 1" lab-mike to about 0.02dB up to 1KHz, and better than 0.1dB out to 5KHz.
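For anyone who wants to check the dB-to-percent figures, here is a small Python sketch. It treats the dB numbers as pressure (amplitude) ratios, which is my reading of the figures above, and the function name is just for illustration.

```python
def db_to_percent(db: float) -> float:
    """Percent change in pressure (amplitude) for a small dB difference."""
    return (10.0 ** (db / 20.0) - 1.0) * 100.0

if __name__ == "__main__":
    for db in (0.01, 0.02, 0.1, 1.0):
        print(f"{db:>5} dB is about {db_to_percent(db):.2f} %")
```

A handy rule that falls out of this: for small errors, 0.1 dB is a bit over 1% in pressure.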
> What is the absolute reference for testing in an anechoic chamber?
Ah, that's very practical and very easy. You buy a set of measurement mikes and preamps. For very soft sounds you need the big 1" mikes (actually the 0.5" mike does about as well now). The 1" gets a little wacky above 6KHz, the 0.5" above 12KHz, so if you need sub-dB precision to the top of the audio band you buy a 0.25" capsule too. Measurement mikes come with individual calibration of sensitivity, frequency response, directional effects, and error-bands so you know where they will be 0.1dB accurate and where they can't be trusted better than 1dB. They hold calibration very well, and if you are ever in doubt you can buy a calibrator, or send them to the factory, or to any of several private and national labs that provide calibration services. How the factory and labs get the numbers is not really the user's concern. They can be trusted because anybody who gets into the mike calibration racket is more interested in problems than profits. The kind of guys who, if two tests disagree, will study it to find out why.