subjective listening tests

GroupDIY Audio Forum

ricardo said:
This all comes under 'objective' measurements which are verified by DBLTs.  The reason for wanting well controlled cone break up, nice directivity etc is cos ... 'our' prejudice from conducting DBLTs is that it leads to speakers that most listeners, from recording engineer to woman in the street, 'like'.  These are also, perhaps surprisingly, accurate speakers.
That still doesn't explain how the results of DBLT's translate into "check the midrange distortion" or "the HF dispersion is too narrow" or whatever issue.
I remain unconvinced of the capability of DBLT's conducted with untrained listeners to orient the works of an R&D team.

No.  An ABC test just asks the victim to rate 3 presentations. 
So what are these three presentations?

An ABX test asks the victim, "which of A or B is closest to X?"  The X is the 'undisputed' reference.
I had a different picture in my head; I seemed to remember that ABX is using two samples (A & B) that are identified as such and presenting X (which is A or B randomly selected), and the question not being necessarily "which one is closest" but "which one is best". Paucity of information on the net doesn't help clear my head.

I suppose if the test is eg to determine the audibility of phase distortion, the undistorted signal path is the 'hidden reference' .. but it could be repeated. 
How does it apply to loudspeakers? I don't know any loudspeaker that could be considered as an undisputed reference.

Look up AES papers by Fryer for some of our early work on this.
I would, if I didn't have to pay $33 per paper; I hate the idea of giving money to the Plunkett family.

BTW, though I've pimped the 'man in the street' as being more perceptive than most (all?) audiophools & HiFi reviewers, the best (most consistent) people on my DBLT panel were some loudspeaker designers & a couple of recording engineers
Wouldn't that agree with my view that listeners must be trained (or incredibly naturally gifted)?

Only 1 HiFi reviewer is in this A team .. out of all UK & US reviewers.
I believe we both agree, as many other members, that HiFi reviewers are as competent as the pollsters who didn't see Trump coming...
 
ricardo said:
We were among the first to do serious work on cone breakup with laser doppler interferometer bla bla and look at directivity over a large frequency range.  Remember we designed and made units including cones, surrounds etc. to go in our speakers.
A loudspeaker cone in break-up mode resembles a drumhead vibrating normally.  ::)
Distortion in speakers is difficult.  I had several cases of speakers with higher measured THD do better than speakers with much lower distortion ... with comments that these seemed very undistorted with a very experienced & perceptive panel.  The important audible distortions in speakers are gross ... but not what most people measure.

I would be careful about making broad generalizations about loudspeaker distortion. LF distortion can sound like more/better LF response. Mid-hi frequency distortion (like from a rub) can sound as bad as we expect.  One large sound reinforcement speaker company, who delivers unusually clean low bass (Danley), has had to teach users that all the extra LF distortion present in typical sound reinforcement loudspeakers is not the correct "normal" sound.  ::)

Another example of distortion influencing listening tests, I did my share of single blind tests between power amp clip limiters, defeated or not, and perceived power output clipped vs limited. The unfortunate result (IMO) was how typical listeners preferred the sound of power amps clipping, vs limited to stay clean. The clipped amps sounded louder (perhaps because they were), and listeners prefer louder, as long as the clipping is modest and not immediately identifiable as distortion.

JR
 
abbey road d enfer said:
That still doesn't explain how the results of DBLT's translate into "check the midrange distortion" or "the HF dispersion is too narrow" or whatever issue.
I remain unconvinced of the capability of DBLT's conducted with untrained listeners to orient the works of an R&D team.
Perhaps I didn't make it clear.  A top speaker company will have done all the work on cone breakup, directivity etc before getting to the DBLT of the complete system.  (There are different listening tests which are useful & quick on these which I won't go into.)  They would have to design & make their own units so I'm referring to box-stuffers .. which is basically what nearly all DIY speaker makers are.

The DBLTs will tweak stuff that doesn't really have an 'objective' answer.  eg perhaps the most basic "What should the frequency response be?"  You might answer 'flat' but is that on treble axis?  In the room?  Some combination?  What room?

This is of course tied up with directivity too.  I coined the phrase Room Interface Profile in
http://www.aes.org/e-lib/browse.cfm?elib=3798
as probably the most important attribute of a speaker.  Various people since then eg Floyd & Olive have tried to quantify and track down this mythical parameter with varying degree of success.

But I can assure you it exists and is important.  Alas I'm no longer an AES member and can't provide copies.  What we did was so simple & obvious that I'm surprised no one had done it before or since.  It is the 'sound' of the speaker in its true environment.

So what are these three presentations?
The presentations are called A, B & C.

I seemed to remember that ABX is using two samples (A & B) that are identified as such and presenting X (which is A or B randomly selected), and the question not being necessarily "which one is closest" but "which one is best".
This is one form of ABX.  There are others.

In your case, the question is "Is X, A or B?"

There's loadsa stuff published by Lipshitz & Vanderkooy and their students who coined the term ABX.

How does it apply to loudspeakers? I don't know any loudspeaker that could be considered as an undisputed reference.
I would never apply 'undisputed reference' to any speaker.  That's why our victims are only asked which one they 'like'.

As I pointed out ad nauseam, the surprising thing is that the man/woman in the street, regardless of his/her musical taste 'likes' the same speakers as the recording engineer who makes his own mikes and insists on using only his recordings ... and the whole gang in between.

(That's assuming the victim isn't deaf of course .. which you quickly (well not quickly but surely) find out in our DBLTs.)

Am I wrong to say speakers 'liked' by everyone are 'accurate'?  Perhaps ... but in my book, they are certainly GOOD speakers.  I still happen to believe that if you design something that sounds 'good', it will help your profits.  Maybe this is naive but we were quite successful doing this for many years.

A loudspeaker cone in break-up mode resembles a drumhead vibrating normally
I won't go into this but the breakup behaviour of a good sounding paper cone is quite different from a good sounding plastic cone.

LF distortion can sound like more/better LF response.  ...
.. loadsa good stuff
..  Another example of distortion influencing listening tests, I did my share of single blind tests between power amp clip limiters, defeated or not, and perceived power output clipped vs limited. The unfortunate result (IMO) was how typical listeners preferred the sound of power amps clipping, vs limited to stay clean. The clipped amps sounded louder (perhaps because they were), and listeners prefer louder, as long as the clipping is modest and not immediately identifiable as distortion.
Agree to all that.

and I've used these audible advantages in my patented Powered Integrated Super Sub technology to make small speakers/amps sound like big ones ...  checked out with DBLTs of course  8)
 
ricardo said:
The presentations are called A, B & C.
I believe I had that figured, on account the test is named ABC.  ;D. So these A, B and C are 3 different samples? So you would compare 3 variations of the same speaker, on account it gives only 33% random error...then why not ABCD, which would give only 25% random error? I just don't get it. Statistics never were my forte...
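
A rough way to put numbers on the "random error" question above: the sketch below computes how likely a pure guesser is to score a given number of hits when each trial offers 2, 3 or 4 equally likely answers. The function name and the 8-out-of-10 example are illustrative assumptions, not anything used by the posters, and the calculation covers only guessing odds, not how many items a listener can sensibly compare.

```python
from math import comb


def chance_of_at_least(correct: int, trials: int, n_choices: int) -> float:
    """Probability of `correct` or more hits in `trials` by pure guessing.

    Assumes every trial has `n_choices` equally likely answers
    (2 for ABX, 3 for ABC, 4 for a hypothetical ABCD).
    """
    p = 1.0 / n_choices  # chance of a lucky hit on one trial
    return sum(
        comb(trials, k) * p**k * (1.0 - p) ** (trials - k)
        for k in range(correct, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical panel result: 8 hits in 10 trials.
    for n_choices in (2, 3, 4):
        print(n_choices, chance_of_at_least(8, 10, n_choices))
```

More alternatives per trial do make a lucky run less likely, which is the intuition behind the 33% vs 25% figures.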

This is one form of ABX.  There are others.

In your case, the question is "Is X, A or B?"
No; the question is "which do you like better?" Let's say a listener says A is good, B is bad, and then presented with A (under the X name) he says it's good, that's one point, if he says bad, it's minus one point. A listener that answers randomly gets zero point at the end of the test (provided it's long enough).
And the actual question may be "concentrate on the midrange on vocals; which one's best?"; then it can be used in the design phase, not only for final assessment.
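
The plus-one/minus-one scoring described in this post can be sketched as follows. This is a minimal illustration under stated assumptions: the names `consistency_score` and `random_listener` and the "good"/"bad" verdict labels are hypothetical, and the simulated listener simply answers at random.

```python
import random


def consistency_score(reference_verdicts, trials):
    """Score a listener's hidden-X verdicts against their identified verdicts.

    reference_verdicts: e.g. {"A": "good", "B": "bad"}, given when A and B
        were presented under their own names.
    trials: list of (hidden_identity, verdict) pairs for the X presentations.
    Each verdict that matches the earlier one scores +1, otherwise -1.
    """
    score = 0
    for hidden_identity, verdict in trials:
        score += 1 if verdict == reference_verdicts[hidden_identity] else -1
    return score


def random_listener(n_trials, rng):
    """Simulate a listener who answers at random on every X presentation."""
    return [(rng.choice("AB"), rng.choice(["good", "bad"])) for _ in range(n_trials)]


if __name__ == "__main__":
    reference = {"A": "good", "B": "bad"}
    rng = random.Random(0)
    n_trials = 10_000
    score = consistency_score(reference, random_listener(n_trials, rng))
    # For a random listener the average score per trial drifts toward zero.
    print(score / n_trials)
```

A perfectly consistent listener scores +1 on every trial, so the total equals the number of trials.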
 
Abbey, you bring up a lot of points .. many of which illustrate why many DBLTs are flawed.

The subject is really too large for a forum but I'll try to answer some of them.  Remember, this is something that worked well for us.  We developed it over nearly 2 decades.  The Fryer & Lee paper was very early on and there were even earlier papers that we wrote .. some aspects of which we discarded as not useful or too difficult.
abbey road d enfer said:
then why not ABCD, which would give only 25% random error?
The short answer is that humans can't sensibly assess more than 3 'items' in a DBLT.

No; the question is "which do you like better?" Let's say a listener says A is good, B is bad, and then presented with A (under the X name) he says it's good, that's one point, if he says bad, it's minus one point. A listener that answers randomly gets zero point at the end of the test (provided it's long enough).
I think you need to re-visit the ABX papers by Lipshitz, Vanderkooy and their gang.  I can't remember the guy who made and sold an ABX box.

And the actual question may be "concentrate on the midrange on vocals; which one's best?"; then it can be used in the design phase, not only for final assessment.
Our experience is that this type of question is the least productive of all.

As I've said ad nauseam, the victim is told NOTHING about what he is listening to or what he 'should' be listening to.  Either the 'fault' you are attempting to assess is important, in which case the test will pick it up ... or it's unimportant and no one will comment on it.

I've highlighted several examples of this already ..  eg the 6 ltr box which is praised for 'well balanced bass' .. even by experienced listeners.  Remember, the victim picks his own music .. that which he enjoys & listens to regularly.  In 2 decades of DBLTs, I think I'm the only one who would sometimes pick a Bach organ toccata ..  as I do enjoy them.

This leads to our 'prejudices' in design.  I've mentioned bass and IMHO, I think I can 'design' this feature into  certain types of speaker.

Another, perhaps surprising feature, is that midrange quality, particularly on vocals, is perhaps the BIGGEST factor for 'like' and this doesn't vary with the type of music the victim listens to.  The heavy metal fan finds it as important as the classical recording engineer.
 
ricardo said:
Abbey, you bring up a lot of points .. many of which illustrate why many DBLTs are flawed.

The subject is really too large for a forum but I'll try to answer some of them.  Remember, this is something that worked well for us.  We developed it over nearly 2 decades.  The Fryer & Lee paper was very early on and there were even earlier papers that we wrote .. some aspects of which we discarded as not useful or too difficult.  The short answer is that humans can't sensibly assess more than 3 'items' in a DBLT.
I think you need to re-visit the ABX papers by Lipshitz, Vanderkooy and their gang.  I can't remember the guy who made and sold an ABX box.
IIRC it was David Clark.
Our experience is that this type of question is the least productive of all.

As I've said ad nauseam, the victim is told NOTHING about what he is listening to or what he 'should' be listening to.  Either the 'fault' you are attempting to assess is important, in which case the test will pick it up ... or it's unimportant and no one will comment on it.

I've highlighted several examples of this already ..  eg the 6 ltr box which is praised for 'well balanced bass' .. even by experienced listeners.  Remember, the victim picks his own music .. that which he enjoys & listens to regularly.  In 2 decades of DBLTs, I think I'm the only one who would sometimes pick a Bach organ toccata ..  as I do enjoy them.

This leads to our 'prejudices' in design.  I've mentioned bass and IMHO, I think I can 'design' this feature into  certain types of speaker.

Another, perhaps surprising feature, is that midrange quality, particularly on vocals, is perhaps the BIGGEST factor for 'like' and this doesn't vary with the type of music the victim listens to.  The heavy metal fan finds it as important as the classical recording engineer.
Re: your comment about frequency response on axis, vs response into the room, the much maligned Bose 901 was the poster boy for response into the room, and had some appreciation from classical music fans (I don't think flat was one of the several EQ settings  :eek:  . )

JR
 
Ricardo,
Here's the most relevant piece of info I could find on the subject.
Does the ABC method you employed fit with the description of a fixed identified reference?
According to these descriptions, both methods rely on the existence of an unquestioned reference (the original signal vs. a copy).
Our method, which is typically ABX with a different questionnaire, has worked well for us. Sometimes we do a real ABC test (3 different speakers) as a preselection for the ABX.
 
JohnRoberts said:
Re: your comment about frequency response on axis, vs response into the room, the much maligned Bose 901 was the poster boy for response into the room, and had some appreciation from classical music fans
Relying exclusively on subjective tests and implicating the mktg dept induces aberrations such as considering a Bertagni a good speaker.
 
JohnRoberts said:
IIRC it was David Clark.
Name rings a bell.  I think I've corresponded with him in Jurassic times.

your comment about frequency response on axis, vs response into the room, the much maligned Bose 901 was the poster boy for response into the room, and had some appreciation from classical music fans (I don't think flat was one of the several EQ settings  :eek:  . )
'Flat' under what conditions?

Certainly the anechoic chamber test in Fryer & Lee says flat is good in an anechoic.  Alas not many of us listen in anechoic chambers.

Besides, there is BBC research that shows stereo in an anechoic is TERRIBLE.  It improved dramatically when they put a plywood floor down between speakers & listener.

My Room Interface Profile is just admitting I don't know EXACTLY what it is except it has to do with frequency responses in ALL directions .. but I don't think Floyd, Olive etc and da 'midrange step' pseudo gurus know either.

Abbey, I can't see your attachment.  No.  My ABC tests don't identify ANYTHING .. including whether you are listening to speakers, electronics, cables (which are in Fryer & Lee  8) ) mains cables ...  ;D

I'd be interested in how you operate your ABX test and your questionnaire.  Do you use David Clark's box?

What is a Bertagni?  BTW, not all Mktg people are deaf but often ... what they 'like' blind is NOT what they say is good sighted

We found many highly touted speakers which crapped out in DBLTs .. even with their most rabid fans.

On a number of occasions, we had rabid fans (reviewers) carefully set up their favourite speakers and certify that they were performing to their best abilities before drawing the curtain and putting these carefully set up favourites against our own cheapo stuff.  :)
 
ricardo said:
Abbey, I can't see your attachment. 
It should be visible now. The subject of attachments not appearing is currently under investigation.


No.  My ABC tests don't identify ANYTHING .. including whether you are listening to speakers, electronics,
cables (which are in Fryer & Lee  8) ) mains cables ...  ;D
OK, if I understand correctly, it's just three different samples and the question is "which" is "best" ?... Fair enough.


I'd be interested in how you operate your ABX test and your questionnaire.  Do you use David Clark's box?
No. We use a multi-output digital processor as a switcher and an operator who acts as randomly as humanly possible. A and B are always available at the request of a listener; then the human switcher will present any one of A or B. There's always a mute transition even if X is identical to the last selected sample.
The question is "Which one is "best"?" but very often with a focus on a specific aspect of performance; it may be "Now we're going to concentrate on vent noise, or how it craps out under amp clipping, or how the intelligibility is under stress".
We are very much concerned about the performance under abuse of our systems, because that's normal operation for them.
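
For illustration only, the switching procedure described in this post could be mimicked in software roughly as below. The post describes a human operator and a multi-output digital processor, so the pseudo-random generator, the function names and the "mute"/"play" step labels here are all stand-in assumptions.

```python
import random


def make_abx_schedule(n_trials, seed=None):
    """Return a list of hidden X identities ("A" or "B"), one per trial.

    A pseudo-random generator stands in for the human operator
    acting "as randomly as humanly possible".
    """
    rng = random.Random(seed)
    return [rng.choice(("A", "B")) for _ in range(n_trials)]


def switching_steps(schedule):
    """Expand a schedule into switcher steps.

    Every presentation is preceded by a mute, even when X repeats the
    previous sample, so the transition itself gives no cue.
    """
    steps = []
    for x in schedule:
        steps.append("mute")
        steps.append("play " + x)
    return steps


if __name__ == "__main__":
    schedule = make_abx_schedule(8, seed=42)
    for step in switching_steps(schedule):
        print(step)
```

Fixing the seed makes a session reproducible for later analysis; leaving it as `None` gives a fresh sequence each run.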

When we try to assess our products against competitors', the question is a little different. The listeners are trained, so they will readily focus on some aspects of performance, not necessarily the same for all listeners.


What is a Bertagni? 
Hyped polyplanars. Just google "bertagni speakers".
 
ricardo said:
Name rings a bell.  I think I've corresponded with him in Jurassic times.
'Flat' under what conditions?

Certainly the anechoic chamber test in Fryer & Lee says flat is good in an anechoic.  Alas not many of us listen in anechoic chambers.

Besides, there is BBC research that shows stereo in an anechoic is TERRIBLE.  It improved dramatically when they put a plywood floor down between speakers & listener.
I am surely repeating myself but loudspeaker/crossover design is fraught with multiple compromises.  Response on axis, vs total response into the room is one of those.

Rather than picking one to prefer, I will lump on that all hifi reproduction is a huge compromise based on psycho acoustic manipulation. Modern technology is catching up to a point where we can address more variables, but we will always (probably always) fail to agree on the exact variables and targets to manage.

It is remarkable how much enjoyment we can get from these flawed playback technologies now. Probably even more, the less closely we inspect them.

JR
 
 
ricardo said:
Besides, there is BBC research that shows stereo in an anechoic is TERRIBLE.
Anything in anechoic is terrible. Losing dimensional cues is unnerving.


  It improved dramatically when they put a plywood floor down between speakers & listener.
Adding reverb to a signal is known to be  pleasant (why is a big question, I'm not sure it could be attributed to acquired taste); could it be the simple reason?
 
ricardo said:
'Flat' under what conditions?
Big issue! Anyway I'm not so much concerned about flatness per se, because all our systems are used with quite a significant dose of electronic EQ, most of it being user-controlled. I'm more concerned with the smoothness of the frequency response, assessing how the system is "EQ-able". Many systems are not, they exhibit accidents in the response that cannot be EQ'd, not because they are non-MP, but just because the accidents don't exist in all directions. So, yes, directivity is also a major concern.
I wouldn't probably be a good candidate for DBLT's because, when I listen to a system, my first question is "could I work with it?" and the answer is quite often "yes". That's my answer when asked how the systems I design fare, so I leave it to the mktg dept to search the dictionary for dithyrambs.
 
Abbey, I wish I could send you a copy of Fryer & Lee.

In the original version of that test, there IS an Absolute Reference.

But hardly any of the faults that make a speaker "sound like a speaker" can be heard in that test.

Transferring the test to a domestic room immediately reveals the AUDIBLE faults that make a speaker sound like one instead of the 'real thing'.

I don't think many (any?) people have grasped the full implications of this ... or even my last sentence.  8)

It manifests itself in the amount of work done on stuff which has little or no effect on the sound.
 
ricardo said:
Abbey, I wish I could send you a copy of Fryer & Lee.

In the original version of that test, there IS an Absolute Reference.

But hardly any of the faults that make a speaker "sound like a speaker" can be heard in that test.

Transferring the test to a domestic room immediately reveals the AUDIBLE faults that make a speaker sound like one instead of the 'real thing'.

I don't think many (any?) people have grasped the full implications of this ... or even my last sentence.  8)

It manifests itself in the amount of work done on stuff which has little or no effect on the sound.
To go further off down this rabbit hole, the obvious standard reference for audio would be a real acoustic source, like a human singing*** or small ensemble performing. AFAIK such trials have been mostly informal comparisons of recording mediums or perhaps audio path technology (the dreaded digital) in studios. "Is it live or Memorex?"  ;D

A classic marketing promotion was to place a small group of performers on stage in a concert hall (IIRC symphony hall in Boston was used for a speaker demonstration by AR decades ago). The trick is to have the performers playing, then without announcing it switch to playback through loudspeakers located on stage coincident with the performers. The speaker output will get the same acoustic shaping as the live performers from the hall sound. Unless you are sitting up front the dominant sound field in a concert hall is reverberant. (This trick may have been used to pimp recording tape too).

Getting back to speakers, the anechoic chamber might isolate the sound source from room effects so seem better for strict a/b comparisons, but how speakers interact with the room (off axis response, coupling, etc) matters, so this is more complicated than that.

JR

*** One relatively successful high end audiophile equipment provider, back in the day, included the step of dialing in each customer's system in place (using multi-band graphic EQs). His wife was a singer, so he used recordings of her singing voice that he was intimately familiar with to EQ the system.  Clearly very expensive and hard to mass produce; modern technology might be able to do this a bunch cheaper.
 
JohnRoberts said:
To go further off down this rabbit hole, the obvious standard reference for audio would be a real acoustic source, like a human singing***
The BBC use someone reading from a script.  The problem is that their 'reference' recording is (was) done with AKG414 with its known faults.

We were involved in designing a small active monitor for them and I was most impressed by their assessment methods.  The guy who read the script was in their R&D Dept so we could do a real (sighted) 'live vs recorded' comparison.  I think the recording was part of the very first EBU Test CD.

This leads to the classic BBC sound which IMHO, I can pick up blind on BBC designed speakers.  The voice test was their gold standard for accuracy.

A classic marketing promotion was to place a small group of performers on stage in a concert hall (IIRC symphony hall in Boston was used for a speaker demonstration by AR decades ago).
Actually it was Gilbert Briggs of Wharfedale.  He did it first in the Royal Festival Hall and invited the Queen.
https://en.wikipedia.org/wiki/Gilbert_Briggs

I had the original tapes they used.  The old man cheated.  They used a lot of EQ  ;)

IIRC, the AR stunts were on a much smaller scale.

Getting back to speakers, the anechoic chamber might isolate the sound source from room effects so seem better for strict a/b comparisons, but how speakers interact with the room (off axis response, coupling, etc) matters, so this is more complicated than that.
This is the whole gist of Fryer & Lee.  The anechoic chamber test DOESN'T show up most AUDIBLE SPEAKER DISTORTIONS.  We probably did more work on Audible Speaker Distortions and what they sound like than anyone else.

Simply moving the whole shebang into a domestic room immediately brought up the Audible distortions and the speakers could be identified by those who were very familiar with them.

This stuff was important to me cos it highlighted stuff that was audible and stuff which was already 'good enough'.  It colours everything I design.  Though I'm as prone to chasing 1ppm THD in an amp as any of the amp gurus here, for me it's just wanking.

I'd happily dump that goal if that would help reduce stuff that was audible.  My real specialty (apart from using DBLTs for design) is integrating speakers & amps to produce  better sound .. not lower THD or other stuff that is unimportant.
___________________

Abbey, you mention asking your panel to 'concentrate on midrange'.

Assuming this was in the course of development of a speaker, we would have the 2 versions in an ABC test (one is usually repeated or perhaps another speaker with good midrange would be the 3rd).

The panel is not told anything about what they are listening to or for.

There are 3 possible outcomes.
  • no comments on midrange - the difference is too small to matter
  • the change is preferred
  • the change is thought worse
We've always had sensible results from such a test.  Admittedly, this is with a very experienced and perceptive panel but I was also prone to dragging ANY visitor to the factory and get them to do a DBLT on whatever we were working on at the time.  :eek:
__________________

bluebird, the speaker is no longer made and spares are non-existent.  Some friends of mine are secretly trying to corner the market on them.  You'll excuse me if I keep quiet to avoid jacking up prices on eBay  8)
 
ricardo said:
Abbey, you mention asking your panel to 'concentrate on midrange'.

Assuming this was in the course of development of a speaker, we would have the 2 versions in an ABC test (one is usually repeated or perhaps another speaker with good midrange would be the 3rd).

The panel is not told anything about what they are listening to or for.

There are 3 possible outcomes.
  • no comments on midrange - the difference is too small to matter
  • the change is preferred
  • the change is thought worse
The only difference is we do ABX, because it's almost impossible to find an alternative speaker that would not be instantly recognizable as different. As I wrote earlier, we may start a test with an alternate speaker, but generally one of different technology (direct radiation vs. horn-loaded, or huge BR subwoofer vs. 6th-order). But after this initial phase, that would be ABX all along. It may not make much sense to you, but so far it's worked well for us.
 
abbey road d enfer said:
The only difference is we do ABX, because it's almost impossible to find an alternative speaker that would not be instantly recognizable as different. As I wrote earlier, we may start a test with an alternate speaker, but generally one of different technology (direct radiation vs. horn-loaded, or huge BR subwoofer vs. 6th-order). But after this initial phase, that would be ABX all along. It may not make much sense to you, but so far it's worked well for us.
Actually it makes complete sense to me.  Our 'equivalent' test would  be to have one of the 2 presentations repeated .. which would be usual in cases like this .. eg midrange performance.
 
