Stating MTBF (for 'approval')

GroupDIY Audio Forum

SSLtech (Well-known member; joined Jun 3, 2004; 5,447 messages; location: Florida, previously UK)
So... a relay driver circuit which I've whipped up at work is being put into (extremely limited) production as a 'problem solver' board.

I've been asked to provide MTBF numbers for the product for 'certification' purposes... and I have no idea how to provide meaningful numbers!

Essentially, the relay drivers are just 2N2222 transistors, with 100kΩ base pull-up resistors to +24V, common emitters and Panasonic relays in the collector leg.

Panasonic rates their relays at hundreds of thousands of cycles, but this will likely only ever see a dozen cycles in its lifetime... the 100kΩ resistor is ¼-watt, and ludicrously under-stressed... there are a couple of LEDs (indicating status) which are fed 24V through 10kΩ, thus equally madly under-stressed.
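The "under-stressed" claim is easy to put a number on, and the stress ratio is one of the inputs the handbook models ask for. A minimal sketch, assuming (worst case, not stated in the post) that the full 24V supply appears across the 100kΩ pull-up; `resistor_stress` is a hypothetical helper name:

```python
def resistor_stress(v_across: float, r_ohms: float, rating_w: float) -> float:
    """Return dissipated power (V^2 / R) as a fraction of the power rating."""
    p_watts = v_across ** 2 / r_ohms
    return p_watts / rating_w


# Assumed worst case: the whole 24 V supply across the 100 kOhm, 1/4 W pull-up
pullup_stress = resistor_stress(24.0, 100e3, 0.25)
print(f"100k pull-up dissipates {pullup_stress:.1%} of its 1/4 W rating")
```

The same helper applies to the 10kΩ LED feed once its power rating and the LED's forward drop are plugged in; neither is given in the post.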

A handful of diodes (for back EMF) -all massively over-rated and under-stressed.

I've looked at datasheets, but I can find no indication of how to provide meaningful MTBF numbers. -All I know is that after dozens of years doing this kind of stuff, this product is so massively over-engineered and under-rated that I feel utterly certain that it will continue to work essentially forever, with no expected failures whatsoever.

Can anyone provide me with some guidance on how to provide a real-world MTBF number?
I anticipate that "the weakest link in the chain" (i.e. whichever part of the circuit has the shortest MTBF) is what I should first focus on.
 
> a real-world MTBF number?

"175,320 hours".

Assumption: "essentially forever" is clearly over-optimistic, but it will probably out-live you, or in 20 years YOU will not care (retired in genteel poverty etc). "20 years" sounds too round, but "175,320 hours" sounds like an exact computation.
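PRR's "175,320 hours" is just 20 years expressed in hours, using an average year of 365.25 days. A quick sketch of the conversion (the helper names are hypothetical):

```python
HOURS_PER_YEAR = 365.25 * 24  # 8,766 h: average year including leap days


def years_to_hours(years: float) -> float:
    return years * HOURS_PER_YEAR


def hours_to_years(hours: float) -> float:
    return hours / HOURS_PER_YEAR


print(f"20 years = {years_to_hours(20):,.0f} hours")
```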
 
> looked at datasheets, but I can find no indication of how to provide meaningful MTBF

Makers do not give MTBF. In my life, the Military gave tables, often used to guide commercial work too. I see there are at least a half-dozen MTBF handbooks now from various Authorities.

http://www.sre.org/pubs/Mil-Hdbk-217F.pdf is an old Mil-Spec reliability handbook, and the process will be similar for any.

You have a part-type: resistor, transistor, fuse.

You have stress (here as a fraction of rating) and an operating temperature.

You have a part quality: 12-cent or 50-buck resistor.

You have environment: benign office or rocket-ship.

Get the numbers, apply the formula, and you have the failure rate for that one part.
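The per-part calculation is a base failure rate multiplied by a string of π (pi) adjustment factors. Which factors apply, and the tables they come from, differ by part type and handbook revision, so the sketch below uses placeholder numbers, not values looked up in MIL-HDBK-217F; `part_failure_rate` is a hypothetical helper:

```python
import math


def part_failure_rate(base_rate: float, pi_factors: dict) -> float:
    """Failures per 10^6 hours: the base rate times every applicable pi factor."""
    return base_rate * math.prod(pi_factors.values())


# Placeholder values for one resistor, NOT looked up in the handbook
lam_p = part_failure_rate(
    base_rate=0.002,            # lambda_b: set by part type, temperature and stress
    pi_factors={
        "R": 1.1,               # resistance-range factor (PRR's "factor R")
        "Q": 3.0,               # quality factor
        "E": 8.0,               # environment factor
    },
)
print(f"lambda_p = {lam_p:.4f} failures per 10^6 hours")
```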

"0.1 failures in 10^6 hours" is essentially 10,000,000 hours.

I *guess* if you have 10 parts, you multiply all the factors together in one huge lump *for each part*, then add the results. If they were all the same, you would be down to 1,000,000 hours.
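PRR's guess matches the usual series-system assumption: if any one part failing takes the board down and failure rates are constant, the per-part rates simply add, and MTBF is one million divided by the total. A minimal sketch reproducing his numbers (`system_mtbf_hours` is a hypothetical name):

```python
def system_mtbf_hours(part_rates):
    """MTBF of a series system from part failure rates given per 10^6 hours.

    Assumes constant failure rates and that any single part failure fails the
    board, so the rates simply add.
    """
    total_per_million_h = sum(part_rates)
    return 1e6 / total_per_million_h


one_part = system_mtbf_hours([0.1])          # about 10,000,000 hours
ten_parts = system_mtbf_hours([0.1] * 10)    # about 1,000,000 hours
print(f"one part: {one_part:,.0f} h, ten parts: {ten_parts:,.0f} h")
```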

What I did not look-up, but MUST be considered: solder joints. In 1960s computers, joint failures exceeded all others. It was not my problem then (I was young) and I have not looked into it since (except to be obsessive in my own soldering). {Edit: ah, sec 16.1}

Edit: factor R should be 1.1, not 1.0... I misread <0.1Meg. (Or change to 91K to cheat the table.)

Ignore circled base-rate .0096-- slip of the crayon. You are not buying "established reliability" parts.
 

[Attachment: 100k-MTBF-MilCalc.gif]

Relay, SP, low rate, low stress, is 0.336 benign, 7.4 mobile (per 10^6 hrs). The mobile/benign factor for resistors is 8 but 22 for relays (makes sense). I figured a resistor was 0.1 in mobile service, so the relay alone is like 74 resistors (not counting solder joints). In benign service the relay is worth 29 resistors. The relay is almost certainly your main failure part. You could maybe just figure that and then round down to half to cover all the little parts conservatively.

One relay in mobile service looks like 135,135 hours. Remarkably close to my "175,320 hours" wild pitch. Indeed, with that parts list banging around in a jeep, I would give 50:50 odds on relay failure within 20 years. (But my 1996 Honda is stuffed with relays and none have hard-failed yet...)
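PRR's relay numbers turn into hours with one division, and his "round down to half" shortcut is one more. A sketch using only the figures from the post (7.4 failures per 10^6 hours for the relay in mobile service):

```python
HOURS_PER_YEAR = 365.25 * 24

relay_rate = 7.4                      # failures per 10^6 h: SP relay, mobile service
relay_mtbf = 1e6 / relay_rate         # relay alone
board_mtbf = relay_mtbf / 2           # "round down to half" for the small parts

print(f"relay alone: {relay_mtbf:,.0f} h ({relay_mtbf / HOURS_PER_YEAR:.1f} years)")
print(f"board, halved: {board_mtbf:,.0f} h ({board_mtbf / HOURS_PER_YEAR:.1f} years)")
```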

That 1990 handbook may not be a good source for LED factors.

There is a whole school of thought that the MTBF handbooks are pretty bogus, and serious manufacturers have their own data, but that's above your pay-grade.
 
Not simple... for things that take a very long time to degrade, you accelerate life testing with elevated temperature, repeated cycles, higher stress, etc.
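For temperature-driven wear-out, a common way to convert oven hours into field hours is the Arrhenius acceleration factor, AF = exp[(Ea/k)(1/T_use - 1/T_stress)], with temperatures in kelvin. A sketch; the 0.7 eV activation energy and the 40°C/85°C temperatures are assumptions for illustration, and the model says nothing about mechanisms driven by switching cycles, such as relay contact wear:

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Acceleration factor between use and stress temperatures (Arrhenius model)."""
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1 / t_use - 1 / t_stress))


# Assumed: 0.7 eV activation energy, 40 C in service vs an 85 C oven
af = arrhenius_af(0.7, 40.0, 85.0)
print(f"acceleration factor: {af:.1f}")
print(f"1,000 oven hours ~ {1000 * af:,.0f} field hours")
```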

If you try to predict what will fail and engineer for that to not fail you will chase your tail, as higher circuit complexity is just more things to fail... i.e. a shorted protection diode could inadvertently cause the system failure it was there to prevent.  ::)

Back last century we had to spec some amps for the military to use in air-raid towers in the Middle East. We derated them to 1/3 of their normal output power and tested them at crazy-high ambient temps. If they failed in use over there, I never heard about it.

I would be tempted to just submit the relay data as the weakest link and be done.  My guess is any failures would likely be solder related, but hard to predict. 

JR 

 
Eons ago I wrote a PC program to calculate MTBF. It was back in the days before Windows. It was based on MIL-HDBK-217 (I think). You just input the temperature, the environment (benign, rugged or harsh), the quality level (military, industrial or commercial) and the number of each type of component, and away it went. I released it as a shareware program and sold exactly one copy. Then someone invented Lotus 1-2-3 and I transferred it to a spreadsheet.

The Handbook gives failure rates per million hours so you just add them up and divide the total into one million and you have the MTBF. You can find the Handbook here:

https://snebulos.mit.edu/projects/reference/MIL-STD/MIL-HDBK-217F-Notice2.pdf

I don't think I have the spreadsheet anymore but it is easy to use the Handbook and a spreadsheet to calculate MTBF for your product.

Cheers

Ian
 
> predict what will fail and engineer for that

From the summary, the relay is the big risk. He could consider a socket. That reduces MTBF by the added connections, but might radically improve Time To Repair. Soldered, it has to go on the solder-tech's queue, and he's 6 months behind. Socketed, there's still the couple days of "WTF?" and 10 minutes actual debug, 3 days to get the part delivered, 5 minutes to swap the relay-- a week, not 6 months. If there are multiple identical relays not all essential, a sharp studio engineer could swap relays around and get the more essential function back in a few moments.
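The trade-off PRR describes is usually expressed as availability, MTBF / (MTBF + MTTR), where MTTR is the mean time to repair. A sketch using the 135,135-hour relay figure from earlier in the thread; the 10% MTBF penalty for the socket and the 6-month versus 1-week repair times are assumptions drawn loosely from the post:

```python
HOURS_PER_WEEK = 7 * 24
HOURS_PER_MONTH = 365.25 * 24 / 12


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Inherent availability: fraction of time the unit is up."""
    return mtbf_h / (mtbf_h + mttr_h)


MTBF_SOLDERED = 135_135                 # relay-dominated figure from the thread
MTBF_SOCKETED = 0.9 * MTBF_SOLDERED     # assumed 10% penalty for the socket contacts

soldered = availability(MTBF_SOLDERED, 6 * HOURS_PER_MONTH)  # 6-month repair queue
socketed = availability(MTBF_SOCKETED, HOURS_PER_WEEK)       # about a week to swap

print(f"soldered: {soldered:.2%} up, socketed: {socketed:.2%} up")
```

Even with the MTBF penalty, cutting the repair time from months to a week raises the fraction of time the unit is working, which is the sense in which a socket can be worth its extra connections.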
 
PRR said:
> From the summary, the relay is the big risk. He could consider a socket. That reduces MTBF by the added connections, but might radically improve Time To Repair. [...]

My point is/was that when you engineer for failure and add redundancy or extra parts, they become additional points of failure.  IIRC the original space shuttle used three parallel computers for redundancy, and took the majority vote as the correct answer. But this seems a different application.
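JR's point, that redundancy brings its own failure points, shows up in the textbook model of triple modular redundancy (2-of-3 majority voting). A minimal sketch, assuming independent, identical modules; this is the generic textbook formula, not a description of how the shuttle's computers were actually arranged, and `tmr_reliability` is a hypothetical name:

```python
def tmr_reliability(r_module: float, r_voter: float = 1.0) -> float:
    """Triple modular redundancy with 2-of-3 majority voting.

    The system works if at least two of three independent, identical modules
    work AND the voter works.
    """
    two_of_three = 3 * r_module**2 - 2 * r_module**3
    return r_voter * two_of_three


for r in (0.99, 0.9, 0.5, 0.3):
    print(f"module R={r}: single={r:.4f}, TMR={tmr_reliability(r):.4f}, "
          f"TMR with 0.99 voter={tmr_reliability(r, 0.99):.4f}")
```

With perfect voting, TMR only beats a single module when each module's reliability is above 0.5; and at 0.99 module reliability, a voter that is itself 99% reliable leaves the redundant system slightly worse than one module alone.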

I went through a mental exercise last century, trying to anticipate weak links in a power amp design and engineer robustness with extra added circuitry, and was advised by an older/smarter engineer why that doesn't work (it isn't as cost-effective as just using a higher-voltage, or otherwise higher-rated, part)...

What MTBF do they need? I suspect a relay only expected to switch a dozen times will deliver an adequate life number by itself (but I am speculating and ASSuming the relay is the weakest link... electro-mechanical stuff often is).

JR
 
My sincere gratitude for the insightful replies.

It certainly gets one thinking, when asked to supply a number to which one may later be held to account.

The circuit really is -at heart- a means to fire a relay under specific conditions; which should happen extremely rarely. It is more likely to be fired as a "test to make sure it still works" every few months, than ever be really needed. -I suppose a fire alarm might be a similar analogy!

Anyhow, it's reassuring to hear my initial instinct, that the electromechanical parts would be the 'most suspect', reinforced somewhat. Drilling down into paranoia, the relay contacts are housed in a plastic case. -What happens to the plastic over 20+ years? -Does it outgas? (Some of my strongest memories are triggered by the smell of Xcelite plastic tool handles, due to the outgassing!) -What would be the effect of any chemical release on the contact surface resistance?

Timing-wise, this week's awful fire in London, which may have been -partially or completely- caused by the cladding material and fueled by the air gap behind it, has led to a certain 'unforeseen consequences' paranoia on my part. I don't think this is a "life sensitive" application, but I'm still concerned that I should be able to stand behind the final product with the conviction of having considered all possible failure circumstances thoroughly.

In the 40-or-so years that I've slung solder, I've still got things from my very early days which work. Most failures have been from mechanical stress or abuse. A few 'boomerangs' (stuff that has found its way back to me for work) have been due to poor soldering joints, or heat-related component failures, and of course I've visited the work of other people which has required repair; often for the same reasons. After a while there's a "sense" of what's so "well within failure tolerances as to be negligible" and where the "If I were going to break down, I'd worry about this" parts are. -I'm certain that any bench tech gets a sense of this after a certain number of lids have been removed and failure mysteries investigated.

I'll sit down for a day and plow through the datasheets and try to come up with a number. My instinct says that if I built a hundred of these, I might see one fail in ten years... (and it might be the relay, but it might more likely be because someone hooked it up wrongly!) so I may use that as a guiding thought, but really I should use whatever data I can get.
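That instinct can be turned around into an implied MTBF: accumulated unit-hours divided by failures. A rough sketch, assuming a constant failure rate; with a single failure the point estimate is very loose, and `observed_mtbf_hours` is a hypothetical name:

```python
HOURS_PER_YEAR = 365.25 * 24


def observed_mtbf_hours(units: int, years_in_service: float, failures: int) -> float:
    """Point estimate: accumulated unit-hours divided by observed failures."""
    unit_hours = units * years_in_service * HOURS_PER_YEAR
    return unit_hours / failures


mtbf = observed_mtbf_hours(units=100, years_in_service=10, failures=1)
print(f"{mtbf:,.0f} hours (about {mtbf / HOURS_PER_YEAR:,.0f} years)")
```

An MTBF of around 1,000 years does not mean any single unit lasts that long; it only describes the failure rate across the population.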

Again, thanks Paul, Ian and John.

Keef
 
JohnRoberts said:
> ... I suspect a relay only expected to switch a dozen times will deliver an adequate life number by itself (but I am speculating and ASSuming the relay is the weakest link... electro-mechanical stuff often is).

Two things that may or may not matter, for a relay that so rarely switches state:

1) It'll spend decades in the "infant mortality" part of that bathtub failure curve, if the curve is based on number of operations,

2) A year between switching gives plenty of time for the contacts to build up tarnish, corrosion, dust, etc.

Just saying.

Gene
 
Keith,

Fairchild has data from an IC perspective (2N2222A - https://www.fairchildsemi.com/products/discretes/bipolar-transistors/small-signal-bjts/PN2222A.html)
Your relay manufacturer should also have this.
I would choose the lowest of them all and provide that.

Thinking from a purely defensive design perspective:
- No electrolytic caps for power-supply decoupling; use some 2.2µF ceramics. Electrolytics dry out over 20 years.
- Use big pads to solder to. Someone may have to replace the relays etc. some day.
- Have you considered a DIN-style relay that can be swapped out easily (and sourced forevermore)?
- Put an LED on your control signal and an additional LED in your Relay path. You'll know very quickly if the relay isn't working then.

Give me a call if you want to chew through this.

/R
 
> -What happens to the plastic over 20+ years?

MTBF does not say how long it will last. It hardly considers old-age processes.

It is an index to the number of failures in a large population over a reasonable service life. Say 100,000 hr MTBF. One year is 8,766 hours, so you expect 0.09 failures per year. In a population of 1,000 products, this is 90 failures per year. (Check my math!!) Is your repair depot provisioned for around a hundred repairs/year? Will the warranty costs kill your profit?
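PRR's arithmetic, checked under the same constant-failure-rate assumption (`expected_failures_per_year` is a hypothetical helper):

```python
HOURS_PER_YEAR = 365.25 * 24  # 8,766 h


def expected_failures_per_year(mtbf_h: float, population: int) -> float:
    """Constant failure rate: each unit contributes HOURS_PER_YEAR / MTBF failures per year."""
    return population * HOURS_PER_YEAR / mtbf_h


per_unit = expected_failures_per_year(100_000, 1)
fleet = expected_failures_per_year(100_000, 1_000)
print(f"per unit: {per_unit:.3f}/yr, fleet of 1,000: {fleet:.1f}/yr")
```

The unrounded figure is about 88 per year, which PRR rounded to 90.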

There will be more trouble early-on from infant mortality, so a 30-day warranty may not save much.

My sense of modern parts is that once the weaklings are burned out, the curve is flat for around 5 years, then rises. Particularly commodity electrolytic caps. I don't think quality semiconductor fail rate rises for 20+ years. Relay rot is an interesting question. I repeat that my 1996 Honda is doing OK on that, despite "mobile" service with wide temperature swings.

> this week's awful fire in London

The cladding makers will not be called to account. They clearly market an FR grade. It was a local decision to install non-FR grade. And I strongly suspect the local experts will not be sent to prison; the choice echoes the sprinkler code, which did not require sprinklers on balconies <2m wide. (I may be mixing this fire with a stunningly similar fire Down Under.)

> concerned that I should be able to stand behind

Civil engineers, building and bridge builders, have fretted this a long time. You use time-tested materials and large safety factors on well-considered stresses.
 
SSLtech said:
> Drilling down into paranoia, the relay contacts are housed in a plastic case. -What happens to the plastic over 20+ years? -Does it outgas? (Some of my strongest memories are triggered by the smell of Xcelite plastic tool handles, due to the outgassing!) -What would be the effect of any chemical release on the contact surface resistance?

There are different relay contacts specified for different tasks, some with noble metal to prevent oxidation. Some are also sealed with inert gas (?), so they should be free of plastic-outgassing concerns.

> In the 40-or-so years that I've slung solder, I've still got things from my very early days which work. Most failures have been from mechanical stress or abuse. A few 'boomerangs' (stuff that has found its way back to me for work) have been due to poor soldering joints, or heat-related component failures, and of course I've visited the work of other people which has required repair; often for the same reasons. After a while there's a "sense" of what's so "well within failure tolerances as to be negligible" and where the "If I were going to break down, I'd worry about this" parts are. -I'm certain that any bench tech gets a sense of this after a certain number of lids have been removed and failure mysteries investigated.

Years ago at Peavey, when we extended our warranty from 3 years to 5 years, I (we) did an extensive review of years of service repair records... After looking at hundreds of SKUs, only one actual circuit design flaw stood out as a statistical anomaly, and it hadn't been caught already because it was a relatively low-selling SKU. The old infant-failure bathtub curve was in force... products that work for months generally work for a bunch of years.

> I'll sit down for a day and plow through the datasheets and try to come up with a number. My instinct says that if I built a hundred of these, I might see one fail in ten years... (and it might be the relay, but it might more likely be because someone hooked it up wrongly!) so I may use that as a guiding thought, but really I should use whatever data I can get.

Depending on the task, perhaps a solid-state relay might provide better reliability, but what kind of life do they expect/want?

JR

PS: That said, I've encountered some unresolved switching issues when using thyristors as power switches (the manufacturer never could explain the rogue behavior, even when I returned the actual smoking-gun parts), and I ended up going back to mechanical power switches for better reliability. The install market does not suffer service calls lightly.
 
I ended up coming across this site in a Google search for MTBF, and have been perusing some of the reliability info there. It's mostly a crusade against misusing MTBF as a metric in ways for which it isn't really useful.

http://nomtbf.com/

Reliability engineering can be pretty complex, though it's not something in which I have much particular experience.

 
Question
Is there any data on the Mean Time Between Failures (MTBF) of relays?
Answer

"There is no data for the Mean Time Between Failures (MTBF) for Relays.
This is because the mean failure rate greatly depends on the current flowing through the contacts, the load type, switching frequency, ambient temperature, and whether Relays are connected to an AC or DC load.
The failure rate P reference value and endurance curves can be used as indicators for the frequency of relay failures and service life."

Here is an MTBF report for solid-state relays:

http://www.crydom.com/en/tech/whitepapers/ssr_reliability_whitepaper.pdf
 
One reason for MIL-HDBK-217 was that it gave a benchmark for comparison, and a possible way to improve part and system failure rates by modifying parts, circuits, or their working environment.

I followed the growth of some tools for reliability assessment of power systems; again, the tool allowed a comparison of different ways to configure high-reliability supplies. It used Monte Carlo assessment, similar to SPICE Monte Carlo assessment of worst-case operating performance. That all boils down to the probability of a failure over the service life, although failure is only part of the performance; in many situations it is 'availability' that is the key metric.
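For a plain series parts list the closed form 1 - exp(-λt) is already available, but the same Monte Carlo idea scales to redundant or repairable arrangements where closed forms get messy. A minimal sketch that checks itself against the closed form; the parts list and rates are hypothetical, and it assumes independent parts with exponential lifetimes:

```python
import math
import random


def mc_failure_probability(rates_per_million_h, service_h, trials=200_000, seed=1):
    """Monte Carlo estimate of P(series system fails within service_h).

    Each part gets an exponential (constant-rate) lifetime; the system fails
    when its first part fails.
    """
    rng = random.Random(seed)
    rates_per_h = [r / 1e6 for r in rates_per_million_h]
    failures = 0
    for _ in range(trials):
        first_failure = min(rng.expovariate(lam) for lam in rates_per_h)
        if first_failure <= service_h:
            failures += 1
    return failures / trials


# Hypothetical parts list: one relay plus a few small parts (failures per 10^6 h)
rates = [7.4, 0.1, 0.1, 0.05, 0.05]
service = 20 * 365.25 * 24                      # 20 years in hours
mc = mc_failure_probability(rates, service)
exact = 1 - math.exp(-sum(rates) / 1e6 * service)
print(f"Monte Carlo: {mc:.3f}  closed form: {exact:.3f}")
```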

Whatever assessment you do, you obviously need to document what you are doing, given you haven't been provided a template/procedure to work within. I would also try to incorporate total-system aspects: part manufacturer and supply-chain quarantine/auditing (who can guarantee nowadays that a part is the part you think it is, traceable to a manufacturer's batch and change log?); manufacturing and testing procedures (which may allow the finished product to avoid infant-mortality failures); service/maintenance practices (to provide confidence that the part was, and should continue to be, able to perform into the future, as well as spare parts and people available to repair it); and alarming of in-service operation, to alert between maintenance visits or on actual events. Obviously, the more benign the product (manufacturing cost, maintenance cost, cost of a failure), the less effort is needed to assess everything. It's sort of like a modern-day risk/hazard assessment matrix before you dig a hole, drill into a wall, do some arc-welding, or climb a ladder to do some work.

 
With thanks to all, I used MIL-HDBK-217 and plugged the values in, using 40°C (104°F) as an operating temperature (the actual temperature is going to be more like a comfortable room temperature, around 20°C), and the number I ended up with was about 19 years 6 months... surprisingly close to PRR's napkin value of 20 years.

I made a spreadsheet with tabs for each type of component (resistors, transistors, relays etc.) and listed each individual component of that type from the BOM, working downwards, with each row working left to right through the formula's progression. From there, each tab's output value linked to a "main" tab, which totalled up the values and made the final conversion from failures per million hours to overall MTBF.
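That spreadsheet layout maps naturally onto a dictionary of per-type lists feeding a totals step. A sketch of the structure; every rate below is a placeholder, not a MIL-HDBK-217 result and not Keith's data:

```python
HOURS_PER_YEAR = 365.25 * 24

# Hypothetical BOM: one "tab" per component type, one row per part, each row
# holding that part's computed failure rate (failures per 10^6 h). Placeholders only.
tabs = {
    "resistors":   [0.0037, 0.0037, 0.0037, 0.0012, 0.0012],
    "transistors": [0.0150, 0.0150],
    "diodes":      [0.0040, 0.0040],
    "leds":        [0.0200, 0.0200],
    "relays":      [2.5, 2.5],
}

# "Main" tab: subtotal each type, total everything, convert to MTBF
subtotals = {name: sum(rows) for name, rows in tabs.items()}
total = sum(subtotals.values())
mtbf_h = 1e6 / total
years = int(mtbf_h // HOURS_PER_YEAR)
months = int((mtbf_h % HOURS_PER_YEAR) // (HOURS_PER_YEAR / 12))

for name, sub in sorted(subtotals.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {sub:.4f}  ({sub / total:.1%})")
print(f"MTBF: {mtbf_h:,.0f} h = {years} years {months} months")
```

Sorting the subtotals by share of the total is a quick way to confirm which tab dominates; with these placeholder numbers the relays account for almost all of it.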

That submission was accepted, the first production units were inspected and evaluated, and the units are now UL approved.

Keith
 
