Eng-Tips is the largest engineering community on the Internet

Intelligent Work Forums for Engineering Professionals

Testing for failure modes in a ~1M-hour MTBF assembly

Status
Not open for further replies.

WindAddict2

Electrical
May 11, 2017
US
This is a question for my own education: I am the account manager, but I have an EE degree and no formal quality training (well, I have done some over my career).

I have a client to whom we sell a mature electronic product (3rd generation, and the 3rd version of that generation). We produce >300K per year; they buy ~7K per year. Our published MTBF per SN29500 works out to ~3000 FIT (at 40 °C), while the observed rate is ~200 FIT looking at the population at large, including previous versions that contained elements that have since been improved. The customer's installed base is about 30K units, and we estimate they collectively see about 100 failures per year, so as I calculate it their rate is in line with our broader experience. I suspect their product is in service nearly 100% of the time but running at only 10-30% load on average, and the ambient temperature is affected by the loading of the system, not just whether it is on or off.

Also, based on warranty analysis, less than 10% of failures are actual product failures. This product gets used in a way that any disturbance (lightning, controls mis-operation), poor application (there are external design factors that affect our component), or poor overall system maintenance (cooling in particular) results in higher failure rates; i.e., well-engineered, well-applied systems are 100x or more reliable than poorly applied ones.
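As a sanity check on the arithmetic, the observed rate can be estimated from failure counts and accumulated unit-hours. A rough sketch: the 30K-unit and 100-failure figures come from the post, but calendar hours are used rather than true operating hours, so the result is order-of-magnitude only.

```python
def fit_rate(failures_per_year, units, hours_per_year=8760):
    """Observed failure rate in FIT (failures per 1e9 unit-hours)."""
    unit_hours = units * hours_per_year
    return failures_per_year / unit_hours * 1e9

def mtbf_hours(fit):
    """MTBF in hours implied by a FIT rate."""
    return 1e9 / fit

# Figures from the post: ~100 failures/year over a ~30K installed base
observed = fit_rate(100, 30_000)   # ~380 FIT on calendar hours
```

On calendar hours this lands in the same order of magnitude as the ~200 FIT seen across the whole population, and correcting for the 10-30% duty cycle only shifts it within that order.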

Our component is a sub-assembly with an estimated 100 electronic sub-components (resistors, capacitors, ICs, magnetics such as small-signal transformers, etc.).

They have hired a new reliability engineer who is insisting that we should be able to "test" our reliability for the assembly and provide info on the failure modes. I contend that we cannot construct such a test since the MTBF is too high, and that the calculation via SN29500 plus our field data is the best info available. Furthermore, since this is a general-purpose item, failures specific to their application need to be analyzed by looking at their field failures.

Also, the devices are one of the main sources of heat, so raising the ambient temperature is not a good acceleration factor: it may cause some materials and elements to age in a way that is not observed in reality.

Our quality team is very good and very serious, and I do not want to bring them a customer "problem" that is obviously not valid. Also, I do not believe the customer's RE has read our publications, app notes, and other info relating to this matter; he seems to keep falling back on what I will call "textbook" methodology and wants us to spoonfeed him answers.

Am I way off base in my analysis? Is it viable to construct an accelerated test that can yield meaningful info with total test hours around 1/1000 of the MTBF figure? The assembly costs >$500, so a 1000-piece test is not realistic.
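To put numbers on why an unaccelerated test at this MTBF is hopeless, here is a back-of-the-envelope sketch; the 150-unit, 1000-hour test size is hypothetical, and a constant rate at the published 3000 FIT is assumed.

```python
def expected_failures(units, test_hours, fit):
    """Expected failure count in a fixed-duration test at a constant FIT rate."""
    return units * test_hours * fit * 1e-9

# Hypothetical test: 150 units for 1000 h at the published 3000 FIT
print(expected_failures(150, 1_000, 3_000))  # 0.45 expected failures
```

Less than one expected failure across the whole test: without a large (and trustworthy) acceleration factor, such a test says essentially nothing about failure modes.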
 

"I contend that we cannot construct such a test since the MTBF is too high, and that the calculation via SN29500 plus our field data is the best info available. Furthermore, since this is a general-purpose item, failures specific to their application need to be analyzed by looking at their field failures."

The first part is not strictly true, in the sense that most failures that go into the FIT are temperature-, voltage-, or vibration-accelerated, so short-term tests can be created; there is an entire industry devoted to Highly Accelerated Stress Testing (HAST) and Highly Accelerated Life Testing (HALT). But this can be a very expensive proposition, depending on the degree of rigor. The typical acceleration is by temperature, using the Arrhenius equation and an "activation energy." The problem is that each component technology in your assembly will potentially have a different activation energy.
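For reference, the Arrhenius acceleration factor between a use temperature and a stress temperature can be sketched as follows; the 0.7 eV activation energy is a common textbook placeholder, not a measured value for any of these components.

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_af(t_use_c, t_stress_c, ea_ev):
    """Acceleration factor: how much faster aging runs at the stress temperature."""
    t_use = t_use_c + 273.15       # convert Celsius to kelvin
    t_stress = t_stress_c + 273.15
    return math.exp((ea_ev / K_B) * (1.0 / t_use - 1.0 / t_stress))

# Hypothetical example: 40 C use, 85 C stress, Ea = 0.7 eV
print(arrhenius_af(40.0, 85.0, 0.7))  # roughly 26
```

Note how sensitive the result is to Ea: the same 45 °C rise gives an AF of roughly 10 at 0.5 eV but roughly 66 at 0.9 eV, which is exactly why a mixed-technology assembly has no single honest acceleration factor.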

At a minimum, expect to burn, say, 150 to 300 units (50 to 100 units at each of 3 different temperatures) just to get a crude idea of an aggregate activation energy for the product. If they are willing to pay for that, then all is well. If not, then they should expect to see roughly a 10% increase in price for 3 years to amortize the cost of the parts and of the test and analysis labor. It might be cheaper to hire a reliability "graybeard" on contract to explain to the young whippersnapper why he's full of hot air.
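The three-temperature approach boils down to fitting ln(failure rate) against -1/(kT); the slope is the aggregate activation energy. A sketch with synthetic rates generated from a known Ea, for illustration only; real inputs would be the observed rates at each stress temperature.

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K

def fit_activation_energy(temps_c, rates):
    """Least-squares slope of ln(rate) versus -1/(kT) estimates Ea in eV."""
    xs = [-1.0 / (K_B * (t + 273.15)) for t in temps_c]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic rates consistent with Ea = 0.7 eV, just to exercise the fit
temps = [85.0, 105.0, 125.0]
rates = [math.exp(-0.7 / (K_B * (t + 273.15))) for t in temps]
print(fit_activation_energy(temps, rates))  # recovers 0.7
```

With only 3 temperature points and few failures per leg, the confidence interval on that slope is wide, which is part of why 50 to 100 units per leg is the floor rather than the ceiling.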

Given your existing data, you can apply chi-squared statistics to determine a lower bound, at a chosen confidence level, on the MTBF (equivalently, an upper bound on the aggregate failure rate); a decent reliability engineer can probably help you with that. With 30K units in use, your customer's reliability engineer should be able to do the same; that he's not doing so may be down to laziness or inexperience. Your hired gun can likewise set him straight on that.
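The standard chi-squared bound for a time-terminated test puts the upper confidence bound on the failure rate at chi2(CL; 2r+2) / (2T) for r failures in T unit-hours. A minimal stdlib sketch (it exploits the fact that 2r+2 is always even, so the chi-squared CDF has a closed Poisson-sum form); the example figures are the hypothetical fleet numbers from earlier in the thread.

```python
import math

def chi2_cdf_even(x, df):
    """Chi-squared CDF for even df, via the Poisson-sum identity."""
    m = df // 2
    tail = sum((x / 2.0) ** j / math.factorial(j) for j in range(m))
    return 1.0 - math.exp(-x / 2.0) * tail

def chi2_quantile_even(p, df):
    """p-quantile by bisection on the closed-form CDF."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def fit_upper_bound(failures, unit_hours, conf=0.95):
    """One-sided upper confidence bound on the failure rate, in FIT."""
    df = 2 * failures + 2
    return chi2_quantile_even(conf, df) / (2.0 * unit_hours) * 1e9

# Hypothetical: 100 field failures over 30,000 units * 8760 calendar hours
print(fit_upper_bound(100, 30_000 * 8760))
```

For those figures the 95% upper bound comes out to roughly 450 FIT, i.e., even the pessimistic bound on the field data sits far below the ~3000 FIT handbook prediction.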

TTFN (ta ta for now)
I can do absolutely anything. I'm an expert! faq731-376 forum1529 Entire Forum list
 
Thanks! These are on a heatsink, so over 1 ft² each; even 3 x 50 pieces in test would not be commercially feasible. Granted, "cannot construct such a test" is not the best wording, haha. I am confident in our internal "graybeards"; they routinely publish and present work on reliability in this field. I am mostly concerned about connecting this RE with our people, who have little tolerance for people who will not do their homework, read our data, or even do a good analysis of their own info, and who then come to us insisting on Reliability-101-type analysis: basically not a discussion, just a demand. (Of course I, as the "sales guy," could not possibly have anything to add...)

There are a few things this customer does that are not typical, though within the datasheet limits of the assembly. These cause some internal components to run hotter than others, and their applications have different ambient temperatures. So, as you mention, due to the materials used, at 50 °C ambient (spot temperatures) the lifetime is limited by components A, B, and C, but at 70 °C it may be D and E. Also, we KNOW that material aging is affected by operational conditions in addition to temperature. For example, internal components are subjected to 800 V DC plus HF ripple, which combined with temperature is a very different materials challenge than temperature alone (so we "can't" test with every factor at maximum values, and temperature alone does not tell the whole story). Enamel on magnet winding wire is a good example, but metallization on diodes can also see shortened lifetimes. Switching diodes can actually see increased voltage stress at LOW temperatures (they switch faster). It is just a very complex scenario, and I cannot see "a test" effectively helping us pinpoint failure modes.

Also, they refurbish their systems and have access to a population of field-aged parts. We have been asking for years to get back even 10 or 20 of these; they always agree it's a great idea, but at the end of the day nothing happens. This goes beyond explaining: we WANT the discussion. The key to improving our product, as we see it, is really getting field data.

Still, thanks again
 
Yes, their field data actually contain, assuming adequate numbers, the most likely failure modes in actual life. Ditto your own field data, to the extent that you have them.

Nevertheless, you could potentially do an FMEA using your datasheet, Bellcore, etc., failure rates. The typical process is to rank-order the failure rates by component and then determine what might happen if that component fails. It's not necessarily meaningful in the end, since the failure rates you can get are aggregated and don't necessarily tell you, for example, that the CS pin on part XYZ has the highest proportion of failures for that part.
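The rank-ordering step itself is mechanical. A sketch with made-up per-component FIT figures; the part names and rates are purely illustrative, not from any handbook.

```python
# Hypothetical per-component failure rates in FIT; real values would come
# from SN29500 or Bellcore handbook lookups for each part.
rates_fit = {
    "electrolytic cap C12": 80.0,
    "signal transformer T1": 45.0,
    "controller IC U3": 40.0,
    "power diode D7": 25.0,
    "film resistor R22": 5.0,
}

total = sum(rates_fit.values())
running = 0.0
for part, fit in sorted(rates_fit.items(), key=lambda kv: kv[1], reverse=True):
    running += fit
    print(f"{part:24s} {fit:6.1f} FIT  {running / total:6.1%} cumulative")
```

The output is a Pareto list: the FMEA effort then concentrates on the failure effects of the top few contributors, with the caveat above that handbook rates are aggregates per part, not per failure mode.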

TTFN (ta ta for now)
I can do absolutely anything. I'm an expert! faq731-376 forum1529 Entire Forum list
 