Serial EEPROM Memory, um.... Forgets

lownoise · Sep 17, 2003

We have been using 93C46 serial EEPROMs (usually from Microchip, but also TI/Fairchild, Atmel, others) in our products for a number of years to maintain instrument-specific data such as serial numbers.

In the last few years we have been experiencing an increasing trickle of field failures of these parts. When they return to us, their contents have been corrupted, usually only in a few cells. The most common occurrence is $FFFF at the top address; we also commonly see the bottom address corrupted, or random cells in the middle. The most common value is $FFFF, but we have seen others.

The failures appear to be quite rare, but happen often enough in the field that they are a concern. However, we have not been able to recreate them in captivity, so we're not sure how to fix them. The failures span several products and situations with very little in common in terms of surrounding logic. We're following pretty solid design practices in terms of making sure we're not attempting to talk to the parts during power cycles, keeping supplies clean, avoiding glitches, etc.

Elsewhere on this forum I noticed some references to some unreliability from Microchip parts. Is there any substance to this claim?

Any idea what might be getting us in trouble?

Thanks
lownoise

blcpro · Sep 17, 2003

Perhaps the cells are exceeding their expected life of erase/write cycles? Most manufacturers give an estimated life expectancy in terms of erase/write cycles for serial EEPROMs. If a cell is erased or written several times a day, it is conceivable that the limit will be reached within a few years, which could lead to a failure.

IRstuff · Sep 17, 2003

Usually, EEPROMs can tolerate a minimum of 10^4 cycles, and most of the ones built in the last 5 yrs should have 10^5 cycle tolerance. Over a period of 5 yrs, that would still be an average of roughly 5 to 55 cycles per day.
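For anyone checking that arithmetic, here is a minimal C sketch. The function name `cycles_per_day` is made up for illustration, and it assumes 365-day years and writes spread evenly over the service life.

```c
#include <assert.h>

/* Average erase/write cycles per day that would use up a rated
 * endurance over a given service life (assumes 365-day years). */
double cycles_per_day(double rated_cycles, double years)
{
    assert(years > 0.0);
    return rated_cycles / (years * 365.0);
}
```

With a 5-year life, `cycles_per_day(1e4, 5.0)` comes to about 5.5 and `cycles_per_day(1e5, 5.0)` to about 55.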

Although I don't doubt that you've done due diligence with the design, you've left two issues unanswered:

1. Are the corrupted cells permanently damaged? It's possible to do margin testing on memory devices to determine if there is any degradation of the threshold characteristics of the memory.

2. The specificity of the locations is interesting. Are there specific reasons why those cells might be cycled more than the rest?

If there is no obvious issue with cell margins, then the pattern failure would suggest that there is some operational/initialization problem heretofore unknown that causes a hiccup in the program.

TTFN

melone · Sep 17, 2003

Are all unused inputs tied high / low, or are they left floating? I encountered a very similar problem with E^2 corruption, and it was easily fixed by tying the unused pins (even though they were NCs on the datasheet) to ground via a 10K. However, it is always a good idea to run that by the device manufacturer. No-connect device pins might still carry a signal used elsewhere on the die.

lownoise · Sep 17, 2003

Thanks for your thoughts...

The problem is not one of maximum cycles; the data is written very rarely - nominally, only when the product is produced, and perhaps another time or two if the product is ever serviced.

The data is read more often; perhaps a few times a day, maybe as many as a few hundred.

We have not been able to find any evidence of a problem in the program. Being mostly a hardware guy, I suspected this at first too, but the code is pretty simple and the situation straightforward. We can't find a weakness there, though I suppose since we can't find a weakness anywhere, it's as likely there as anywhere else...

lownoise · Sep 17, 2003

As far as pullups/downs, the pins are generally driven by discrete logic which has no floating state.

(In a few cases they are connected to configurable logic devices which take some time to 'wake up' and drive their pins. We have already identified the need to put pull resistors on these guys, but it doesn't account for all the failures).

melone · Sep 17, 2003

Did you check solder joints?

IRstuff · Sep 17, 2003

The fact that the most common address is the top address is very suspicious.

You're left with either some sort of problem in the chip itself or some problem with the system that it's installed in.

TTFN

designchain · Sep 21, 2003

Calculate the failure rate (MTTF) based on the total field population and the duration of time they've been there. Talk to the suppliers about what the resulting failure rate is. You say it's "rare"...hopefully you can quantify that.
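A minimal C sketch of that calculation follows. The function names are invented for illustration, and it assumes a constant failure rate and that every field failure actually gets reported back. FIT is the usual supplier unit: failures per 10^9 device-hours.

```c
#include <assert.h>

/* Observed MTTF in device-hours per failure.
 * units:          number of units in the field
 * hours_per_unit: average powered hours accumulated per unit
 * failures:       number of confirmed field failures (must be > 0) */
double mttf_hours(double units, double hours_per_unit, double failures)
{
    assert(failures > 0.0);
    return units * hours_per_unit / failures;
}

/* Same data expressed as FIT (failures per 1e9 device-hours). */
double fit_rate(double units, double hours_per_unit, double failures)
{
    assert(units > 0.0 && hours_per_unit > 0.0);
    return failures * 1e9 / (units * hours_per_unit);
}
```

As an example with invented numbers, 10,000 units averaging 20,000 powered hours each with 20 failures gives an MTTF of 10 million device-hours, or 100 FIT. Running the same numbers per supplier gives the comparison suggested below.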

Check the power and signal quality, too, especially in situ if possible. Noise and (especially) negative-going spikes in power and signals can affect the charge on the floating gate. Just using "pretty solid design practices" doesn't say that you've actually checked the signal quality. Also, does power quality vary between installations? Are you decoupling the parts properly? Have your suppliers' FAEs review your design.

If the failures occur across suppliers it points to a design (or SW) problem. If it's mostly with Microchip, but your volume is mostly Microchip, then you should determine failure rates vs. supplier, too.

--
Mike Kirschner
Design Chain Associates, LLC

http://www.designchainassociates.com

electricuwe · Sep 23, 2003

I designed boards with EEPROMs about 10 years ago myself. As far as I remember, the manufacturers specified a data retention time of 10 years at that time. Maybe that time has just elapsed.

IRstuff · Sep 23, 2003

Actually, 10 years was simply the most rational number to select. Oxide-isolated floating gate projected retention was on the order of 100 yrs, but no one would be silly enough to actually guarantee that.

TTFN

dlandry · Sep 24, 2003

93C46s from Microchip take a while to initialise at power-up. It is not in the datasheet, but you must wait a minimum of 10 ms before addressing a Microchip 93C46. It is usually not a big problem to wait 30 ms or more at start-up to be sure that all components have initialised correctly. In my experience, 93C46s from Atmel initialise a lot faster at power-up.
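One way to enforce that wait in firmware is to gate every EEPROM access on a millisecond tick. This is only a sketch: the function name, the tick source, and the idea of latching a power-good timestamp are assumptions, and the 30 ms figure is the margin from this post, not a datasheet value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Start-up margin before the first EEPROM access (from the post above). */
#define EEPROM_SETTLE_MS 30u

static_assert(EEPROM_SETTLE_MS >= 10u,
              "settle time is below the 10 ms minimum reported above");

/* Returns true once at least EEPROM_SETTLE_MS have elapsed since the
 * tick value latched at power-good. Unsigned subtraction keeps the
 * comparison correct even if the 32-bit tick counter wraps. */
bool eeprom_access_allowed(uint32_t now_ms, uint32_t power_good_ms)
{
    return (uint32_t)(now_ms - power_good_ms) >= EEPROM_SETTLE_MS;
}
```

The driver would call `eeprom_access_allowed()` before any chip select and simply hold off until it returns true.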

fjrg76 · Sep 25, 2003

Hi

Some years ago I had the same problem, and the solution I found came from a Microchip app note on preventing memory corruption. Basically it is a 10uF capacitor between the memory's Vcc and GND, with the system Vcc fed to the memory through a diode, which keeps the capacitor from discharging back into the rest of the circuit when the power turns off. With this scheme my memories seem to work properly. However, I also changed the memory brand: I now use Atmel parts exclusively.
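A rough way to see how long that capacitor holds the memory up is the constant-current estimate t = C * dV / I. The C sketch below assumes the EEPROM is the only load behind the diode; the function name and the example droop and current values are placeholders, not figures from the app note or any datasheet.

```c
#include <assert.h>

/* Hold-up time in seconds for a capacitor feeding a constant-current load.
 * cap_farads:  capacitance behind the diode
 * droop_volts: voltage the rail may fall before the part is out of spec
 * load_amps:   current drawn by the EEPROM (must be > 0) */
double holdup_seconds(double cap_farads, double droop_volts, double load_amps)
{
    assert(load_amps > 0.0);
    return cap_farads * droop_volts / load_amps;
}
```

For instance, 10uF with 0.5 V of allowable droop and a 1 mA load gives about 5 ms, which would then be compared against the part's write cycle time.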
