Eng-Tips is the largest engineering community on the Internet

Intelligent Work Forums for Engineering Professionals


hard disk failures

Status
Not open for further replies.

chiplonkar

Electrical
Apr 18, 2003
9
IN
We have a factory environment with a lot of electronically controlled machinery. In the past two years there have been eight occasions of hard disk failures (typically 4 GB drives) in PLC machines. In all cases, the failures can be correlated to power trips.

Question: Can this be traced to poor earth resistance? What should the earth resistance be in a production environment with a total load of 2000 kW on two transformers? Is an earth resistance of 10 ohms considered too high for such conditions?

We would like to hear from anyone with similar experiences.

 

It probably isn't the earth resistance! Did you ever look at the incoming power, especially when some of your big machines turn on or off?
 
What are the capabilities of the power supplies supplying the drives?

Perhaps some line conditioning is in order.

You need to investigate whether the damage occurs because of the trip itself or because of the restart.

TTFN
 
What about:

> vibration - any possibility that the power trips cause some sort of mechanical shock to the drives? Is the environment generally vibrating?

> what is the exact failure mode of the drive? Is it complete non-functionality or damaged sectors?

> Could the PLCs behave oddly during power trips?

> 4 GB drives are probably 4 years old or so? Even assuming only 8 hours of operation per day, they're pushing 12,000 hours of on-time.
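As a rough check of the on-time arithmetic in the last point, here is a minimal Python sketch. The function name `power_on_hours` is hypothetical; the second call uses the duty cycle chiplonkar reports further down the thread (24 h/day, 300 days/year, plant about 3 years old).

```python
def power_on_hours(years: float, hours_per_day: float, days_per_year: float = 365.0) -> float:
    """Cumulative power-on hours for a drive that has been in service for `years` years."""
    return years * days_per_year * hours_per_day

# IRstuff's estimate: about 4 years at only 8 h/day
print(power_on_hours(4, 8))          # 11680.0, i.e. roughly 12,000 h
# Duty cycle reported later in the thread: 24 h/day, 300 days/year, about 3 years
print(power_on_hours(3, 24, 300))    # 21600.0
```

Either way the drives are well into five-figure power-on hours, a number worth comparing against the drive manufacturer's rated figures.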

TTFN
 
chiplonkar, you have had "eight occassions of Hard disk failures ( typically 4 Gb )in PLC machines. In all cases , the failures can be corelated to Power Trips". What exactly causes a 'Power Trip', a machine fault/short circuit? How many 'Power Trips' have you had without hard disk failures?
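Answering that second question is mostly bookkeeping. A minimal Python sketch, assuming the plant keeps timestamped logs of trips and drive failures; the timestamps below are made up, and both the function name and the 24-hour window are arbitrary choices for illustration:

```python
from datetime import datetime, timedelta

def trips_followed_by_failure(trips, failures, window=timedelta(hours=24)):
    """Count the power trips followed by at least one logged disk failure
    within `window`. Returns (trips_with_failure, total_trips)."""
    hits = sum(
        1 for t in trips
        if any(t <= f <= t + window for f in failures)
    )
    return hits, len(trips)

# Hypothetical maintenance-log timestamps, for illustration only
trips = [datetime(2003, 1, 5, 10, 0), datetime(2003, 2, 9, 14, 0), datetime(2003, 3, 1, 8, 0)]
failures = [datetime(2003, 1, 5, 11, 30)]

hits, total = trips_followed_by_failure(trips, failures)
print(f"{hits} of {total} trips were followed by a disk failure ({hits / total:.0%})")
# 1 of 3 trips were followed by a disk failure (33%)
```

A low ratio would suggest that a trip alone is not enough to kill a drive, and that something else (what happens at restart, or what the PCs were doing at the time) matters.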
 

Without power-quality instrumentation (portable or fixed) to associate power-quality events with equipment-performance events, it’s hard to judge. Potential differences from inadequate bonding of circuits and enclosures are a more likely culprit than grounding-electrode resistance.

Don’t forget that with high-speed signals through metallic paths, the resistive and reactive characteristics of many individual bonding connections must be understood. Indeed, that is rarely a trivial undertaking.
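Where recorded waveform data is available, a first pass at associating disturbances with equipment events can be as simple as computing the RMS voltage of each cycle and flagging cycles outside a per-unit band. The sketch below is a hypothetical illustration that assumes a 230 V, 50 Hz supply sampled at 64 points per cycle. The 0.1/0.9/1.1 pu thresholds follow the interruption/sag/swell magnitude bands of IEEE Std 1159, but the duration criteria in that standard are ignored here.

```python
import math

def cycle_rms(samples, samples_per_cycle):
    """RMS value of each complete cycle in a list of voltage samples."""
    out = []
    for start in range(0, len(samples) - samples_per_cycle + 1, samples_per_cycle):
        chunk = samples[start:start + samples_per_cycle]
        out.append(math.sqrt(sum(v * v for v in chunk) / samples_per_cycle))
    return out

def classify(rms_values, nominal_rms):
    """Label each cycle by its per-unit RMS magnitude (durations not checked)."""
    labels = []
    for v in rms_values:
        pu = v / nominal_rms
        if pu < 0.1:
            labels.append("interruption")
        elif pu < 0.9:
            labels.append("sag")
        elif pu > 1.1:
            labels.append("swell")
        else:
            labels.append("normal")
    return labels

# Synthetic 230 V waveform, 64 samples per cycle; the third cycle dips to 60 %
N = 64
peak = 230 * math.sqrt(2)
wave = []
for cycle in range(5):
    scale = 0.6 if cycle == 2 else 1.0
    wave += [scale * peak * math.sin(2 * math.pi * k / N) for k in range(N)]

print(classify(cycle_rms(wave, N), 230))
# ['normal', 'normal', 'sag', 'normal', 'normal']
```

A real power-quality analyser does this continuously and time-stamps each event, which is what makes it possible to line disturbances up against drive failures.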
 
Another thought:

While grounding resistance might not matter if everything is designed and implemented correctly, any ground loop could cause problems during massive transients. Someone should review how the PLCs are connected to the machinery and whether there are any sneak paths.

TTFN
 
I agree with all above posts.

Is there any transient voltage surge suppression (TVSS) at the source panel? If not, that would probably be my first recommendation.
 
Melone, IRstuff, DanDel, Busbar, Peebee,

Thank you all for your suggestions. To add to the inputs, please note the following.

All the failed hard disks are in PCs that are connected to a UPS.

Since these PLC panels are designed by well-known, established manufacturers, all circuit-protection norms and protective devices are in place.

There are panel-cooling air conditioners mounted on the panel doors, but their vibrations are not likely to reach the hard disks.

The general environment is not vibrating (in answer to IRstuff's question).
The failure mode is bad media (Track 0 bad, etc.), always after a power trip, even when the UPS keeps the PCs running.

The hard disks run 24 hours a day, 7 days a week, 300 days a year. The plant itself is 3 years old, so all the hard disks are less than 3 years old.

If surges are damaging the hard disks, why are the SMPS never damaged? Nothing happens to the PLC cards, and nothing goes wrong with the UPS. Regarding peebee's question, there are no external surge suppressors at the source; however, power-factor capacitors are installed at a centralised location.

As a first step that can be taken immediately and at no cost, we have started watering the earth pits.

Thanks everybody.

 
Have you determined whether the disks are permanently damaged, i.e., can they be recovered by redoing a low-level format?

The two possibilities are some sort of head crash (although that shouldn't happen with the 4 GB generation) or some transient problem with the PLCs that causes inadvertent write operations during power glitches.

Older generations of disks required some careful head parking to prevent damage to the disk during uncontrolled powerdown events. If you have access to the disk BIOS or other control of the head movement, you might try to see if you can get the computers to park the heads on the farthest track after each operation.

Likewise, parking the heads will also prevent inadvertent write operations during power trips.

TTFN
 
Melone, IRstuff, DanDel, Busbar, Peebee


To add further, here is another observation about the hard disks. All of them are mounted vertically by the control-panel manufacturers. We do not know whether the HD manufacturer has any rule against vertical mounting.

Secondly, you suggested parking the HD after each operation on it. Can you explain this in more detail? The PCs are used as HMIs and communicate with the PLCs. The software running on the PCs also does data trend analysis, etc.

Is it possible to park the hard disk in a situation like this?


 
Heads can be parked with HD's in any orientation.

"Parking" is the process of moving the head onto an unused track. Normally that would be 1+the highest track number. Since the head is positioned by a stepper motor, the position will be held regardless of orientation.

Obviously, it's not clear that this will solve your problem, but given the failure mode, it might be a plausible patch.

As indicated earlier, some additional failure analysis needs to occur to determine the degree of damage incurred and to possibly trace back to root cause.

TTFN
 
Watering your earth pits is probably a waste of time as a solution for you. The effectiveness of your building's connection to earth ground would probably have little effect on the problem you are experiencing. While you may or may not have an earth-bonding issue that needs to be corrected, that would cause other problems, and would not likely cause hard-drive failures. As an illustration of this: you should be able to install your drives on a plane or in a car with no connection to earth ground and have them function properly.

Don't count on your UPS to do much in the way of filtering power. This is a common misperception. A spike at the UPS input will transmit to the UPS output. UPSs generally make poor power filters, and the output voltage waveform is usually worse than the input waveform.

Again, I'd recommend installing TVSS, even if it's just a cheap power strip with TVSS at each of the hard-drive plugs. It sounds like you're getting damaging spikes, and the easiest way to get rid of those spikes is surge suppression.
 
chiplonkar,
Here's a suggestion. Your hard drives are in PCs running a SCADA system over the PLCs. You are recording trend data and displaying plant parameters to the operations and engineering departments.
I would guess that your trend data has very fine resolution, down to less than, say, 30 s per sample. I would also guess that your data tags are disk-resident rather than memory-resident, for reliability, and that the 4 GB drives were selected to give extra space for recording data.

Now, all this is supposition, but I have seen it again and again and again. I suspect that if you watch the hard-drive activity light, it never takes a break. Calculate how many reads and writes you're doing.
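That calculation takes only a few lines. A minimal Python sketch, assuming every sample is written to disk as soon as it is taken; the tag count and sample periods are hypothetical, and each logical write can cost extra physical writes for file-system metadata (FAT and directory entries) that are not counted here:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def writes_per_day(n_tags, sample_period_s, tags_per_write=1):
    """Disk write operations per day if every sample goes straight to disk,
    with `tags_per_write` tag values grouped into each write."""
    samples_per_tag = SECONDS_PER_DAY / sample_period_s
    return n_tags * samples_per_tag / tags_per_write

# Hypothetical plant: 500 trended tags, each sampled every 10 s and written individually
print(writes_per_day(500, 10))                       # 4320000.0
# Same tags at a 30 s period, all 500 values grouped into one write per scan
print(writes_per_day(500, 30, tags_per_write=500))   # 2880.0
```

The gap of roughly three orders of magnitude between the two cases is the point of lowering sample rates and grouping writes.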

I suspect that the drives are being overworked, and when the PCs try to reboot they can't find sector zero to initialise the operating system.

My suggestion: lower your sample rates, make more of your tags memory-resident, and back up data on a batch basis.
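One way to keep tags in memory and write them in batches, sketched in Python under stated assumptions: `BatchLogger`, its thresholds (1000 samples or 300 s) and the CSV layout are all hypothetical, not part of any SCADA package. Samples still in the buffer are lost if the PC goes down before a flush; that is the trade-off for far fewer disk writes.

```python
import os
import tempfile
import time

class BatchLogger:
    """Keep trend samples in RAM and append them to disk in batches,
    instead of opening and writing the file once per sample."""

    def __init__(self, path, max_samples=1000, max_age_s=300.0, clock=time.monotonic):
        self.path = path
        self.max_samples = max_samples  # flush when this many samples are buffered
        self.max_age_s = max_age_s      # ...or when this many seconds have passed
        self.clock = clock
        self.buffer = []
        self.last_flush = clock()
        self.flush_count = 0

    def log(self, tag, value):
        self.buffer.append(f"{tag},{value}\n")
        too_many = len(self.buffer) >= self.max_samples
        too_old = self.clock() - self.last_flush >= self.max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Append everything buffered so far with a single open/append/close."""
        if self.buffer:
            with open(self.path, "a", encoding="utf-8") as f:
                f.writelines(self.buffer)
            self.buffer.clear()
            self.flush_count += 1
        self.last_flush = self.clock()

# Usage: 1000 samples of a hypothetical tag "TT101", flushed in batches of 100
path = os.path.join(tempfile.mkdtemp(), "trend.csv")
logger = BatchLogger(path, max_samples=100)
for i in range(1000):
    logger.log("TT101", i)
logger.flush()              # also call this on an orderly shutdown
print(logger.flush_count)   # 10
```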

Hope this helps a bit,
I'm open to any other ideas or suggestions

Regards
Don
 
P.S.
peebee has a valid point about a lot of UPS units. The surge suppressor would be cheap insurance against other potential troubles, too.

Don
 
I have been a facility engineer with similar types of applications. We did not necessarily experience hard-drive failures (though I would not rule them out), but we did experience a lot of data corruption. Our problem was grounding. The isolated ground from the UPS system was connected to the equipment ground, so any noise on the equipment ground (and in an industrial environment there is usually plenty) appeared as common-mode noise on the UPS system. I hesitate to disagree with the other two posts denigrating UPS systems as filters, but my experience is that a good-quality, properly grounded UPS that runs exclusively in inverter mode does isolate line noise from the load. My suggestion would be to ensure that the grounding for the UPS load is connected only to the UPS (grounded per NEC and isolated from equipment grounds) and that there are no common-mode connections or ground loops that would allow common-mode noise to propagate into the UPS power system.
 
Just beware that the term "isolated ground" is somewhat misleading; a better term would be "single-point ground". The UPS ground cannot be truly "isolated" from the equipment ground (without putting you in violation of the NEC and setting yourself up for serious life-safety, equipment-damage, and operational issues), although a single-point connection between the two grounding systems is highly recommended.

Regarding a UPS system isolating noise: it won't. It will attenuate input-side noise to some degree, but it won't isolate it. A UPS is essentially an AC->DC->AC converter. If you get a spike on the input, it will show up on the DC bus as well as on the AC output, and it can possibly damage the UPS and downstream equipment. Each conversion stage will attenuate the noise to some degree but won't eliminate it. Compared to the cost of the computer equipment and the facility, the price of a TVSS is negligible.
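The "attenuate but not eliminate" point can be made concrete with a toy model. This Python sketch assumes each conversion stage is linear and knocks a fixed number of dB off a spike, so the dB figures simply add; the 6 kV spike and the 20 dB per stage are invented numbers, and a real UPS's response is frequency-dependent and nonlinear.

```python
def residual_spike(v_in, stage_attenuation_db):
    """Peak of a spike after a cascade of stages, assuming each stage is linear
    and reduces the spike by a fixed number of dB (so the dB figures add)."""
    total_db = sum(stage_attenuation_db)
    return v_in * 10 ** (-total_db / 20)   # 20*log10 convention for voltage ratios

# Hypothetical: a 6 kV input spike through two conversion stages of 20 dB each
print(f"{residual_spike(6000, [20, 20]):.1f} V")   # 60.0 V: much smaller, but not zero
```

Even with generous per-stage figures, something still reaches the output, which is the argument for clamping the spike at the source with TVSS.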
 
Howdy, I have a school in Michigan that has had multiple failures of hard drives, motherboards, and switches. I have Cisco switches with bad ports and power supplies that die. Since September this year I have replaced about 35 hard drives and 10 motherboards. Last year we replaced about 10 drives and 10 motherboards. These PCs (IBM Netvistas) were bought about 2 years ago. We also had problems with our phone system going down once a week. I placed the phone system on a UPS, and we have had to reboot it only once in the last 3 months. Any suggestions? I am very frustrated with this whole mess. Thank you, Mark Fink
 
10 ohms is too high to be of much use at the power levels you indicate. My guess is that you are limited by geology, i.e., rock. Watering the pits would not do much. As for your problem, my guess is that it is related to ground surges. The hard drives have their own chassis in close proximity to signal paths referenced to system ground. Although they may be connected at some point inside or outside your system, there is probably quite a difference in the conductivity of the paths. A power trip will likely be accompanied by a large current surge between earth ground and electrical ground. This can be caused by slight timing differences as the three phases disconnect, leaving a momentary but massive imbalance, or simply by charged inductances going open-circuit. Since the fault probably starts in one phase, the imbalance may actually last for some time. I would suggest isolating the PC earth grounds before the UPS, referencing them to the earth ground of the PLC equipment, and then bonding the PC chassis to the same ground. The power-cord ground of the PC is not adequate to handle high current surges and may actually be part of the problem. Another possible fix is to isolate the disk-drive chassis from any metal. Disk drives are not as strictly safety-regulated as SMPS units when it comes to voltage-isolation design.
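To put the 10 ohm figure in perspective, the rise of the local earthing system above remote earth is simply V = I × R. The Python sketch below assumes a hypothetical 100 A of fault current returning entirely through the earth electrode; in a real plant much of the fault current may return through metallic paths instead, so treat the numbers as an illustration of scale only.

```python
def ground_potential_rise(fault_current_a, earth_resistance_ohm):
    """Rise of the local earthing system above remote earth, V = I * R."""
    return fault_current_a * earth_resistance_ohm

# Hypothetical 100 A of fault current returning through the earth electrode
for r in (10.0, 1.0):
    print(f"{r:>4} ohm: {ground_potential_rise(100, r):.0f} V")
# 10.0 ohm: 1000 V
#  1.0 ohm: 100 V
```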
 
chiplonkar: You have not clearly stated whether or not the drives have been "damaged". Drives are very, very dependable these days, far, far more dependable than the failure rate you are experiencing suggests. They can generally be mounted in any position except upside down; that will kill them eventually, due to bearing, head-support, and cooling stresses.

The PC power supplies are fully isolated from the mains; this dramatically reduces the prospect of damage to a drive. Other connections, such as serial links, might couple in noise that could damage a motherboard but NOT the drives, which live an isolated life. All of a drive's signals are generated and terminated on the motherboard, with the exception of the isolated power. If you were having voltage spikes that actually damaged the drives, you would have motherboards blowing like cheap fuses!

I have built, used, and troubleshot literally hundreds of PCs used in industrial control, supervisory control and data acquisition (SCADA), and trending applications. Without exception, historical and trending SOFTWARE was always the problem in drive-corruption situations. If the data is written to the drive constantly instead of being cached and written occasionally, disaster is almost certain. All that is required is a missed write, a missed interrupt, a missed DMA transfer, a lost bit, a stray cosmic ray, an unrelated software bug... the list goes on and on, and you will get lost indexes, blown FAT tables, and damaged boot records. This is because the data-writing process is a little hazardous. Each chunk of data written to the drive requires modifications to drive tables, indexes, and the like. If the drive is updating one of these when there is any hiccup, the pointers to where things are can be scrambled. Once they are scrambled, computers are masters at rapidly compounding the disaster.

Proper data logging requires sending data to the drives as infrequently as possible. If the data is of a critical nature (really, seriously critical), then it must be written at different times to different drives, preferably on different computers. If it is written only occasionally, the external opportunities for corruption are dramatically reduced, sometimes by orders of magnitude.
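A common technique for making each occasional write less fragile, sketched here in Python as an illustration rather than as the poster's method: write the batch to a temporary file, fsync it, then atomically rename it over the target, and repeat for each copy. `atomic_write` and `replicate` are hypothetical helper names; `tempfile.mkstemp`, `os.fsync` and `os.replace` are standard-library calls. How strong the "old or new, never half-written" guarantee is depends on the file system, and staggering the copies in time, as recommended above, is left to whatever schedules the calls.

```python
import os
import tempfile

def atomic_write(path, data: bytes):
    """Replace `path` with `data` so that, after a crash or power trip, the file
    should hold either the old contents or the complete new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # force the data onto the disk before the rename
        os.replace(tmp, path)      # rename within one volume replaces the file in one step
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # On POSIX systems, also fsync the directory if the rename itself must
    # survive a power loss; that step is omitted to keep the sketch portable.

def replicate(data: bytes, targets):
    """Write the same snapshot to each target in turn (e.g. on different drives),
    so a failure during one write cannot corrupt every copy."""
    for path in targets:
        atomic_write(path, data)

# Usage with two temporary directories standing in for two drives
d1, d2 = tempfile.mkdtemp(), tempfile.mkdtemp()
targets = [os.path.join(d1, "trend.csv"), os.path.join(d2, "trend.csv")]
replicate(b"TT101,42\n", targets)
```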

What I'm saying is that the problem you're having is software, NOT hardware. If you kicked the plug out of your computer every time it finished booting, you would not lose your drives, even though this could be considered a large power disturbance. I suggest that a brief system disturbance occurs that in all likelihood would not by itself cause a detected problem, but because data logging is ongoing, logging data gets written to the boot sector or FAT table, resulting in "Track 0 bad", etc.

That said, do also make sure you have MOVs (metal oxide varistors) or TVSS on all supply lines to all computers in an industrial facility. ALWAYS! They do work, and they cost virtually nothing! They should always be connected line to neutral, line to ground, and neutral to ground.

Get your software fixed and always practice safe hex!
 