ControlLogix Processor Failures

Status
Not open for further replies.

BradJ

Electrical
Aug 7, 2003
US
Anybody seen anything like this before?

We have a fairly large AB ControlLogix (1756) installation: five ControlLogix chassis containing L55 and L63 processors, programmed with Logix5000 v12. We have a peer-to-peer ControlNet network, a separate ControlNet for remote I/O, and a common Ethernet network for communication to the HMIs. The processors are on firmware v12.31.

Our problem is that various processors are failing or "crashing". A red "OK" LED is the only light on the processor. A processor may crash a few hours after reloading, or after a week or more. Sometimes they crash while we are using them (online, exercising I/O); sometimes they crash when the machine is down and nobody is around. We have yet to identify a cause or even a common thread. I found a similar description of symptoms in AB's Knowledge Base referring to a processor "I_be_dead" error (at least they aren't without a sense of humor), but apparently that was corrected in v10 (or was it?).

To correct the situation, we remove and reseat the processors in the chassis, then reload the program (we've used up the 1-day battery life; more are on order). Most of the time this works, sometimes it doesn't. The processor occasionally "crashes" while reloading the program. Sometimes it works on subsequent download attempts; sometimes we end up pulling and reseating all the comm cards or cycling power to the entire chassis. Sometimes we spend 2 hours getting a PLC back up and running.

Here's a short list of what we've tried:
1. Swapping suspect cards/processors
2. Upgrading processor firmware
3. Reducing network bandwidth usage
4. Reducing processor "bandwidth" usage
5. Increasing/decreasing HMI comm time slice
6. Upgrading comm card firmware
7. Reloading processor firmware (possibly corrupted?)
8. Checking panel/chassis grounding
9. Verifying chassis installation clearances
10. Checking CP interior temperature vs. CLx specs

Anybody have any experience with this? Any new ideas? I'm considering building a 2-axis robot for each chassis so we can remove and reseat the cards remotely....

-Brad
 
From experience at our facility, it is my opinion that v12 has some issues. Try going down to version 11.??, especially if you have any motion control applications.

 
Did you manage to check the MAJOR and MINOR faults in the "Processors Fault" window before you reset the processor?

There may be a programming issue as well: some large numbers, or something divided by zero.

Let me know.

Vinod
 
All good ideas. Going back to v11.11 is an option, but the rest of the plant is on v12, so that would be met with resistance.

As for checking the faults, the processor is D-E-A-D dead. No comm of any type (not Ethernet, ControlNet, bridging, DF1 - zippo).

Here's an update. AB sent a utility to extract error info from the processor after a crash (requires a good PLC battery and cycling power/reseating). It's called IBD.exe - yes, I_be_dead lives! Unfortunately the retrieved file is binary, and it must be sent to AB for analysis. We've sent them 4 such files so far, no word yet.

I'll post resolution when we have it.

-Brad
 
I don't want to harp on what you have tried already, but in the case of grounding here are some guidelines I have discovered (mostly the hard way) over the years. I have not used the Logix5000 in particular, but I have cured erratic operation and program corruption before in A-B, Omron and GE by ensuring the grounds are right.
1) Scrape the paint from the back panel and ground it FIRST with the incoming power ground (don't use a terminal block), then route your other grounds from there.
2) The shielding on all communications cables and analog cables needs to be grounded at one end only (source end) unless specified otherwise by the manufacturer.
3) If the HMIs are not on the same power lines (breaker panel) there could be ground potential differences. Be sure the grounds are isolated from one another or they will "float" and cause noise loops.
4) If all else fails, the backplane may be the problem. Corrosion on the plugs and/or connectors or comm bus will cause faults.
Most electronic brains (PLCs, computers, etc.) use ground as a reference. If the ground floats, so does the logic.
 
Brad,
What Vinod said is correct.
In my experience, it helps to add one ladder rung that resets the major fault count to zero.
You can try it.

Good luck

fajsod
 
From the info you have provided, it points toward issues unique to your system.
1. Get your batteries replaced.
2. Get the major/minor fault codes after a failure.
3. From the info you provided, I believe there can only be 3 possible issues: a programming issue (math overrun), bad power, or grounding.

a. If you have math overrun issues, you can put an interrupt routine in your program to catch any math overruns which could occur and branch to a subroutine before the processor faults.
b. Power: measure incoming power with a Dranetz or other line analyzer. AB power supplies have a lot of "ride thru" and are pretty robust, so you should be able to easily pick it up if there are power issues present. It takes a very hard spike to kill a program.
c. Grounding: previous replies covered grounding very well. Take note that the building columns, water pipes, etc. normally used in your facility may be a less than ideal ground. You may have to drive a ground rod. Make sure it goes into ground that is not fill.
d. If you have the PLC in the same panel as VFDs, motion control, SCRs, etc., MAKE SURE your motor leads are not run parallel to any control wiring attached to the processor. I would also use a separate ground wire for the motor cable shields, apart from your control/instrument ground. If the motor control devices were manufactured before 1994, you may need to install isolation relays for control wiring.
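
As a rough illustration of the pre-check idea in item (a), here is a minimal Python sketch. It is not Logix code. In the controller, the equivalent checks would be written in ladder or structured text. The function names, the fallback value, and the 32-bit signed (DINT-style) range are assumptions made for illustration only.

```python
# Illustrative only: guard arithmetic before it can produce a bad result.
# Names and limits are assumptions, not taken from any Rockwell API.

DINT_MIN = -2**31        # assumed 32-bit signed integer range
DINT_MAX = 2**31 - 1

def safe_divide(numerator, denominator, fallback=0):
    """Divide, but return (fallback, True) instead of raising on a zero divisor."""
    if denominator == 0:
        return fallback, True        # second value flags that the guard tripped
    return numerator / denominator, False

def clamp_dint(value):
    """Clamp a result to the assumed 32-bit range and flag any overrun."""
    if value > DINT_MAX:
        return DINT_MAX, True
    if value < DINT_MIN:
        return DINT_MIN, True
    return value, False

if __name__ == "__main__":
    print(safe_divide(100, 0))           # guard trips, fallback returned
    print(clamp_dint(3_000_000_000))     # overrun flagged, value clamped
```

The flag returned alongside each value is what a program could latch and log, so the offending calculation can be found afterwards instead of being lost in a fault.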
 
ControlLogix does seem to be a little susceptible to interference. I had a problem with analog inputs making wild excursions that did not show up in the actual 4-20mA signal. It was finally traced to a computer monitor in the same building, but not directly connected with the PLC. The noise got through all surge suppressors, filters, power supplies etc. The monitor appeared to work fine, but the cause was finally recognized when it was found that the problem occurred as soon as the monitor was turned on. This was at a site using a 1756-L55.

I very much doubt that the program software will cause this problem. As it is happening to a number of processors at the same site, I would look for a noise source, particularly radio frequency. Any transmitters used nearby (including handheld transceivers)? In our case the problem was RF generated in a faulty circuit in the monitor. RF does not respect filters or suppressors!

Hope this helps...
John
 
Thanks again to all for the feedback.

We have checked grounds. We have a separate "TE" ground for all electronics which is separated from the "PE" ground on the high voltage. All cabinets with PLCs also have drives with IGBT output stages. We've done the best we can to isolate signal and power wiring.

As for checking faults, this is not possible. Again, by dead, I mean the project, programs, tags and any program fault information is gone - the processor's memory is wiped clean with the exception of the I_Be_Dead file.

We've sent numerous IBD Tombstone files to AB. The error recovered was something to the effect of "a request was received to read memory which cannot be read". Initially they thought it was our Kepware OPC Server because we were using their new Physical Addressing mode (which is an absolute rocket for throughput), but after we switched the Kepware back to Symbolic mode it continued to occur, even without any Kepservers running at all. At least one of the most recent IBD files is reporting a null pointer, which AB attributes to corruption of the project database. We have had database corruption a couple times previously creating other problems, and the L5K export/import has seemed to clear them. The big trouble with this is that the ControlNet scheduling (extensive) and Trend files (numerous) get wiped out during this process, making it painful. Scheduling time to take the entire plant down to reschedule the ControlNet isn't exactly a picnic either.

It's worth noting that we have five ControlLogix processors in five chassis and it's only the L63 processors (three of them) that crash. We have two L55's which have never crashed. One of the L55's has the largest program on the system and has the heaviest relative CPU loading (probably should have been an L6x). Also, this same building has other processes using L6x processors with Kepserver running in Physical mode with no crashing. One of the L6x processors which crashes is an OEM system for which one of AB's drive centers actually built the panel.
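
One way to keep track of this kind of pattern is to log every crash and tally the entries by attribute. The following is a minimal Python sketch. The log entries, chassis labels, and field layout are made-up placeholders, not our actual crash history.

```python
from collections import Counter

# Hypothetical crash log: (chassis label, processor model, Kepware mode).
# These rows are placeholders for illustration, not real plant data.
crash_log = [
    ("chassis-A", "L63", "Physical"),
    ("chassis-B", "L63", "Symbolic"),
    ("chassis-A", "L63", "Physical"),
]

def tally(log, field):
    """Count crash entries by the value found at the given tuple index."""
    return Counter(entry[field] for entry in log)

by_model = tally(crash_log, 1)   # crashes per processor model
by_mode = tally(crash_log, 2)    # crashes per Kepware addressing mode

if __name__ == "__main__":
    print(by_model)
    print(by_mode)
```

A model or mode that never appears in the tally (a Counter returns 0 for missing keys) is as informative as one that dominates it.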

My personal thoughts are that it's an issue with the L6x and possibly the ENBT. The problem is somewhat reproducible by starting up the HMIs with Kepware in Physical mode (which really hammers the ENBT/L63 on startup to get all the data it needs). With Kepware in Symbolic mode the problem still occurs, but we cannot trigger it on demand.

In summary, no resolution yet. We have been able to reduce the frequency by keeping Kepware in Symbolic mode, but our HMI performance has really suffered.

-Brad
 