Oye Vey - Hit a proverbial landmine

sbj_ee · Oct 29, 2018

A couple of weeks ago, I was working with a fairly new CATV system that we acquired. I've figured out most of the tangled mess that was created over the past decade or so and I'm getting things straightened out to be more scalable, manageable, and reliable. Not my first rodeo in doing this...

I was working on a particular device, an Arris CAP1000, which is deployed as a primary and secondary pair. Failover and failback are usually seamless. I'm about 1500 miles from the actual equipment, so I have a headend technician who has been there forever working with me. I simply had him connect an unused GigE interface on both the primary and secondary CAP1000 in preparation for some video bitrate balancing between the interfaces, since some are running much higher bitrates than we are comfortable with.

For a reason that is still unexplained, the primary CAP1000 seemed to think its hardware had vanished and failed over (well - attempted to fail over). All video (300+ channels) to two markets just dropped in mid-afternoon. I noticed the failover attempt and was quickly onto the secondary CAP1000 and saw that it clearly did not take over. It was acting like it did not even know that it was supposed to assume the active role. Looking at the primary CAP1000, it thought it had failed over, and the entire redundancy scheme was now in some indeterminate limbo.

To restore service, I had the headend technician power off the secondary CAP1000. I did not want to fight some weird redundancy issue at this point. Then I had him power cycle the primary CAP1000. Everything came up and video was restored except on two GigE interfaces. One had no production video service whatsoever and the other had a dozen or so channels still out. From the Linux shell, the CAP1000 was not even seeing the interface.

At this point, the headend technician said something that caused me to pause. He indicated that he had to install an optical SFP in each of those two interfaces, wait for the LED to blink, and then swap it for a copper SFP before the interface would link. This is the proverbial landmine that I speak of... I believe this is also the cause of the failover not occurring properly.

Later - in a maintenance window -
We managed to get everything linked, executed a failover, and then a failback. Failover worked fine, but the failback failed. The headend technician had to screw around with the SFPs again.

We restored everything and I disabled auto failback so we at least had some coverage.

Tonight, I'm going to get this fixed correctly. We're eliminating those unsupported copper SFPs, which I'm 99.99% certain are causing the redundancy issues, and getting some supported optical SFPs installed. We will ensure no manual intervention is needed to get an SFP recognized and linked. Then I'll ensure failover and failback function smoothly. Hopefully, I do not encounter any other landmines that may still be lingering.
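
For the "recognized and linked" check after the SFP swap, something like the following could be run from a Linux shell after each reboot, failover, and failback. This is only a rough sketch: it assumes the box exposes the standard Linux `/sys/class/net` tree and has Python available, neither of which is confirmed for the CAP1000, and the interface names used with it are hypothetical.

```python
from pathlib import Path


def link_report(ifnames, sysfs_root="/sys/class/net"):
    """Map each interface name to 'missing', 'unknown', or its kernel operstate.

    Assumes a standard Linux sysfs layout; whether the CAP1000 shell
    exposes it this way is an assumption, not something from the post.
    """
    report = {}
    for name in ifnames:
        iface = Path(sysfs_root) / name
        if not iface.is_dir():
            # The kernel does not see the interface at all.
            report[name] = "missing"
            continue
        try:
            # operstate is typically 'up', 'down', or 'unknown'.
            report[name] = (iface / "operstate").read_text().strip()
        except OSError:
            report[name] = "unknown"
    return report


def all_linked(report):
    """True only if at least one interface was checked and all are 'up'."""
    return bool(report) and all(state == "up" for state in report.values())
```

Calling `link_report(["eth2", "eth3"])` (placeholder names) right after a power cycle, with nobody touching the SFPs, and getting anything other than "up" for every entry would mean the interfaces are still not coming up on their own.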
