Big blackout. What happened?


Skogsgurra (Electrical)
When I got word about the big outage, I immediately went to my puter to find out what my engineering friends in the US had to say about it. But no Eng-Tips page was available. Of course I can understand that. No power - no Internet.

Power was restored piece by piece and I now find Eng-Tips up and running again. My question is still valid: What happened?

Glad to see you again!

Gunnar Englund, Sweden
 
One of the concerns mentioned by many was how it cascaded so fast -- I overheard an ex-tech who worked on relays associated with these issues saying that maybe we've gotten too fast (the new digital/microprocessor-based relays operate in less than a cycle, whereas the old electro-mechanical relays took 3-6 cycles to operate). Maybe there's some truth to it?

We had a major blackout in the metro-Denver area a long time ago when a major transmission line went down -- our station was the only one that survived, and we were trying like mad to keep our units from tripping. Our "island" covered about 10 blocks of downtown Denver (a Friday afternoon just before quitting time); what a nightmare... (As I recall, the coal unit went down and the gas units saved the day.)
 
Gordonl, the practice here in the west is to study all N-1 contingencies and all credible N-2 contingencies. N-2 contingencies are regarded as credible if there is a common mode of outage: for example, two lines on the same tower, two lines on the same right of way, two lines sharing a breaker at the substation bus, etc. My own experience is that certain non-credible contingencies have a greater probability than some credible ones. For example, two lines can go out together simply because they terminate at the same substation, due to relay misoperation. This type of outage does happen, and yet it is considered non-credible. This dichotomy (credible vs. non-credible) will be resolved when we move toward probabilistic planning, and there is now an effort under way to move in that direction. The credibility of contingencies will be determined by their probabilities of occurrence. Even an N-3 contingency will have to be considered if it has a sufficiently high probability.
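
As a toy illustration of that probabilistic screening idea (my own sketch with invented outage rates and an assumed independence model, not a WECC or NERC procedure), one could rank N-1, N-2 and even N-3 combinations by estimated probability and keep everything above a threshold for study:

```python
# Toy sketch of probabilistic contingency screening (illustrative numbers only).
# Each element gets an assumed annual outage probability; combinations whose
# estimated joint probability exceeds a threshold are kept for study,
# regardless of whether they would be called "credible" today.
from itertools import combinations
from math import prod

outage_prob = {            # assumed per-element annual outage probabilities
    "Line A": 0.05, "Line B": 0.05, "Line C": 0.02,
    "Xfmr 1": 0.01, "Gen 1": 0.03,
}
# Common-mode pairs (e.g. two lines on one tower) get a single, higher probability.
common_mode = {frozenset({"Line A", "Line B"}): 0.02}

THRESHOLD = 1e-4
to_study = []
for k in (1, 2, 3):                       # N-1, N-2, N-3
    for combo in combinations(outage_prob, k):
        p = common_mode.get(frozenset(combo),
                            prod(outage_prob[e] for e in combo))  # independence assumed
        if p >= THRESHOLD:
            to_study.append((p, combo))

for p, combo in sorted(to_study, reverse=True):
    print(f"{p:.2e}  {' + '.join(combo)}")
```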
 
Pablo02, I think the opposite. I would think part of the problem is that many relay schemes are still electro-mechanical rather than solid state. Electro-mechanical relays do not allow the same selectivity as electronic types.
The limited settings available on electro-mechanical relays, and the difficulty of delaying a trip (or locking out) based on other considerations -- something that would be possible with the programmable logic of an electronic relay -- may also have contributed.
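
To make the selectivity point concrete, here is a minimal sketch (my own illustration, not from either post) of the standard IEEE C37.112 inverse-time overcurrent equation. A numeric relay can place its curve and time dial almost anywhere, while an electro-mechanical disc relay is confined to its fixed characteristic and tap/dial steps:

```python
# Minimal sketch: IEEE C37.112 inverse-time overcurrent operating time.
# t = TD * (A / (M**p - 1) + B), where M = fault current / pickup current.
# The constants below are the standard "very inverse" curve coefficients.

def op_time(multiple_of_pickup: float, time_dial: float,
            A: float = 19.61, B: float = 0.491, p: float = 2.0) -> float:
    """Operating time in seconds for a given multiple of pickup."""
    if multiple_of_pickup <= 1.0:
        return float("inf")          # relay never operates below pickup
    return time_dial * (A / (multiple_of_pickup ** p - 1.0) + B)

if __name__ == "__main__":
    for m in (2, 5, 10, 20):
        # A numeric relay might be set to a low time dial (fast), while a
        # coordinated E-M relay typically sits on a higher dial (slower).
        print(f"M={m:>2}: numeric TD=0.5 -> {op_time(m, 0.5):5.2f} s, "
              f"E-M TD=3.0 -> {op_time(m, 3.0):5.2f} s")
```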
 
I've found by far the most common problem with protective relaying (E-M or digital) is incorrect installation or incorrect settings. However fast and accurate they are, numeric relays offer plenty of opportunities for configuration errors. This takes us back to guiseppe's post about commissioning (with which I agree). My experience with commissioning in the industrial world closely parallels his. I know some utility companies have written procedures for certain types of systems, but are there any standards or recommended practices for commissioning T&D systems?
 
skogsgurra: Hydro Quebec has real time simulators for power grids.... that they happily sell... because they like making money. This stuff is not my area of expertise... but the various programs seem quite interesting and I have talked with people who have visited Hydro Quebec's lab and were quite interested. Okay... so Quebec had some issues with ice a few years back.... but their simulation products may possibly be applicable.

Here is the link:


Margaret
 
Thanks margarete695,

Yes, these simulators are fine for simulation. But my vision is a little bit different. I think of a system that runs parallel to the real system, gets all information (including protection settings, switch positions et cetera) in real time from the real system, and has "warning conditions" and "catastrophe limits" built in.

The purpose of the system is not to simulate in order to study different configurations and load situations, but rather to give a clear picture of the stability margins in the actual system and to give an early warning if the margins for safe operation get too narrow.

"Any simulator can do that" I can hear.
Can they, really? And, if that is so, have they been implemented on a large scale (i.e. 50 - 100 million people)?

I can see two important uses for such a parallel system.
The first use is obvious: give the operators a reliable tool that shows without mercy what the outcome of different actions will be.
The second use is to be able to communicate the stability (or fragility) of a system to the non-technical individuals who decide about investments. A map of the eastern US and Canada showing how cities lose power within seconds is more convincing than any report written by technicians - for technicians.
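
As a very rough sketch of what the core of such a parallel "observer" could do (a toy example: the line names echo ones mentioned later in this thread, but the ratings, flows and alarm thresholds are invented, and this is not an existing product), the simplest version just compares telemetered flows against operating limits and raises an early warning when margins shrink:

```python
# Toy "parallel observer": compare real-time line flows against limits and
# flag shrinking thermal/stability margins before they become critical.
from dataclasses import dataclass

@dataclass
class Line:
    name: str
    limit_mw: float      # operating limit (thermal or stability), MW
    flow_mw: float       # telemetered flow, MW

    @property
    def margin(self) -> float:
        """Remaining margin as a fraction of the limit."""
        return 1.0 - self.flow_mw / self.limit_mw

WARNING, CRITICAL = 0.15, 0.05   # assumed alarm thresholds (15 % and 5 % margin)

def scan(lines):
    for ln in sorted(lines, key=lambda l: l.margin):
        if ln.margin <= CRITICAL:
            print(f"CRITICAL  {ln.name}: {ln.margin:5.1%} margin left")
        elif ln.margin <= WARNING:
            print(f"WARNING   {ln.name}: {ln.margin:5.1%} margin left")

scan([Line("Hanna-Juniper", 1200, 1150),
      Line("Sammis-Star", 1300, 1180),
      Line("Star-South Canton", 1100, 700)])
```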

 
I found it interesting that the enernex website shows a very definite over-frequency event... under-frequency schemes work nicely for under-frequency, but over-freq strikes me as being a whole other ballgame.

But it strikes me that it is just another of those unfortunate incidents that will happen sometime somewhere - chance will always find a sequence of events you couldn't ever have imagined! Not that I'm saying we should give up trying, but we should recognise that "solving" an engineering problem just moves the problem set to something we are more comfortable with.

Bung
Life is non-linear...
 
Bung,

The recordings that I have seen have a very crude time scale. The one I am looking at now (from enernex.com) is from 2003-08-14 09:00:00:00 and ends at 2003-08-14 21:00:00:00 with 2 hours per division. Each division is about 20 mm wide, so we have 120 minutes on 20 mm, or 6 minutes for each mm. The whole tripping probably took less than a couple of minutes so we shall not expect to see any underfrequency recordings there. The temporal resolution is not high enough.
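
Just to make that resolution argument concrete (a trivial check using the numbers quoted above):

```python
# Temporal resolution of the 12-hour frequency chart described above.
span_minutes = 12 * 60           # 09:00 to 21:00
chart_width_mm = 6 * 20          # six divisions at roughly 20 mm each
minutes_per_mm = span_minutes / chart_width_mm   # = 6 min/mm

event_minutes = 2                # a tripping sequence of a couple of minutes
print(f"{minutes_per_mm:.0f} min/mm -> a {event_minutes}-minute event spans "
      f"about {event_minutes / minutes_per_mm:.2f} mm on the chart")
```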

The overfrequency that goes up to 60.25 Hz is a natural reaction to losing a lot of load. The frequency regulation takes the frequency back to 60.05 Hz in about ten minutes and then, after 60 minutes, back to slightly above nominal frequency.
If you look at the frequency at ten o'clock, you will see that it falls rapidly below 60.0 Hz, and that is a sure sign of increasing load (it could also be that one or more generators switched off at that instant, leaving the others to pull the wagon).
Frequency stays just on, or slightly below, 60.0 Hz most of the morning, but just after noon that day it falls to a low 59.96 Hz and stays there until just before two o'clock, when it rises rather fast, which I think is when a large part of the grid was disconnected to protect against further damage.
The frequency goes up to just below 60.0 Hz, but falls below 60.00 again due to high load. The system struggles to keep up delivery, but cannot, and just after 16:00 there is a large disconnect which makes the frequency rise, giving the overfrequency event that you have noted. It is not a cause of the fault, but rather a consequence of it.

I cannot guarantee that my interpretation is correct, but I feel that (having seen lots of similar recordings from events on a smaller scale) it is logical and physically feasible.

Comments invited

Gunnar Englund
 
As far as I remember, there was an article about the effect of the sun on the earth. Although it is rare, the sun can increase its magnetic influence on the earth, and this might cause phenomenal and strange trips on electrical transmission lines and power plants too... this is what I guess...

regards to everyone...
 
Hi All.

I'm not electrical so I don't follow all of the above talk.

Most discussion is about searching for the root cause.

Two things that have been learnt in recent years from official investigations into complex industrial accidents:-

1. There is usually more than one cause - so don't allow anyone to skew the talk towards one, single cause. Parties who push forward a particular explanation may have vested interests.

2. There will be contributing factors - not causes as such, but factors that existed and assisted the development of the accident.

While it is natural to look for the starting point of this incident, if I were a New Yorker I would want to know why MY power failed. The answer to that lies in the LOCAL power station or distribution station, not in Ohio. Why did New York's power fail? That must be a relatively easy question to answer, and perhaps more useful.

Cheers,
John.
 
fundam --

LOL! That's the catch-all you usually hear from the relaying techs when they can't track down the root cause of a relaying event. "Damn sunspots..." I have a hard time believing this had anything to do with the outage, especially given that NERC monitors solar magnetic disturbances and no warnings were issued prior to the event.

all --

I think it's safe to say that yes, there were probably some relaying problems that contributed to this event. However, I don't think you can blame the whole thing on that alone. As I alluded to in my last post, I believe that a large root cause of this is simply the way in which power systems are operated today, given the framework within which the industry has to work. It's common for thousands of MW of transfers to be occurring on the system simultaneously -- on a system that was designed to transmit local generation to local load. Over-reliance on external generation resources logically leads to problems when the transmission leading to those sources trips.

I know I'm oversimplifying things, but you can't separate the operational issues from the technical problems. When you operate a power system in an operationally insecure manner, you're asking for trouble. The big question about this whole event is this: were system operators and reliability coordinators aware of the potential for a cascading outage, and were they doing everything possible to return the system to a state in which this was not a threat? My bet is that they were not fully aware of the implications of the next outage, simply due to the fact that there was a lot going on, but that they were doing everything they could to return the outaged facilities to service prior to the cascading event. Why do I think they weren't aware of the implications of the next outage in real-time? Anyone with experience with EMS/SCADA systems knows that it's a very difficult task (not to mention very expensive) to get a state estimator up and running, let alone running quickly enough to provide accurate information about stability issues in a timely manner. Yes, we can analyze these issues from a planning perspective using established models rather quickly (you can get good information, but it won't be 100% accurate), but to take real-time information and use it to analyze operations is a much more difficult task. I don't know the stats on the issue, but I'm not aware of many companies that have state estimators running that can provide this information.
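
For readers who haven't worked with one: a state estimator fits a consistent system state to redundant, noisy SCADA measurements. The real thing is a nonlinear AC problem solved iteratively over thousands of buses, which is where the time goes; the bare-bones idea, shown here as a weighted-least-squares fit on a linear (DC) model of an invented 3-bus network, is just this:

```python
# Minimal weighted-least-squares (WLS) state estimation on a linear (DC) model.
# Measurement model: z = H @ x + noise;  estimate: x_hat = (H'WH)^-1 H'W z.
# The 3-bus network, measurement set and noise level are invented for illustration.
import numpy as np

b = 10.0                                    # per-unit susceptance of each line
# State x = [theta2, theta3] (bus 1 is the angle reference).
# Measurements: line flows P12, P13, P23 and the net injection at bus 3.
H = np.array([[-b, 0.0],                    # P12 = b*(theta1 - theta2)
              [0.0, -b],                    # P13 = b*(theta1 - theta3)
              [b, -b],                      # P23 = b*(theta2 - theta3)
              [-b, 2 * b]])                 # P3  = b*(theta3-theta1) + b*(theta3-theta2)

x_true = np.array([-0.02, -0.05])           # "actual" angles in radians
rng = np.random.default_rng(0)
z = H @ x_true + rng.normal(0.0, 0.01, 4)   # noisy SCADA telemetry
W = np.diag([1 / 0.01**2] * 4)              # weights = 1 / sigma^2

x_hat = np.linalg.solve(H.T @ W @ H, H.T @ W @ z)
print("estimated angles (rad):", x_hat, " true:", x_true)
```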

Let me emphasize -- I don't believe this to be an event of negligence. I do believe that this event was a result of a number of improbable events, some operational, and some technical.

Again, just my opinions.
 
jstickley,

Yes, it is a much more difficult task to run an observer (or state estimator) than to run a simplified model. I would guess it is a hundred or a thousand times more complex than the model.

But what if you could learn from such a tool, and really demonstrate to the guys with the money that things have to be changed before the next event occurs? Perhaps even use the observer as a fast decision-maker that takes the correct actions quicker than the proverbial pig can wink its eye? Wouldn't that be worth some effort?

Building such a system will be expensive, yes. But probably not more expensive than having the nation more or less dead for days. And certainly not even one percent of the cost of the total generation/distribution system.

Of course, there will be practical and technical problems. One such problem is that electro-mechanical protections usually can't be read by computers. But the solution is easy; have them read and reported by humans. After all, you do not change those settings very often.

The more I think about such a system, the more I like it. What do you say?

Gunnar Englund
 
Here is a detailed timeline of the blackout:

- 2 p.m. FirstEnergy's Eastlake Unit 5, a 680-megawatt coal
generation plant in Eastlake, Ohio, trips off. A giant puff
of ash from the plant rains down on neighbors. On a hot
summer afternoon, "that wasn't a unique event in and of
itself," says Ralph DiNicola, spokesman for Akron, Ohio-
based FirstEnergy. "We had some transmission lines out of
service and the Eastlake system tripped out of service, but
we didn't have any outages related to those events."

- 3:06 p.m. FirstEnergy's Chamberlin-Harding power
transmission line, a 345-kilovolt power line in
northeastern Ohio, trips. The company hasn't reported a
cause, but the outage put extra strain on FirstEnergy's
Hanna-Juniper line, the next to go dark.

- 3:32 p.m. Extra power coursing through FirstEnergy's
Hanna-Juniper 345-kilovolt line heats the wires, causing
them to sag into a tree and trip.

- 3:41 p.m. An overload on FirstEnergy's Star-South Canton
345-kilovolt line trips a breaker at the Star switching
station, where FirstEnergy's grid interconnects with a
neighboring grid owned by the American Electric Power Co.
AEP's Star station is also in northeastern Ohio.

- 3:46 p.m. AEP's 345-kilovolt Tidd-Canton Control
transmission line also trips where it interconnects with
FirstEnergy's grid, at AEP's connection station in Canton,
Ohio.

- 4:06 p.m. FirstEnergy's Sammis-Star 345-kilovolt line,
also in northeast Ohio, trips, then reconnects.

- 4:08 p.m. Utilities in Canada and the eastern United
States see wild power swings. "It was a hopscotch event,
not a big cascading domino effect," says Sean O'Leary,
chief executive of Genscape, a company that monitors
electric transmissions.

- 4:09 p.m. The already lowered voltage coursing to
customers of Cleveland Public Power, inside the city of
Cleveland, plummets to zero. "It was like taking a light
switch and turning it off," says Jim Majer, commissioner of
Cleveland Public Power. "It was like a heart attack. It
went straight down from 300 megawatts to zero."

- 4:10 p.m. The Campbell No. 3 coal-fired power plant near
Grand Haven, Mich., trips off.

- 4:10 p.m. A 345-kilovolt line known as Hampton-Thetford,
in Michigan, trips.

- 4:10 p.m. A 345-kilovolt line known as Oneida-Majestic,
also in Michigan, trips.

- 4:11 p.m. Orion Avon Lake Unit 9, a coal-fired power
plant in Avon Lake, Ohio, trips.

- 4:11 p.m. A transmission line running along the Lake Erie
shore to the Davis-Besse nuclear plant near Toledo, Ohio,
trips.

- 4:11 p.m. A transmission line in northwest Ohio
connecting Midway, Lemoyne and Foster substations trips.

- 4:11 p.m. The Perry Unit 1 nuclear reactor in Perry,
Ohio, shuts down automatically after losing power.

- 4:11 p.m. The FitzPatrick nuclear reactor in Oswego,
N.Y., shuts down automatically after losing power.

- 4:12 p.m. The Bruce Nuclear station in Ontario, Canada,
shuts down automatically after losing power.

- 4:12 p.m. Rochester Gas & Electric's Ginna nuclear plant
near Rochester, N.Y., shuts down automatically after losing
power.

- 4:12 p.m. Nine Mile Point nuclear reactor near Oswego,
N.Y., shuts down automatically after losing power.

- 4:15 p.m. FirstEnergy's Sammis-Star 345-kilovolt line, in
northeast Ohio, trips and reconnects a second time.

- 4:16 p.m. Oyster Creek nuclear plant in Forked River,
N.J., shuts down automatically because of power
fluctuations on the grid.

- 4:17 p.m. The Enrico Fermi Nuclear plant near Detroit
shuts down automatically after losing power.

- 4:17-4:21 p.m. Numerous power transmission lines in
Michigan trip.

- 4:25 p.m. Indian Point nuclear power plants 2 and 3 in
Buchanan, N.Y., shut down automatically after losing power.
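
To show how sharply the cascade accelerated, here is a small sketch (times condensed from the list above) that tallies the gap between consecutive events:

```python
# Gaps between consecutive events in the timeline above (times as listed).
from datetime import datetime

events = [
    ("14:00", "Eastlake Unit 5 trips"),
    ("15:06", "Chamberlin-Harding 345 kV line trips"),
    ("15:32", "Hanna-Juniper 345 kV line sags into a tree and trips"),
    ("15:41", "Star-South Canton 345 kV breaker trips at the Star station"),
    ("15:46", "Tidd-Canton Control 345 kV line trips"),
    ("16:06", "Sammis-Star 345 kV line trips, then reconnects"),
    ("16:08", "Wild power swings seen in Canada and the eastern US"),
    ("16:09", "Cleveland Public Power load drops to zero"),
    ("16:10", "Campbell No. 3 and two Michigan 345 kV lines trip"),
    ("16:11", "Avon Lake 9, two Ohio lines, Perry 1 and FitzPatrick trip"),
    ("16:12", "Bruce, Ginna and Nine Mile Point shut down"),
    ("16:15", "Sammis-Star trips and reconnects a second time"),
    ("16:16", "Oyster Creek shuts down"),
    ("16:17", "Fermi shuts down; Michigan lines keep tripping until 16:21"),
    ("16:25", "Indian Point 2 and 3 shut down"),
]

prev = None
for t, what in events:
    now = datetime.strptime(t, "%H:%M")
    gap = "" if prev is None else f"(+{int((now - prev).total_seconds()) // 60} min)"
    print(f"{t} {gap:>9}  {what}")
    prev = now
```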

 
Let's see: 1) loss of a generation unit, 2) lines overloaded and tripped, 3) out-of-step conditions that lead to a snowball effect.
Very typical for a blackout.

The last one that happened in my country was a bit different: an earth fault on a major EHV line, islanding (one big island with a load/generation ratio of approx. 3 and insufficient load shedding), and loss of synchronism.
An announcement said that a stork (the birds usually nest on top of the EHV towers) was responsible for the blackout, but the islanding scheme was based on a 20-year-old study and the load shedding was not adequate.

Who was responsible after all? The (poor) bird...

Best regards,
Morcon
 
Beautiful, SidiropoulusM!

And now, the question: was the system close to overload? Was it an N-2, N-1 or just plain N situation? I think the latter.
 
skogsgurra --

I'm in favor of state estimators -- I wish every utility had one. They are great tools to provide information to system operators. However, do you have any experience with them? The information they provide takes TIME to obtain -- the state estimators I've seen take several minutes to perform complete single contingency analysis for even medium sized systems.

I can't see how what you're talking about is even technically feasible given currently available technology and software -- you're talking about a state estimator that not only gives power flows, but also a system that performs complete stability analysis, fault analysis and predictive relay tripping analysis in real-time. I would imagine that even if you could put that kind of simulator together, it would take it well over an hour to produce any kind of reasonable results, by which time the information is outdated anyway. That's not even addressing the extreme difficulty in developing such a system and keeping it up-to-date.

Don't get me wrong, what you're talking about is a fine idea. However, it's not just a matter of time and money getting such a thing implemented. I don't want the uninformed observer of this thread to think "man, why aren't utilities doing these things?" without the understanding that they're quite difficult (if not impossible) to implement, often with an uncertain level of usefulness.

The bottom line to me is this: no matter how extensively you monitor your system, the system will fail if you have enough things go wrong. Our job as utility engineers is to be sure that this risk is reasonably mitigated. The question is, what's reasonable? Can you really plan your system with enough redundancy to withstand 3, 4, or more contingencies? Can you have a plan to mitigate EVERY possible event?
 
Analysis reported by the EPRI PEAC monitoring team:
Please keep in mind that there is no conclusive explanation of this 2003 power outage yet at this point. However, here is what was reported by the EPRI PEAC monitoring team:
"The waveforms indicate a phenomena that we have called “fast voltage collapse”.
This can happen during periods of heavy load, especially when there is a dominance of motor load (e.g. air conditioning and industrial load).
Recovery from the voltage sag during these conditions can be very slow. The motors draw increased current due to the continuing low voltage and the voltage around the system collapses due to this increased demand following the sag.
Generators struggle with this increased load.
Motors will eventually stall and trip. Load goes off and voltage can go very high as a result.
Within 3 minutes more than 20 generators in NE trip. This is probably caused by frequency variations from the generation/load mismatch."
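
A crude way to see the "fast voltage collapse" mechanism described above is to model the motor-dominated load as constant power behind a feeder impedance: as voltage falls, the load draws more current, which drags the voltage down further. The sketch below uses made-up per-unit numbers and a deliberately simplified constant-power model; it is only an illustration of the idea, not the EPRI analysis itself.

```python
# Toy illustration of voltage collapse with a constant-power (motor-like) load.
# Source VS feeds load P through resistance R; V must satisfy V = VS - R*P/V.
# A real (stable) solution exists only while P <= VS**2 / (4*R); above that, collapse.
VS, R = 1.0, 0.1            # assumed per-unit source voltage and feeder resistance

def settle(p_load, v0=1.0, iters=50):
    """Iterate the fixed point; return the settled voltage, or None on collapse."""
    v = v0
    for _ in range(iters):
        v = VS - R * p_load / v
        if v < 0.3:         # far below any stable operating point
            return None
    return v

for p in (1.0, 2.0, 2.4, 2.6, 3.0):   # increasing motor load, per unit
    v = settle(p)
    print(f"P = {p:.1f} pu ->", "COLLAPSE" if v is None else f"V = {v:.3f} pu")
```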
 
In an earlier post above I expressed my surprise that human intervention did not come into play for two hours before the collapse, while lines and generators were tripping all over the place. There's a report now that there was a breakdown in communications between system operators.

Visit:
Moral #1: no need for high-tech systems if our low tech fails us.

Moral #2: need for automatic wide area controls to reduce reliance on the human factor.
 
jstickley,

I disagree that such a system couldn't be built. I guess that you need computing power perhaps ten times what is used for detailed weather forecasts. It will cost, yes. But it can surely be done.

I also think (I admit that I do not know) the amount of input data will be comparable, perhaps less since much data is static (rated capacity, protection settings, power line and transformer data and many more do not change dynamically).

As I said before, there are at least two uses for such a system: (1) guidance for the operators and (2) a way of showing the guys with the money what the reality looks like. There might also be a third use: to influence the operation of the grid so that it avoids walking too close to the edge (a poor metaphor, but I hope you understand).

As you can see in previous postings, there were a couple of hours when the operators could have saved the situation, and I think that an observer could have helped with that. The "big dip" that took place in one or two minutes happened when the system was already doomed. Nothing could help then.

But I do agree that a PC-based system or even an ordinary mainframe computer can't do this. I also agree that many (un)realistic proposals turn into unrealistic implementations, but my more than thirty years in steel and paper mill automation have taught me that some observers do a great job.
 