Managing Emergent Work that threatens the project

KarlwithaK · Aug 3, 2018

I am asking if anyone has knowledge of a method, model, or "canned" process to address discovery issues that require fixes on the fly in an effort to protect the critical path on the far larger project.
Here is my situation. I am titled an "Emergent Issue Team Manager". I work in commercial nuclear power plants. Every 2 years we shut down the reactor, remove the oldest fuel and replace it with new. This "Refueling Outage" can last anywhere from 25 to 100 days and cost from $20M to $175M. My job, as you might suspect, is to fix anything found wrong with the plant that had not been expected and planned for. I do this by forcing the creation, within an hour of discovery, of a cross-functional team from Engineering, Operations, Maintenance and Radiation Protection to come up with a data gathering plan, analyze the data and then make recommendations to senior leadership as to how to fix or at least remediate the worst of the problem. This is ordinarily done in the first 12 to 18 hours after discovery. The actual "fix" could be anything from replacing a little relay somewhere in an hour to a 3-week-long major high-voltage cable pull.
The Engineers have their part fairly well defined, using a Failure Modes Analysis tool based on the Kepner-Tregoe model.
It is the TIME that gives me trouble! Doing and getting the people to do what is required and recording all work, data, decisions, and all the rest as "One-Offs" eats up expensive time.
Has anyone run across a "Unified Theory of Crisis Management" book, class, execution model or whatever that is specific enough to be adapted to use in nuclear utility planned outages? But flexible enough to address anything from a broken valve stem to a blown printed circuit board to a failed tube in a heat exchanger?
I would be especially interested in hearing about a work planning tool that incorporates decision points as no-duration work into PERT charting to estimate the critical path in solving these unforeseen conditions. These "uncertainties" are bad-tasting poison to conventional waterfall project management techniques as they are NOT risks. Risks are known possibilities that could have contingency plans in place to handle the issue. These are not that. These are truly unknown ahead of time and had no plans to address them prior to the start of the outage.

miningman · Aug 3, 2018

Interesting dilemma. I will assume that since this is nuclear related, safety is absolutely paramount and that cost and schedule are very secondary. If any of the project team dispute this, then conflict will be inevitable and some input from senior management is necessary. Having said that, from the OP's post, this is clearly work which has been done regularly over decades at a cost of between $1 and 2 million per day. I would have thought that over the decades, a history of unplanned, unknown and unidentifiable tasks has been accumulated.

The OP strongly suggests it is not so much the time taken to complete these tasks that is critical, rather the time required to document, plan, record etc etc, with the implied need to feed this info to the senior master scheduler for regular updates of the schedule. Surely there is no reason why the known schedule cannot be modified to include tasks 1 thru 9999, each one being an "unidentified unknown". With my somewhat limited knowledge of Monte Carlo simulation, I would have thought that this technique, when applied to all tasks within the schedule, especially tasks that are unknown today but which will almost certainly occur, might be a way to start.
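To make the Monte Carlo suggestion concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the 30-day planned duration, the ten placeholder "unidentified unknown" tasks, their 30% chance of occurring, and the 1 to 72 hour delay range are invented numbers, not plant data. It also assumes, pessimistically, that every placeholder that occurs lands squarely on the critical path.

```python
import random

def simulate_outage_days(planned_days, placeholders, trials=10_000, seed=42):
    """Simulate total outage duration (days) with placeholder emergent tasks.

    placeholders: list of (p_occurs, low_h, mode_h, high_h) tuples, one per
    "unidentified unknown" slot in the schedule. All values are hypothetical.
    """
    rng = random.Random(seed)
    durations = []
    for _ in range(trials):
        delay_hours = 0.0
        for p_occurs, low_h, mode_h, high_h in placeholders:
            if rng.random() < p_occurs:
                # Triangular distribution: a common stand-in when only
                # best / likely / worst estimates are available.
                delay_hours += rng.triangular(low_h, high_h, mode_h)
        durations.append(planned_days + delay_hours / 24.0)
    durations.sort()
    return durations

def percentile(sorted_values, pct):
    """Return the value at roughly the given percentile of a sorted list."""
    idx = min(len(sorted_values) - 1, int(pct / 100.0 * len(sorted_values)))
    return sorted_values[idx]

# Ten hypothetical placeholder tasks, each 30% likely, 1 h to 72 h of delay.
placeholders = [(0.3, 1.0, 6.0, 72.0)] * 10
durations = simulate_outage_days(30, placeholders)
p50 = percentile(durations, 50)
p90 = percentile(durations, 90)
print(f"P50 outage: {p50:.1f} days, P90 outage: {p90:.1f} days")
```

The gap between the P50 and P90 figures is the sort of number that could be put in front of management as contingency. Real inputs would have to come from the station's own history of past emergent issues.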

If "The Engineers have their part fairly well defined" it would appear that it is the administrative support that might need improvement. Is the scheduling team adequately staffed? Does top management willingly accept that each turnaround will cost up to $2 million daily, plus the cost of lost sales? If not, there is probably inadequate leadership at the top to support the team thru daily conflicts.

I will watch this discussion develop with interest.

KarlwithaK · Aug 3, 2018

You grasped the issue very well, MiningMan! $1-$2M a day is about right, and that is only the O&M and Capital outlay. At my station, producing nearly 1500 megawatts, the lost production cost is around a million on top of that outlay. Plus the cost of buying power at high rates from peaking units and sometimes competitors in the regional utility distribution group.
The latter half of your first paragraph does make me blush a bit. Yes, this plant went online in the 1980s. Yes, a great deal of what we see becoming an "Emergent Issue" has occurred at some time in the past. Yes, there are records, called "Operating Experience" or OPEX maintained at each nuclear station and by industry groups like INPO and NEI. And NO... They are not always used to best advantage to identify "unidentified unknowns" and make them "risks" that can have contingency plans incorporated.
As you may suspect, Pareto's law very much applies. My experience is that over 80% of the issues that get classified as "Emergent" and "could not have been foreseen" turn out, in hindsight, to have been an obvious possible outcome of whatever the original effort had been.

Your second paragraph is rather "telling" as well. An unspoken bias when it comes to "Emergent Issues" being incorporated into the Outage Master Schedule is that doing so reveals to every tracking feature of the software and to Senior Management that the Master Outage Schedule had perhaps not been properly vetted for risk and contingency. This negatively impacts one of the MAJOR Performance Indicators for the Outage Management Leaders. One that may be tied to their performance bonuses?
This is often referred to as "Schedule or Scope stability". The bias is to handle the issue outside the Master Schedule, often using FIN (fix-it-now) teams from maintenance that work outside the outage scope and schedule.

As you observe in your last paragraph, it is true that the Emergent Issue Team often does NOT get the support of P6 schedulers. The rapidly developed repair strategy using available (FIN?) resources is usually already in progress before all the repair work order tasks are planned, and sometimes before any needed parts are even identified!
The Emergent Issue Team Leader and the EIT Manager (Me) will usually construct an ad-hoc timeline on a whiteboard in the Outage Control Center and update progress there and with oral updates to the entire Outage Control Team every 3 hours minimum, or hourly if the issue is impacting the Outage Critical Path.

Yes, Senior Management and Corporate Leadership understand and accept the high daily cost of refuel outages. It is a cost of doing business in our industry. But what they do NOT want to see or hear about is an unneeded day, shift or single hour extending the overall duration of the outage.
Predictability of the overall fleet production, including nuclear refuel outages, scheduled maintenance outages, forced outages and the employment of fleet-owned peaking units, is what they get paid for. Ultimately, providing cheap, clean, safe nuclear-generated electricity to the grateful public and profits to our shareholders is the goal.

miningman · Aug 3, 2018

KK, the following comments are offered as advice to help you CYA in the event that you are someday held accountable for some disaster when effectively you haven't been given the necessary authority to influence the project. It seems to me that you and the FIN team are NOT part of the project team. You and your colleagues are likely seen as a necessary nuisance whose efforts must be tolerated by the project team. If the EIT team does not get the support of the P6 schedulers, then clearly someone (probably with a high level of authority) does not recognise that one day, the FIN work will very adversely affect the critical path.

Now in fairness, if this plant has been running for 30+ years with biennial refuelling done as a project, and if there have been no occurrences of major schedule slippage, then perhaps the present system is working well, and if it ain't broke, don't try to fix it!! You imply that cost control is not super critical but schedule delay is intolerable and would be "career limiting" for any individual perceived to be responsible.

My advice would be to overstaff all your activities and ensure that you continue to feed the schedulers the necessary info (maybe only at a very high level) so that if they choose not to utilise your input, you can document your efforts that were ignored by others. You might also develop your own private best schedule estimate of FIN work, perhaps just using MSProject. You would likely then be the first person to be able to warn of an impending major schedule blowout... of course sometimes it's the messenger that gets shot!!!

If top management really has minimal interest in having a realistic, all-encompassing schedule, then perhaps in the interest of self preservation, you should pull back a bit. I suspect you are relatively young... perhaps a quiet informal chat with your boss might be productive... but perhaps not, it's your call.

KarlwithaK · Aug 4, 2018

MM, I do appreciate the input and you are right that some political CYA schmooze will be required to implement any changes that might drop a load in a senior manager's rice-bowl.
But no, I am not so young. I took my PMP exam back in the early 2000s when I was in my mid-40s. I have been all-nuke since 1999 at over 30 nuclear sites in the US and Canada.
I do not believe I will be successful bulldozing discovery work into the Outage Master Schedule for the reasons already mentioned above.
The problem with using MSProject, or even P6, is that they are well suited to traditional "Waterfall" management with either serial or limited concurrent activities. What they do NOT do well is present the "True" picture of the complexity of an issue that begins with troubleshooting, then data analysis, then a remediation recommendation, followed by parallel paths to plan work orders, obtain parts, and find an appropriate clearance (Lock-Out/Tag-Out) window to do the work in!
That is part of what I am looking for. I will have to educate senior management to understand that solving these problems "on the fly" is NOT a simple straight line to the finish. Many of these higher ups seem to only understand "stick-and-ball" timelines on a whiteboard. (Sometimes it feels like they demand such simplicity with owner and ETC (expected time / completion) shown so that any deviation can be pointed to as evidence of my poor management of the issue.) OK, that was a bit un-Christian and un-charitable on my part, but it has happened before. Especially if the accuser is the one who owned the part of the outage that CAUSED the emergent issue and he should have recognized the risk and made contingencies against it.
Alright, I got that bit of bile out of my throat.

A tool I want to use and have as an adjunct to teaching the organization about the complexities and resource impacts these emergent issues cause in the heat of battle is something like this:
I want a software app that will run on a desktop PC. In the Outage Control Center (where I do my job as EIT Manager) we have 7 large format TV/Monitors on the walls, and 4 new "smart" whiteboards. The TV/Monitors are touch-sensitive.
We are all familiar with PERT charting to discover the critical path of a project. There are several software apps out there, for free, that emulate the PERT charts. There are also apps to illustrate a decision tree. I want an interactive software app that combines the functions of both. I have attached a little sketch made up in Visio but saved as PDF that might represent a very small portion of the efforts going into a fairly simple EI Team effort. The left side is just the different functions I may want and so I left them there to copy/paste or drag to the actual diagram at the right. (PDF would not show the separate pages/layers used in Visio.)
So here is how I want this to work. Note that the duration to accomplish each activity is associated to the arrow between dependent tasks. The task boxes themselves, and the decisions and other functions have no duration, only the arrows. When this tool is populated the idea is that every decision that may change the path to a successful resolution is accounted for here with the results of that decision carrying the critical path to the final resolution of the issue. So, if the conclusion of the FMA team is that there are 3 possible paths to take to resolve an issue, a decision box there will have one input arrow that carries the estimated time it would take the FMA team to reach their conclusion and make the recommendation. The 3 arrows leading away from the decision box will have a duration estimated to the completion of the next step for each of the different recommendations. For example, one recommendation may be to do nothing, to take the widget out of service until a future scheduled outage because the widget is just one in a parallel train of 3. So the duration may be very short and the job is done.
But another recommendation may be that the widget is required, because the other 2 in the train are near the end of service life and cannot be relied on, so the arrow would have a duration of "X", the time required to plan the work order, then execute the repair, with another duration, then testing.... And so on until all the durations are totaled and the new "real" Critical Path is revealed.
Now comes the cutesy part..... Once all this map/chart/thing is up on a big-screen monitor, a user simply touches each item as it is completed, or touches each arrow as selected coming out of a decision. The software will dim all completed activities and the associated arrows and add them up. And ALSO project the path through the network of activities and decisions to determine the longest (critical) path and total it in a box at the top of the page. This number will also carry a history showing the change plus or minus to the critical path at every decision made. Then for icing on my cake, I want a "playground" function! Maybe a second screen, like pages in Excel, where I can see the same chart from the first (primary) page but now I can touch the activities and arrows to experiment and postulate possible outcomes and the effect on critical path!
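As a rough illustration of the logic behind such a tool (not the touch-screen display), here is a minimal Python sketch. Durations live on the arrows only. Nodes and decisions take no time. Choosing an arrow out of a decision prunes the other branches, and each choice logs the plus or minus change to the critical path. All names (`Arrow`, `IssueNetwork`, the activity labels) and all hour values are hypothetical, the third "replace" branch is invented to fill out the 3-path example, and the sketch assumes the network has no loops.

```python
import copy
from dataclasses import dataclass

@dataclass
class Arrow:
    name: str      # activity label
    src: str       # node the arrow leaves
    dst: str       # node the arrow enters
    hours: float   # estimated duration; nodes themselves take no time
    done: bool = False

class IssueNetwork:
    def __init__(self, arrows, decisions, start="start"):
        self.arrows = list(arrows)
        self.decisions = set(decisions)  # nodes where ONE outgoing arrow is picked
        self.start = start
        self.chosen = {}                 # decision node -> chosen arrow name
        self.history = []                # (decision, arrow name, change in hours)

    def _live(self, node):
        """Outgoing arrows still in play (unchosen branches are pruned)."""
        out = [a for a in self.arrows if a.src == node]
        if node in self.chosen:
            out = [a for a in out if a.name == self.chosen[node]]
        return out

    def _longest(self, node, count_done):
        best = 0.0
        for a in self._live(node):
            cost = a.hours if (count_done or not a.done) else 0.0
            best = max(best, cost + self._longest(a.dst, count_done))
        return best

    def critical_path_hours(self):
        """Longest path start to finish, given the decisions made so far."""
        return self._longest(self.start, count_done=True)

    def remaining_hours(self):
        """Longest path still to run, treating completed arrows as zero."""
        return self._longest(self.start, count_done=False)

    def completed_hours(self):
        return sum(a.hours for a in self.arrows if a.done)

    def complete(self, name):
        for a in self.arrows:
            if a.name == name:
                a.done = True

    def choose(self, decision, arrow_name):
        if decision not in self.decisions:
            raise ValueError(f"{decision} is not a decision node")
        if not any(a.src == decision and a.name == arrow_name for a in self.arrows):
            raise ValueError(f"{arrow_name} does not leave {decision}")
        before = self.critical_path_hours()
        self.chosen[decision] = arrow_name
        self.history.append((decision, arrow_name, self.critical_path_hours() - before))

    def playground(self):
        """Independent copy for what-if experiments."""
        return copy.deepcopy(self)

# Hypothetical widget issue: FMA analysis, then a 3-way decision.
net = IssueNetwork(
    arrows=[
        Arrow("fma_analysis", "start", "fma_done", 12),
        Arrow("take_out_of_service", "fma_done", "end", 1),
        Arrow("plan_work_order", "fma_done", "wo_planned", 8),
        Arrow("execute_repair", "wo_planned", "repaired", 24),
        Arrow("test_repair", "repaired", "end", 6),
        Arrow("obtain_parts", "fma_done", "parts_on_site", 48),
        Arrow("install_and_test", "parts_on_site", "end", 30),
    ],
    decisions={"fma_done"},
)
worst_case = net.critical_path_hours()   # longest branch before any decision
what_if = net.playground()
what_if.choose("fma_done", "obtain_parts")   # experiment; primary chart untouched
net.complete("fma_analysis")
net.choose("fma_done", "plan_work_order")
print(worst_case, net.critical_path_hours(), net.remaining_hours(), net.history)
```

Before a decision is made, the sketch reports the longest of the open branches as the critical path, which is one way to read "project the path". A real tool might instead show every branch total side by side.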

So... what do you think? Something big, colorful, intuitive to use and mostly INSTRUCTIVE to the people that need it the most.
And a continuous lesson in point to project managers and outage maintenance window managers that might not appreciate how much treasure is wasted when their plans are not effectively challenged and risks identified way ahead of execution.

No, MiningMan, I likely won't share all this with my manager, as he would be the one to bear the scars such transparency could bring.
Plus, if I can find the right software developer / engineer to work with me, this might be my retirement ticket!

miningman · Aug 4, 2018

Not sure I can contribute much more here. It appears the fundamental issue is that your management doesn't even recognise that there is a potential problem... as such you can't expect much support with any of your endeavours. By your own admission, you expect that internal project managers and senior management are unlikely to appreciate the "benefits" of your proposals... and in fairness to your colleagues, you probably don't have a crystal clear idea of what you are trying to achieve, and thus it will be VERY difficult to obtain buy-in. My recommendation would be to drop the idea completely and live within the present system. If a suitable software engineer drops into your lap, perhaps work with him in your spare time, but be very cautious how far above the parapet you stick your head when approaching your bosses. Some other opinions from other posters might be instructive. Good luck.

KarlwithaK · Aug 4, 2018

I really do appreciate your input, and the caution to be prudent in pressing my case is certainly valid.
As far as "fairness" to my colleagues, here's my take on the culture I want to change. I think I can sum it up pretty simply.

The current state is that many (but not all) senior managers and project managers in our organization are not familiar with Lean/Agile management methods.
Yes, I am aware that Agile methods are most usually associated with IT/Software development projects, but the reiterative "steps" to prototype, test and revise the product are analogous to the analysis methods we currently use to determine repair strategies.

My peers and managers are very familiar with conventional "Waterfall" linear/sequential techniques with minimal risks or multiple paths to a successful completion.

I do NOT mean to say that any of them aren't competent, only that they are far more comfortable with low risk, only one proceduralized method to get from start to finish.

I do admit that I may not have a crystal clear picture of everything that needs to change, but I do have enough for a cogent "Vision Statement". A Statement of Work or Scope could be developed later. The Vision? "If, during the execution of scheduled work, an issue is discovered that threatens the production (of electricity) or has the potential to negatively impact the Critical Path then a robust, methodical procedure and management "toolkit" shall be employed to accurately represent the plan, the progress and the risks to resolve the issue."

The governing document (in my industry a Fleet-Level Organizational Procedure) would determine the membership, structure and reporting of EI Teams. We have 2 currently, but I believe they could be much better.

What I want is a fleet-level program with a corporate sponsor to own the program and the procedure, with a dedicated EIT Czar. That leader would get the procedure created/corrected and then proceed to establish some fleet-level support that will be available to individual sites for training and complex problem-solving for Emergent Issues of high potential risk or high potential impact.
The EIT Czar will then, with local assistance, select and train EIT Managers at each station and establish one of them as the station "Lead EIT Manager".
That LEITM will then select and train local EI Team Leaders for use in outages and online issues.

The EIT Czar will develop the "toolkit" with all the forms and document templates optimized for use on the computer. The toolkit will include the planning and tracking software tool I described earlier.

Completing the training as an EIT Manager or EIT Leader should carry a qualification that is recognized at all stations within the fleet. That will enable any EIT Manager or Leader with specialized knowledge or experience to be loaned and employed to best advantage at any station in the fleet.

OK..... I maybe do have more than just a "Vision" of what I believe should be changed. I freely admit that I am probably pushing for change at a level far above my pay-grade.
But the accurate presentation of the plan, the progress, the risks and the opportunities that present when addressing resolution of Emergent Issues is the RIGHT thing to do. Senior Management requires and deserves to know all the pertinent information when called on to make a repair/replace/do nothing or more complex decision.

Y'know... If I can get the right guy to help write the software, and I write the procedure template with a suite of form-fillable documents to go with it (the toolkit) then publish under a pseudonym it might be easier to get acceptance. And I keep my head down, snickering all the way to the bank.

I do hope some or many others weigh in on this thread. I'd say we have given them some stuff to shoot at!
