Unintended Acceleration: Toyota ECU code somewhat short of perfect 2

While it may be in their best interest to find as many faults as possible, the problems they did find are demonstrable and repeatable (i.e., fact). So vested interest or not, the evidence was very damning.

Dan - Owner
 
Someone did test killing task X with the cruise engaged.

Did the throttle go to WOT? No.

Did it return to idle? No.

What did it do? Stayed at the same opening as it was.

Is proving the above enough to win a civil suit? More than enough.

They really didn't even require a tested example. They only needed software experts to testify that bugs always exist in software and hardware, and that the Toyota system lacked an adequate fail-safe. Combine the possibility of latent bugs with a fail-safe that won't catch every failure, and the conclusion is that the ECM could plausibly lose control of the throttle blade position. Win.

Does Toyota have issues they need to fix? Yes.

Do other manufacturers have issues they need to fix? Probably.

Did Toyota get targeted during the recession because they were taking a large chunk of market share from the bankrupt N/A manufacturers? I suspect so.

 
Killing any given program once or twice without bombing anything is hardly indicative of robustness; otherwise, the Challenger would have retired properly with the rest of the fleet. The Intel Pentium's FDIV bug involved only a few errors amongst thousands of correct values, and it affected calculations beyond the normal requirements for precision.

The Toyota problem existed well before any lawsuit or any parties with vested interest got involved.

TTFN
faq731-376
7ofakss

Need help writing a question or understanding a reply? forum1529
 
Software has to be written to be robust and tolerant of faults. Aircraft RTCA approvals for hardware and software used in aviation devices have to take into account upsets that may come from radiation. Even a regular Geiger counter will read 25 to 100 counts per second at sea level. A single flipped bit in a processor or memory, and there's no telling what may happen.
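
To make that concrete, here's a toy example (invented values, nothing to do with any real ECU) of how one upset bit can transform a stored throttle target:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy illustration only: a single-event upset flips one bit in a
       stored throttle target. All values are invented. */
    int main(void)
    {
        uint16_t throttle_counts = 100u;        /* some nominal opening */
        throttle_counts ^= (uint16_t)(1u << 9); /* radiation flips bit 9 */
        printf("after upset: %u counts\n", (unsigned)throttle_counts); /* 612 */
        return 0;
    }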

As someone who used to write software, I found the transcripts of the Toyota case rather shocking. Some of the coding practices revealed in the court transcripts wouldn't even fly at Microsoft, and we know what kind of crap they sell.
 
I don't recall the article in detail, but I don't remember them saying killing thread 'X' would definitely cause a WOT condition... I do recall them saying it was insane to have one thread in control of so many critical tasks that its death could lead to major issues.

I particularly love how Toyota told the courts they used ECC memory (error correction, i.e., 9th-bit), but upon inspection of actual products, they were not using it.

And the biggest cardinal sin for me (and any other respectable embedded programmer)? Completely underestimating the amount of stack space in use at any particular moment. This is a NO-NO of the highest degree. A stack growing beyond its bounds could very easily flip the wrong bit/byte and go WOT... and there would likely NOT be a record of such an event due to its storage in RAM.
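
For what it's worth, a minimal sketch of the standard defense - paint the stack at boot and check the high-water mark at runtime (names and sizes invented):

    #include <stdint.h>

    /* Sketch of stack watermarking; names and sizes are illustrative. */
    #define STACK_WORDS 256u
    #define PAINT 0xDEADBEEFu

    static uint32_t task_stack[STACK_WORDS]; /* stack grows toward index 0 */

    void stack_paint(void) /* call once at boot, before tasks start */
    {
        for (uint32_t i = 0u; i < STACK_WORDS; i++)
            task_stack[i] = PAINT;
    }

    /* Return the high-water mark in words. Check this periodically and
       alarm (or fail safe) if usage creeps toward the limit. */
    uint32_t stack_high_water(void)
    {
        uint32_t untouched = 0u;
        while (untouched < STACK_WORDS && task_stack[untouched] == PAINT)
            untouched++;
        return STACK_WORDS - untouched;
    }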

Dan - Owner
 
Just finished reading the complete 177-page report of NASA's analysis of the Toyota unintended acceleration investigation.


I found some key discrepancies between what the EDN author has claimed (at least according to the posts above) and the NASA analysis.

Has anyone else read the NASA report?

Does anyone have a link to detailed documentation citing the engineering analysis methodology and test findings that the EDN article is based on?

Maybe it is my skepticism, brought about by my experience as a product development engineer working for another company that had many frivolous lawsuits brought against it (all successfully defended), but I like to see all information well supported by hard evidence and test findings. Conclusions stated without this supporting data call credibility into question.

Here are just a few excerpts from the 177-page NASA report:

7.1 Findings
F-1. No TMC vehicle was identified that could naturally and repeatedly reproduce large throttle opening UA effects for evaluation by the NESC team.
F-2. Safety features are designed into the TMC ETCS-i to guard against large throttle opening UA from single and some double ETCS-i failures. Multiple independent safety features include detecting failures and initiating safe modes, such as limp home modes and fuel cut strategies.
F-3. The NESC study and testing did not identify any electrical failures in the ETCS-i that impacted the braking system as designed.
a. At large throttle openings (35 degrees (absolute) or greater), if the driver pumps the brake, then the power brake assist is either partially or fully reduced due to loss of vacuum in the reservoir.
b. NHTSA demonstrated that a MY 2005 Camry with a 6 cylinder engine travelling at speeds up to 30 mph can decelerate at better than 0.25g with 112 lbf on the brake while the throttle is open up to 35 degrees (absolute), with a depleted vacuum assisted power brake system.
……………..
Fundamentally, the ETCS-i uses two sets of sensors and CPUs to control the throttle and disengage the throttle control function when the sensors or CPUs do not agree. The prime sensors (VPA1 and VTA1) and the Main CPU control the intended throttle opening. The second sensors, VPA2, VTA2, and the Sub-CPU are used to validate consistent sensor data and a properly operating Main CPU. Both CPUs must agree that the throttle motor should be engaged in order for the throttle motor to drive the throttle valve open.
While the second sensors and CPU do not directly provide a means for driving the throttle, both pedal sensors are needed to indicate off idle in order to open the throttle. Either pedal sensor, throttle sensor, or CPU can declare a fault and disable and/or disengage the throttle. These sensors and CPUs are in "series" to open the throttle.
The two sensors and two CPUs are functionally arranged in a series manner, as described above, providing for two methods for closing the throttle.
……………………..
The Main CPU and Sub-CPU must be functioning and must agree that the throttle motor can be driven. Each CPU has its own oscillator, memory error detect and correct along with a watchdog that can reset the processor. The CPUs also communicate with each other to assure that both receive consistent sensor data and are functioning properly. If either CPU fails, throttle motor drive is disabled. The system is redundant to preventing a failed Main CPU from controlling the throttle.
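
Reduced to its logical essence, the "series" arrangement described there amounts to something like this sketch (invented names, not Toyota's code):

    #include <stdbool.h>

    /* Illustrative only: the "series" enable logic described above,
       reduced to its essence. Names are invented, not Toyota's code. */
    bool throttle_motor_enabled(bool main_cpu_ok, bool sub_cpu_ok,
                                bool sensors_agree)
    {
        /* Either CPU declaring a fault, or a sensor disagreement,
           removes drive from the throttle motor. */
        return main_cpu_ok && sub_cpu_ok && sensors_agree;
    }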
…………………………
Two throttle sensors need to agree that the throttle valve is positioned properly. If the throttle valve does not achieve its intended position, power to the throttle motor is shut off. When the throttle position sensors disagree, throttle control is disabled and the throttle valve is returned to a spring loaded detent position of 6.5 degrees opening which is about 3 degrees more open than typical warm idle. At this point the diverse fuel cut function controls engine speed. Multiple sensors and signal sources are used to identify if the throttle motor is having trouble driving the throttle to its intended position.
……………………………
Diverse backup controls utilizing the Electronic Fuel Injection (EFI) module limit engine speed and power through a power management function employing fuel cut and ignition timing to protect the system against the consequences of unintended throttle opening due to the failure of sensors, CPU, or a mechanically stuck open throttle valve or otherwise mechanically failed throttle valve. The diverse backup is the fuel cut function that will stop fuel flow to the engine if either VPA1 or VPA2 indicate idle and the engine speed is above 2500 rpms.
………………………
6.7.2.2 Heartbeat
The heartbeat pulse train signal from the Main CPU is provided to the power ASIC and also to the Sub-CPU. The Sub-CPU watchdog pulse train is provided to the Main CPU. The Main CPU can reset the Sub-CPU and the power ASIC can reset the Main CPU and Sub-CPU. The heartbeat pulse train is software generated and acts as an external indication of proper CPU hardware and software operation.
During any CPU reset, the CPU outputs to the H-Bridge that drive the throttle motor are pulled-low, disabling the motor drive.
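
In outline, a software-generated heartbeat of this kind looks something like the following sketch (every name here is a hypothetical stand-in):

    /* Purely illustrative; every function here is a hypothetical
       stand-in. The heartbeat is toggled once per control-loop pass,
       so a hung loop stops producing edges and the power ASIC can
       reset the CPU. */
    extern void read_sensors(void);
    extern void compute_throttle_command(void);
    extern void drive_outputs(void);
    extern void toggle_heartbeat_pin(void);

    void control_loop(void)
    {
        for (;;) {
            read_sensors();
            compute_throttle_command();
            drive_outputs();
            toggle_heartbeat_pin(); /* missing edges trigger a reset */
        }
    }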
………………..
6.7.2.3 Watch Dog Timer
Implemented in hardware, one watchdog timer exists in the sub CPU, and one exists in the Main CPU. Each watchdog timer is initiated at startup, and requires constant re-initiation by software. If a watchdog timer expires without being re-initiated by software, the CPU hardware is reset and restarts. The software function that re-initiates the watchdog timer executes in the lowest priority task. If this lowest priority task does not execute, it indicates abnormal processing or timing within either the software or hardware.
During watch dog timer reset, the CPU outputs to the H-Bridge that drive the throttle motor are pulled-low, disabling the motor drive.
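
The notable design detail is that the kick comes only from the lowest-priority task, roughly like this sketch (assumed names, not the actual implementation):

    /* Sketch only: servicing a hardware watchdog from the lowest-
       priority (idle) task. wdt_service() stands in for a hypothetical
       register write that restarts the hardware timer. */
    extern void wdt_service(void);

    void idle_task(void) /* runs only when every other task has yielded */
    {
        for (;;) {
            /* If any higher-priority task hangs or overruns, this task
               is starved, the timer expires, and the CPU is reset with
               the motor-drive outputs pulled low. */
            wdt_service();
        }
    }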

6.7.2.6 Software Data Checks
A subset of software data is protected by implementing software data mirroring. When the data is written, a second location is written with the complement of the data. When the data is read, the second location is also read and checked. If the check fails, a default value is used.
When this software data mirroring is used, it protects data from being overwritten, such as by stack or buffer overflows.
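
A minimal sketch of how such complement mirroring might look in C (illustrative types and defaults, not the actual code):

    #include <stdint.h>

    /* Sketch of the complement-mirroring scheme described above; the
       type, field names, and fallback handling are illustrative. */
    typedef struct {
        uint16_t value;
        uint16_t mirror; /* always written as ~value */
    } mirrored_u16;

    void mirrored_write(mirrored_u16 *m, uint16_t v)
    {
        m->value  = v;
        m->mirror = (uint16_t)~v;
    }

    uint16_t mirrored_read(const mirrored_u16 *m, uint16_t fallback)
    {
        /* A stray write (stack or buffer overflow, flipped bit) is
           unlikely to corrupt both copies consistently. */
        if ((uint16_t)~m->mirror == m->value)
            return m->value;
        return fallback; /* the report's "default value is used" */
    }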

6.7.2.7 Fuel Cut and Electronic Fuel Injection (EFI) and Ignition
When the pedal position sensors indicate the driver foot is off the pedal, a fuel cut function is used to limit maximum engine speed. An exception is when cruise control is engaged. When cruise control is engaged, this fuel cut function is disabled.
The moment the pedal is disengaged, the engine speed is sensed, and this level determines whether fuel cut is enabled. Fuel cut is enabled when this engine speed is above the fuel cut threshold. Following fuel removal from the engine, the speed decreases. When the engine speed reduces below the fuel cut recovery threshold, fuel is restored to the engine.
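
As a sketch, that logic reduces to a simple hysteresis; only the 2500 rpm figure comes from the report, everything else here is assumed:

    #include <stdbool.h>
    #include <stdint.h>

    /* Sketch of the fuel-cut hysteresis described above. Only the
       2500 rpm figure comes from the report; the recovery threshold
       and all names are assumptions. */
    #define FUEL_CUT_RPM    2500u
    #define FUEL_RESUME_RPM 1200u /* assumed recovery threshold */

    bool fuel_cut_update(bool pedal_at_idle, bool cruise_engaged,
                         uint32_t rpm, bool cutting)
    {
        if (cruise_engaged || !pedal_at_idle)
            return false;               /* function disabled */
        if (!cutting && rpm > FUEL_CUT_RPM)
            return true;                /* cut fuel */
        if (cutting && rpm < FUEL_RESUME_RPM)
            return false;               /* restore fuel */
        return cutting;                 /* hold current state */
    }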

6.7.3 Software Study and Results
The software study applied analysis and modeling tools to the actual MY 2005 Camry source code. Models were developed of functional areas to achieve an integrated understanding of the system behavior and simulations were run on these models to explore areas of interest. These simulations were confirmed against vehicle hardware, and the models were further refined. Ultimately, the software study supported the development of specific vehicle hardware tests.

................
Major CPU and software failures are protected through Sub-CPU and Main CPU checks, watchdog, heartbeat, and voltage monitoring. Data corruption is protected through EDAC and software-implemented data mirroring. Data limits are applied to detect sensor and output failures.
 
I recall reading the full NASA report, including NASA's disclaimers.

Basically, NHTSA was outmaneuvered, and accepted a priori constraints, so NASA was tied to a chair and blindfolded, only allowed to inspect the source code and not allowed to test the binaries, stuff like that.

The exercise was a sham, and everyone knew it.


Mike Halloran
Pembroke Pines, FL, USA
 
What I took from what I read was that killing task X with the cruise engaged would cause the throttle to stay open at the angle it was at before task X died. So the argument is that the driver may have just pressed resume at a slower speed, or the car was climbing a hill, when task X died. In either case the throttle would stay open more than necessary to maintain the set cruise speed, which means the car would accelerate beyond it. A runaway, but generally not a WOT condition, since most cruise controls never seem to go anywhere close to WOT.

I have yet to read anything that explained how the code caused the hundreds of reported incidents where the car was idling through a parking lot and then took off when the driver pressed on the brake. Claims that "it could," but nothing beyond that.

I agree with Dan that underestimating the stack space being used was one of the big sins in the stuff I read.
 
DanEE said:
Does anyone have a link to detailed documentation citing the enginering analysis methodology and test findings that the EDN article is based on?

The EDN article is based on Barr's report. Mr Barr described it and the methodology behind it in court. I read the court transcript when it originally turned up on the net, but unfortunately it has since been pulled. His slides, however, are still available if you poke around.

Barr makes several references to his findings contradicting NASA's. He claimed two main reasons: NASA was misled by Toyota (e.g., Toyota said they used ECC RAM, but on inspection they didn't), and NASA had very limited time and access to the code.

Mr Barr seems, based on the transcript, to be an excellent code reviewer. He's also an excellent communicator. He put together a very convincing case that the code was unsatisfactory. Not only was the quality of the code poor, their review and tracking procedures were sorely lacking. This falls beneath the standard we would like to believe a car manufacturer adheres to.

At one point he says to the judge:
A. So ultimately my conclusion is that this Toyota electronic throttle control system is a cause of UA software malfunction in this electronic throttle module, can cause unintended acceleration.
Q. And I know we will get to it later, but ultimately you have a conclusion that it also was the cause of the wreck in this case?
A. I do.

He based that on an 18-month review of the code. He found that while many variables were mirrored, critical values weren't: there were no error detection and correction codes on those critical values. Stack overflow was possible. The code had high complexity metrics (67 functions scored >50, where 30 is considered the maximum releasable). There were many global variables (over 11,000), at least one buffer overflow, an invalid pointer dereference (humorously reported as a "pointer D reference" in the court transcript), and a race condition. It had lots of MISRA-C violations (in fact, it violated about a third of Toyota's own internal cut-down version of MISRA-C), and they even had a hardware timer kicking the watchdog. He said the indications are that Toyota lacks engineering discipline and rigor, their peer review was inadequate, and they have no bug-tracking system. He said the code looked fragile (i.e., fixing one bug would likely create more) and resembled spaghetti code, probably due to its evolution (from assembly to C, adding an OS, adding major core functions).

They constructed a test with the 2005 Camry on a dynamometer and the code running. They manually flipped a bit, causing a task to die. They then hit resume on the cruise control, which caused the throttle to open. But because this particular task was dead, the set point wasn't set and the car continued accelerating past the intended speed. They then hit the brake, the cruise control was cancelled, and the car stopped. No big deal (provided you noticed the speed was too high). However, if the brake is already depressed, even slightly, when this scenario occurs (bit flip, task death, cruise control resume, continuous acceleration), then the throttle stays open even if you press the brake harder - you actually have to lift off the brake and press it again to cancel the acceleration.

I wasn't present in the courtroom, and it would take me days to read the whole transcript, but from what I have read it seems pretty clear to me that Mr Barr is an excellent embedded software engineer and in particular has a keen appreciation of safety-critical review principles. He put together a strong case that Toyota's software methods were less than what we'd hope for from a car manufacturer. Then, by virtue of some terrible cross-examination, he led the jury to assume that the software bugs could have caused the accident. I remain unconvinced that the software had anything to do with the crashes. There's nothing to suggest that simply applying the brakes correctly would not have prevented the accidents. Ultimately some humans in cars screwed up, got injured, and the subsequent investigation revealed some shoddy code.

I suspect, with no proof, that the code reached that state iteratively - I could well imagine it started with the best intentions and best practices. Then, as budgets dwindled, management got jumpy, and feature request/bug fixes got thrown in at the last minute, the code got messy and the review process fell away. The team probably consoled themselves by enormous system tests - in 1000's of km of testing, not one issue, so the crappy code works. It looks like rubbish, but it works.

I agree underestimating stack space usage is a fatal error, but I think there's an argument to be had here - if in all the system testing no more than, say, 35% (I can't remember the actual number) of the stack is used, then provided you have good code coverage etc., you can have a high degree of confidence that the stack won't be exhausted. If you take the academic approach, as Barr does, and add up all the function calls, including calls by function pointer and other indirect methods, then you could get a value greater than 100% - even though it will never happen in practice!
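
To illustrate the over-100% point, here's a contrived example (not from the actual code): a static analyzer that can't resolve a function pointer must assume the deepest possible callee at every indirect call, even on paths that never execute:

    #include <stdint.h>

    /* Contrived illustration: a static analyzer that cannot resolve
       'fp' must charge this call site with the deepest possible
       callee, here heavy_handler()'s 4 KB frame, even though only
       light_handler() is ever reached at runtime. */
    static void light_handler(void) { /* trivial stack use */ }

    static void heavy_handler(void)
    {
        volatile uint8_t scratch[4096]; /* deep stack frame */
        scratch[0] = 0u;
    }

    void dispatch(int event)
    {
        void (*fp)(void) = light_handler;
        if (event == -1)  /* never true in this program's usage */
            fp = heavy_handler;
        fp(); /* the worst-case assumption applies here */
    }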

I think similar arguments could be made for many of the other criticisms - sure, the end result is embarrassingly unsatisfactory in the cold light of day, but I can just imagine the engineers crying out during development, "we need to fix this, it looks like crap!", and management saying, "you'll just introduce new bugs, we'll have to re-do all the testing, we'll be six months behind and have no new features that the customer cares about!"

An epilogue to consider: suppose Joe Engineer designs a new brake system with superior wear characteristics. John Citizen then buys a car with Joe's brakes. John starts at the top of Lookout Road and rides the brakes all the way to the bottom, "to test them". At the bottom he approaches a stop sign, steps on the brake and finds them ineffective. He ploughs through the stop sign and kills a kid on a push bike. John takes Joe to court and brake expert witness Barb tells the court that all brakes are subject to brake fade. She then tells the court that in all Joe's drawings she couldn't find any reference to what would happen if someone tried to test the brake's wear characteristics by subjecting them to sustained heat from braking. The court realises that the superior wear characteristics encouraged John to try them out, possibly leading to the crash. Joe gets sued. Joe quits engineering, realising that being responsible for introducing superior technologies to the world is not enough reward for being liable for John Citizen's inability to operate a car correctly.
 
The software development world has invented a lot of named "Design Patterns" with the intent of avoiding the spaghetti anti-patterns that otherwise tend to result.
From the reviews of the code, it sounds like TMC ignored these design guidelines (or was forced to ignore them via mismanagement and feature creep), then breathed a sigh of relief when the result "worked."

How is this different from designing a structure by intuition instead of by code/standard?
Is there such a thing as a Software Engineering PE?

 
MacGyverS2000 said:
Are we really trying to make the analogy that Toyota is introducing superior technologies in their ECU code?

Not specifically, just that an ECU is almost universally superior to a mechanical throttle.
 
One other note... deciding on the worst-case scenario for a stack isn't simply adding up all of the possible functions and getting an "academic" worst-case scenario that will never happen in real life. A true evaluation is deciding what branches in the function tree could possibly happen, no matter how unlikely, given a properly operating system. My understanding is Toyota did not do this, choosing instead to select the most likely branches during a normal run. By doing so, they sidestepped the worst-case scenarios (which are entirely possible in real-world situations, regardless of their rarity), and therefore underestimated the possible stack space usage by a very serious margin.

It's one thing to make educated guesses on a cellphone stack, get it wrong, and add more stack space when you get multiple reports of phones resetting during heavy usage. To make the same mistake on a safety-critical system approaches the criminal... a programmer should KNOW how bad things can get and plan accordingly. Not knowing how deep your branch surfing will get you is simply poor code management.

Dan - Owner
 
They might have even done what they thought was the worst-case, but often, the worst-case is actually a lot worse than predicted in the original analysis. We once had a system that we thought had a 27-ms processing latency, but it turned out that the actual worst-case needed to process way more data, and the true worst-case latency was actually about 6x what we thought and tested to.

TTFN
faq731-376
7ofakss

Need help writing a question or understanding a reply? forum1529
 
Just a bad assumption on how big an object's image was going to be. Someone naively assumed that the processing latency was going to be roughly constant, but when the object grew by a factor of 3, its processing time grew by a factor of 9; oops...
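
The arithmetic of that kind of quadratic blow-up (invented numbers for illustration):

    #include <stdio.h>

    /* Invented numbers, just to show the arithmetic: if processing
       time scales with the object's image area, a 3x linear growth
       costs 9x the time. */
    int main(void)
    {
        double base_ms = 27.0; /* latency measured on the nominal object */
        double growth  = 3.0;  /* object grows 3x in linear size */
        printf("worst case: %.0f ms\n", base_ms * growth * growth); /* 243 */
        return 0;
    }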

Wasn't the only fubar, but it was a major one.

TTFN
faq731-376
7ofakss

Need help writing a question or understanding a reply? forum1529
 
LiteYear said:
Not specifically, just that an ECU is almost universally superior to a mechanical throttle.

Strictly speaking, the ECU only replaces the mechanical cable between its attachment points at the pedal and the throttle shaft's arm or sector. You still need mechanical devices to inform the ECU what's going on (at both ends, now) and ultimately accomplish the intended vehicle control. It has the potential to do some things better and do some things that a cable can't even do at all, but even that falls somewhat short of "almost universally superior".


Norm
 
heh - in theory, electronic throttle control should be able to improve on a throttle cable.
In practice, some of the ones I've tried sucked greatly.
They tried to second guess me - the damned things didn't follow directions.
I ease into the gas, it ignores me. I ease some more, it still ignores me. I guess it thought I was doing it accidentally?
Finally I have to mash on the bastard, 'cause the event that I previously had plenty of time to avoid is now bearing down on me!
I've seen this on a Mazda ('08 or so?) several years ago; it was present but not as bad on a '13 Impala.
Our '04 Beetle isn't too bad.


Jay Maechtlen
 
Jay - dead bands at the beginnings of pedal position sensor travel and ECUs that never precisely knew (or somehow managed to forget) where pedal travel begins can produce a situation analogous to a throttle cable with excessive slack at the pedal end that needs to be taken up before anything happens mechanically. The procedure for resetting the electronic throttle for the earlier years of the S197 Mustang was commonly posted on enthusiast fora back around 2008, and that fixed many such cases. But some cars were far worse than average, and descriptions of physical modifications to the innards of the PPS also showed up.


Norm
 
Norm - that's encouraging. That Mazda was barely OK as a short-term rental; I would never have bought a car that behaved like that. The Impala was a rental also, and did fine on our road trip (7000 miles in 21 days). I think I did notice that behavior a bit, but not very much.
The Beetle is ours, and I notice the behavior a bit - but it doesn't manifest under my wife's driving pattern.
Maybe my driving pattern confuses the computer. (?!?)
cheers
Jay

Jay Maechtlen
 