from the sometimes-it's-too-late-to-issue-an-update dept
Two recent crashes involving Boeing 737 Max jets are still being investigated. But there is a growing view that anti-stall software used on the plane may have caused a “repetitive uncommanded nose-down”, as a preliminary report into the crash of the Ethiopian Airlines plane puts it. Gregory Travis has been a pilot for 30 years, and a software developer for more than 40 years. Drawing on that double expertise, he has written an illuminating article for the IEEE Spectrum site, entitled “How the Boeing 737 Max Disaster Looks to a Software Developer” (free account required). It provides an extremely clear explanation of the particular challenges of designing the Boeing 737 Max, and what they tell us about modern software development.
Airline companies want jets to be as cost-effective as possible. That means using engines that are as efficient as possible in converting fuel into thrust, which turns out to mean engines that are as big as possible. But that was a problem for the hugely popular Boeing 737 series of planes. There wasn’t enough room under the wing simply to replace the existing jet engines with bigger, more fuel-efficient versions. Here’s how Boeing resolved that issue, and encountered a new challenge:
The solution was to extend the engine up and well in front of the wing. However, doing so also meant that the centerline of the engine’s thrust changed. Now, when the pilots applied power to the engine, the aircraft would have a significant propensity to “pitch up,” or raise its nose.
The solution to that problem was the “Maneuvering Characteristics Augmentation System,” or MCAS. Its job was simply to stop the human pilots from putting the plane in a situation where the nose might go up too far, causing the plane to stall — and crash. According to Travis, even though the Boeing 737 Max has two flight management computers, only one is active at a time. It bases its decisions purely on the sensors that are found on one side of the plane. Since it does not cross-check with sensors on the other side of the plane, it has no way of knowing if a sensor is producing wildly inaccurate information. It assumes that the data is correct, and responds accordingly:
In a pinch, a human pilot could just look out the windshield to confirm visually and directly that, no, the aircraft is not pitched up dangerously. That’s the ultimate check and should go directly to the pilot’s ultimate sovereignty. Unfortunately, the current implementation of MCAS denies that sovereignty. It denies the pilots the ability to respond to what’s before their own eyes.
Like someone with narcissistic personality disorder, MCAS gaslights the pilots. And it turns out badly for everyone. “Raise the nose, HAL.” “I’m sorry, Dave, I’m afraid I can’t do that.”
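To make the missing cross-check concrete, here is a minimal sketch of what comparing the two sides before acting might look like. It is purely illustrative and is not Boeing's code or anything taken from Travis's article: the function name, the `Decision` type, and both thresholds are hypothetical values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical values, chosen only for illustration.
MAX_DISAGREEMENT_DEG = 5.0   # largest tolerated gap between the two sensors
STALL_WARNING_DEG = 14.0     # angle of attack treated as an impending stall


@dataclass
class Decision:
    push_nose_down: bool
    sensors_agree: bool


def decide(left_aoa_deg: float, right_aoa_deg: float) -> Decision:
    """Act on angle-of-attack data only when both sides roughly agree."""
    sensors_agree = abs(left_aoa_deg - right_aoa_deg) <= MAX_DISAGREEMENT_DEG
    if not sensors_agree:
        # One sensor is probably faulty: do nothing automatically and
        # leave the decision to the pilots.
        return Decision(push_nose_down=False, sensors_agree=False)
    average = (left_aoa_deg + right_aoa_deg) / 2
    return Decision(push_nose_down=average >= STALL_WARNING_DEG, sensors_agree=True)
```

With these assumed thresholds, a wildly high reading on one side alone does not trigger a nose-down command, because the disagreement is detected first. A system that reads only one side, as Travis describes, has no equivalent of the `sensors_agree` check.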
The coders who wrote the MCAS software for the 737 Max don’t seem to have worried about the risks of using sensors from just one side in the computer’s determination of an impending stall. This major design blunder may have cost the lives of hundreds of people, and shows that “safety doesn’t come first — money comes first, and safety’s only utility in that regard is in helping to keep the money coming,” according to Travis. But he points out that it also reveals something more general, and much deeper: the growing use of software code that is simply not good enough.
I believe the relative ease — not to mention the lack of tangible cost — of software updates has created a cultural laziness within the software engineering community. Moreover, because more and more of the hardware that we create is monitored and controlled by software, that cultural laziness is now creeping into hardware engineering — like building airliners. Less thought is now given to getting a design correct and simple up front because it’s so easy to fix what you didn’t get right later.
Every time a software update gets pushed to my Tesla, to the Garmin flight computers in my Cessna, to my Nest thermostat, and to the TVs in my house, I’m reminded that none of those things were complete when they left the factory — because their builders realized they didn’t have to be complete. The job could be done at any time in the future with a software update.
Back in August 2011, Netscape founder and VC Marc Andreessen wrote famously that “software is eating the world”. He was almost right. It turns out that shoddy software is eating the world, sometimes with fatal consequences.