software disasters

. toyota camry firmware bug ($1.2 billion fine DOJ)
Toyota's killer firmware: Bad design and its consequences (October 28, 2013)
EDN article EE Times article SW Risks ? toyota MISRA C
Barr's ultimate conclusions were that: Toyota’s electronic throttle control system (ETCS) source code is of unreasonable quality. Toyota’s source code is defective and contains bugs, including bugs that can cause unintended acceleration (UA). Code-quality metrics predict presence of additional bugs. Toyota’s fail safes are defective and inadequate (referring to them as a “house of cards” safety architecture). Misbehaviors of Toyota’s ETCS are a cause of UA.
The $1.2 billion penalty against Toyota is the largest ever imposed on an automaker, but it's not overkill. The punishment fits the crime. In the settlement announced Wednesday, the world's largest car manufacturer confessed to misleading consumers and deceiving regulators for years about deadly safety defects that caused vehicles to suddenly accelerate. This is a landmark and welcome action by the U.S. Justice Department, forcing Toyota to settle criminal charges with a fine that is 35 times higher than civil penalties that can be levied by the National Highway Traffic Safety Administration. In 2010 and 2012, Toyota paid maximum fines totaling $50 million for failing to report the accelerator defect on time.

. government blunders cost $
Government Computer Blunders are Common (January 29, 2005)
The FBI's failure to roll out an expanded computer system that would help agents investigate criminals and terrorists is the latest in a series of costly technology blunders by government over more than a decade. The FBI said earlier this month it may shelve its $170 million "Virtual Case File" project because it is inadequate and outdated. The system was intended to help agents, analysts and others around the world share information without using paper or time-consuming scanning of documents. Experts blame poor planning, rapid industry advances and the massive scope of some complex projects whose price tags can run into billions of dollars at U.S. agencies with tens of thousands of employees. "There are very few success stories," said a former deputy chief information officer at the Pentagon. "Failures are very common, and they've been common for a long time." "Ever since there's been IT (information technology), there have been problems," said Allan Holmes, Washington bureau chief for CIO, a magazine published for information executives. "The private sector struggles with this as well. It's not just ... the federal government that ... can't get it right. This is difficult."

. false missile launch: sunlight reflection off of cloud tops
Faulty Soviet Early Warning System Nearly Causes WWIII (1983)
The threat of computers purposefully starting World War III is still the stuff of science fiction, but accidental software glitches have brought us too close in the past. Although there have been numerous alleged events of this ilk, the secrecy around military systems makes it hard to sort the urban myths from the real incidents. However, one example that is well recorded happened back in 1983, and was the direct result of a software bug in the Soviet early warning system. The Russian system told them that the United States had launched five ballistic missiles. However, the duty officer for the system, one Lt Col Stanislav Petrov, claims he had a "funny feeling in my gut", and reasoned if the U.S. was really attacking they would launch more than five missiles. The trigger for the near apocalyptic disaster was traced to a fault in software that was supposed to filter out false missile detections caused by satellites picking up sunlight reflections off cloud-tops.

. satellite-launching rocket explodes: 64 bits into a 16 bit space
The Explosion of the Ariane 5 (1996)
It took the European Space Agency 10 years and $7 billion to produce Ariane 5, a giant rocket capable of hurling a pair of three-ton satellites into orbit with each launch and intended to give Europe overwhelming supremacy in the commercial space business. In 1996, Europe's newest unmanned satellite-launching rocket, the Ariane 5, was intentionally blown up just seconds after taking off on its maiden flight from Kourou, French Guiana. The European Space Agency estimated that total development of Ariane 5 cost more than $8 billion. On board Ariane 5 was a $500 million set of four scientific satellites created to study how the Earth's magnetic field interacts with the solar wind. According to a piece in the New York Times Magazine, the self-destruction was triggered by software trying to stuff "a 64-bit number into a 16-bit space." "This shutdown occurred 36.7 seconds after launch, when the guidance system's own computer tried to convert one piece of data--the sideways velocity of the rocket--from a 64-bit format to a 16-bit format. The number was too big, and an overflow error resulted. When the guidance system shut down, it passed control to an identical, redundant unit, which was there to provide backup in case of just such a failure. But the second unit had failed in the identical manner a few milliseconds before. And why not? It was running the same software," the article stated.
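
The kind of overflow described above can be sketched in a few lines. This is only an illustration in Python, not the Ariane flight software; the function names and the sample values are hypothetical. It contrasts a conversion that fails when the value does not fit in 16 bits with one that clamps the value to the representable range.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def unchecked_to_int16(value: float) -> int:
    """Convert a 64-bit float to a signed 16-bit integer with no range check.

    struct.pack raises struct.error when the integer part does not fit,
    which stands in here for the unhandled overflow.
    """
    packed = struct.pack("<h", int(value))
    return struct.unpack("<h", packed)[0]


def saturating_to_int16(value: float) -> int:
    """Convert to a signed 16-bit integer, clamping out-of-range values."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


if __name__ == "__main__":
    for sample in (1234.5, 40000.0):  # hypothetical velocity readings
        try:
            print(sample, "->", unchecked_to_int16(sample))
        except struct.error as exc:
            print(sample, "-> overflow:", exc)
        print(sample, "-> clamped:", saturating_to_int16(sample))
```

Clamping is only one possible policy; whether a saturated value is safe to use depends on what the downstream code does with it.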

. metric conversion
Mars Climate Orbiter Metric Problem (1998)
Two spacecraft, the Mars Climate Orbiter and the Mars Polar Lander, were part of a space program that, in 1998, was supposed to study the Martian weather, climate, and water and carbon dioxide content of the atmosphere. But a problem occurred when a navigation error caused the orbiter to fly too low in the atmosphere and it was destroyed. What caused the error? A sub-contractor on the NASA program had used imperial units (as used in the U.S.), rather than the NASA-specified metric units (as used in Europe).
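
One common defense against this class of error is to carry the unit in the type rather than passing bare numbers between programs. The sketch below is a hypothetical illustration, not NASA's or the contractor's code; the choice of quantity (impulse, in pound-force seconds versus newton-seconds) and all names are assumptions made for the example.

```python
from dataclasses import dataclass

# Standard conversion factor: 1 pound-force = 4.4482216152605 newtons.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse value that is always stored in newton-seconds."""

    newton_seconds: float

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # The conversion happens once, at the boundary where imperial data enters.
        return cls(value * LBF_S_TO_N_S)


def report(impulse: Impulse) -> str:
    """Hypothetical consumer that can only receive a unit-tagged value."""
    return f"{impulse.newton_seconds:.3f} N*s"


if __name__ == "__main__":
    reading_lbf_s = 10.0  # hypothetical figure produced in imperial units
    print(report(Impulse.from_pound_force_seconds(reading_lbf_s)))
```

Because `report` accepts an `Impulse` rather than a float, a caller has to say which unit a raw number is in before it can be used.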

. 2 digits = billions of $: the year 2000
The Two-digit Year-2000 Problem (1999/2000)
Many IT vendors and contractors did very well out of the billions spent to avoid what many feared would be the disaster related to the Millennium Bug. Rumors of astronomical contract rates and retainers abounded. And the sound of clocks striking midnight in time zones around the world was followed by... not panic, not crashing computer systems, in fact nothing more than New Year celebrations. So why include it here? That the predictions of doom came to naught is irrelevant, as we're not talking about the disaster that was averted, but the original disastrous decision to use and keep using for longer than was either necessary or prudent double digits for the date field in computer programs. A report by the House of Commons Library pegged the cost of fixing the bug at 400 billion pounds.
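
A small sketch shows why two-digit years break arithmetic at the century boundary. The function names are hypothetical, and the "windowing" fix with a pivot of 50 is just one assumed remediation, not a description of any particular system.

```python
def age_from_two_digit_years(birth_yy: int, current_yy: int) -> int:
    """Naive age calculation on two-digit years, as stored in a legacy record."""
    return current_yy - birth_yy


def expand_year(yy: int, pivot: int = 50) -> int:
    """Expand a two-digit year using a fixed window (pivot is an assumption).

    Values at or above the pivot are read as 19xx, values below it as 20xx.
    """
    return 1900 + yy if yy >= pivot else 2000 + yy


if __name__ == "__main__":
    born, in_1999, in_2000 = 65, 99, 0
    print("age in 1999:", age_from_two_digit_years(born, in_1999))
    print("age in 2000:", age_from_two_digit_years(born, in_2000))
    print("windowed age in 2000:", expand_year(in_2000) - expand_year(born))
```

Windowing only postpones the ambiguity to the edge of the chosen window; storing four-digit years removes it.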

. therac-25 kills five people with radiation
Therac-25 was a radiation therapy machine produced by Atomic Energy of Canada Limited (AECL) and CGR MeV of France after the Therac-6 and Therac-20 units. It was involved with at least six known accidents between 1985 and 1987, in which patients were given massive overdoses of radiation, which were in some cases on the order of hundreds of grays. At least five patients died of the overdoses. These accidents highlighted the dangers of software control of safety-critical systems, and they have become a standard case study in health informatics. We know that the software for the Therac-25 was developed by a single person, using PDP-11 assembly language, over a period of several years. The software "evolved" from the Therac-6 software, which was started in 1972. According to a letter from AECL to the FDA, the "program structure and certain subroutines were carried over to the Therac 25 around 1976." Apparently, very little software documentation was produced during development. In a 1986 internal FDA memo, a reviewer lamented, "Unfortunately, the AECL response also seems to point out an apparent lack of documentation on software specifications and a software test plan." The general consensus is that Atomic Energy of Canada Limited is to blame. There was only one person programming the code for this system and he largely did all the testing. The machine was tested for only 2700 hours of use, but for code that controls such a critical machine, many more hours should have been put into the testing phase. Also, Therac-25 was tested as a whole machine rather than in separate modules. Testing in separate modules would have discovered many of the bugs. Also, if AECL had believed that there were problems with the Therac-25 right after the first incident, then it is possible that most of the 5 other incidents could have been avoided and possibly the 3 fatalities.

. patriot missile system
Prior to the Persian Gulf War, ballistic missile defense was an unproven concept in war. During Operation Desert Storm, in addition to its anti-aircraft mission, Patriot was assigned to shoot down incoming Iraqi Scud or Al Hussein short range ballistic missiles launched at Israel and Saudi Arabia. The first combat use of Patriot occurred 18 January 1991 when it engaged what was later found to be a computer glitch. There were actually no Scuds fired at Saudi Arabia on 18 January. This incident was widely misreported as the first successful interception of an enemy ballistic missile in history. Throughout the war, Patriot missiles attempted engagement of over 40 hostile ballistic missiles. The success of these engagements, and in particular how many of them were real targets, is still controversial.
Failure at Dhahran
On February 25, 1991, an Iraqi Scud hit the barracks in Dhahran, Saudi Arabia, killing 28 soldiers from the US Army's 14th Quartermaster Detachment. A government investigation revealed that the failed intercept at Dhahran had been caused by a software error in the system's clock. The Patriot missile battery at Dhahran had been in operation for 100 hours, by which time the system's internal clock had drifted by one third of a second. For a target moving as fast as an inbound TBM, this was equivalent to a position error of 600 meters. The radar system had successfully detected the Scud and predicted where to look for it next, but because of the time error, looked in the wrong part of the sky and found no missile. With no missile, the initial detection was assumed to be a spurious track and the missile was removed from the system. No interception was attempted, and the missile impacted on a barracks killing 28 soldiers. At the time, the Israelis had already identified the problem and informed the US Army and the PATRIOT Project Office (the software manufacturer) on February 11, 1991, but no upgrade was present at the time. As a stopgap measure, the Israelis recommended rebooting the system's computers regularly; however, Army officials did not understand how often they needed to do so. The manufacturer supplied updated software to the Army on February 26, the day after the Scud struck the Army barracks.
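
The drift figures above can be reproduced with a short calculation. The sketch rests on assumptions that are not stated in the text: that the clock counted tenths of a second, that 0.1 was stored truncated to 23 binary places, and that the incoming missile moved at about 1676 m/s. These values follow commonly cited analyses of the incident; the names are hypothetical.

```python
from fractions import Fraction

FRACTION_BITS = 23         # assumed binary places kept when storing 0.1
TICKS_PER_SECOND = 10      # assumed: the clock counts tenths of a second
SCUD_SPEED_M_PER_S = 1676  # assumed target speed, for illustration only


def stored_tenth(bits: int = FRACTION_BITS) -> Fraction:
    """Return 1/10 truncated (not rounded) to the given number of binary places."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours_up: float, bits: int = FRACTION_BITS) -> float:
    """Accumulated timing error after the system has run for hours_up hours."""
    ticks = int(hours_up * 3600 * TICKS_PER_SECOND)
    error_per_tick = Fraction(1, 10) - stored_tenth(bits)
    return float(ticks * error_per_tick)


def tracking_error_meters(hours_up: float) -> float:
    """Distance the target travels during the accumulated timing error."""
    return clock_drift_seconds(hours_up) * SCUD_SPEED_M_PER_S


if __name__ == "__main__":
    for hours in (8, 20, 100):
        print(hours, "h:", round(clock_drift_seconds(hours), 4), "s,",
              round(tracking_error_meters(hours)), "m")
```

Under these assumptions, 100 hours of uptime gives a drift of roughly a third of a second and a position error of the same order as the 600 meters cited above. Since the error grows linearly with uptime, regular reboots keep it small.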

. teacher paid $7.9 million
DETROIT, Michigan (AP) (9/02) -- Thanks to a computer glitch, a public school teacher was paid $7.9 million ($4,015,624.80 after taxes). Someone alerted the school district earlier this month, and the money was returned after six days, an official said. The error occurred when a clerk entered an employee number in the hourly wage field. The payroll software didn't catch the mistake. "One of the things that came with (the software) is a fail-safe that prevents that. It doesn't work," Forrest said. The district has since installed a program to flag paychecks exceeding $10,000.
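
The two safeguards mentioned above, a check on the hourly wage field and a flag on large paychecks, could look something like the following sketch. It is not the district's software: the function names and the wage ceiling are hypothetical, and only the $10,000 review threshold comes from the article.

```python
from decimal import Decimal

MAX_HOURLY_WAGE = Decimal("500.00")          # hypothetical sanity ceiling
PAYCHECK_REVIEW_LIMIT = Decimal("10000.00")  # threshold cited in the article


def validate_hourly_wage(wage: Decimal) -> Decimal:
    """Reject wages outside a plausible range, such as a mistyped employee number."""
    if not (Decimal("0") < wage <= MAX_HOURLY_WAGE):
        raise ValueError(f"implausible hourly wage: {wage}")
    return wage


def needs_review(gross_pay: Decimal) -> bool:
    """Flag any paycheck above the review limit for a human to confirm."""
    return gross_pay > PAYCHECK_REVIEW_LIMIT


if __name__ == "__main__":
    print(needs_review(validate_hourly_wage(Decimal("32.50")) * 80))
    try:
        validate_hourly_wage(Decimal("98765"))  # hypothetical employee number
    except ValueError as exc:
        print("rejected:", exc)
```

The two checks are independent on purpose: the second still catches an oversized paycheck if the first is bypassed or misconfigured.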