In our previous article, we discussed the fundamentals of software testing.

Examples of software failure are depressingly common. Here are some you may recognize:

  • The first launch of the European Space Agency's Ariane 5 rocket in June 1996 failed after 37½ seconds. A software error caused the rocket to veer off its vertical ascent, and the self-destruct mechanism was triggered before the now-unpredictable flight path could cause a bigger problem.
  • When the UK Government introduced online filing of tax returns, a user could sometimes see the amount that a previous user had earned. This happened regardless of where the two users were physically located.
  • In November 2005, information on the UK's ten most wanted criminals was published on a website. Newspapers, morning radio and television all reported it, and as a result many people tried to access the site. The publicity created load peaks beyond the website's capacity, its performance proved inadequate, and the site had to be taken offline.
  • When a well-known online book retailer first went live, ordering a negative number of books caused the transaction amount to be refunded to the ‘purchaser’. The developers had not anticipated that anyone would try to buy a negative number of books. Code had been written to let administrative staff issue refunds to customers, but customers were never meant to be able to refund themselves.
  • A small, one-line change in the billing system of an electricity provider blacked out an entire major US city.
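The online retailer's problem is a classic missing input check. The Python sketch below is purely hypothetical (the `OrderLine` type and both functions are invented for illustration, not taken from any real retailer's code). It shows how multiplying a price by an unvalidated quantity turns a negative order into a negative charge, in effect a refund, and how a simple range check removes the defect.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    # Hypothetical order line; prices are held in integer pence to avoid rounding issues.
    unit_price_pence: int
    quantity: int


def order_total_unchecked(line: OrderLine) -> int:
    # Defective version: trusts whatever quantity the customer typed in.
    return line.unit_price_pence * line.quantity


def order_total(line: OrderLine) -> int:
    # Corrected version: reject quantities below 1 before computing a charge.
    if line.quantity < 1:
        raise ValueError(f"quantity must be at least 1, got {line.quantity}")
    return line.unit_price_pence * line.quantity


if __name__ == "__main__":
    line = OrderLine(unit_price_pence=1_000, quantity=-3)
    print(order_total_unchecked(line))  # -3000: a negative charge, i.e. a refund
    try:
        order_total(line)
    except ValueError as exc:
        print(f"rejected: {exc}")  # rejected: quantity must be at least 1, got -3
```

Test inputs such as 0 and -1, just below the smallest valid quantity, are exactly the cases that would have exposed this defect before go-live.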
What is it about these examples that makes them so startling? Is it a sense that something fairly obvious was missed? Is it the feeling that, expensive and important as they were, the systems were allowed to enter service before they were ready? Do you think these systems were adequately tested? Obviously they were not, but in this series we want to explore why this was the case and why these kinds of failure continue to plague us.

To understand what is going on, we need to start at the beginning, with the people who design systems. Do they make mistakes? Of course they do. People make mistakes because they are fallible, but many pressures also make mistakes more likely. Deadlines, the complexity of systems and organizations, and changing technology all bear down on system designers and increase the likelihood of errors in specifications, designs and code. These errors are where major system failures usually begin. If a document containing an error is used to specify a component, the component will be faulty and will probably behave incorrectly. If this faulty component is built into a system, the system may fail. Failure is not guaranteed, but errors in specifications are likely to lead to faulty components, and faulty components are likely to cause system failure.

An error (or mistake) leads to a defect, which can cause an observed failure.

Figure: Effect of an error
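To make the chain concrete, here is a hypothetical Python sketch (the discount rule and function names are invented for illustration). A programmer misreads the requirement "orders of 100 items or more get 10% off" as "more than 100" (the error), writes `>` instead of `>=` (the defect), and the program gives a wrong result (the failure) only when it is run with exactly 100 items.

```python
def discounted_price(unit_price_pence: int, quantity: int) -> int:
    """Hypothetical rule: orders of 100 items or more get 10% off."""
    total = unit_price_pence * quantity
    # DEFECT: the programmer's error (reading "100 or more" as "more than 100")
    # now lives in the code as '>' where '>=' was required.
    if quantity > 100:
        total = total * 90 // 100
    return total


def expected_price(unit_price_pence: int, quantity: int) -> int:
    """What the specification actually asks for, used here as the test oracle."""
    total = unit_price_pence * quantity
    return total * 90 // 100 if quantity >= 100 else total


if __name__ == "__main__":
    # The defect is present in every run, but only one of these inputs reveals it.
    for qty in (10, 150, 100):
        actual = discounted_price(200, qty)
        expected = expected_price(200, qty)
        status = "ok" if actual == expected else "FAILURE"
        print(f"qty={qty}: got {actual}, expected {expected} -> {status}")
```

Quantities of 10 and 150 pass; only 100 produces an observable failure. This is why a defect *can* cause a failure rather than always causing one.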

There are other reasons why systems fail. Environmental conditions such as radiation, magnetism, electromagnetic fields or pollution can affect the operation of hardware and firmware and lead to system failure.

If we want to avoid failure, we must either avoid errors and defects or find them and fix them. Testing can contribute to both avoidance and correction, as we will see once we have looked at the testing process in a little more detail. One thing is clear: if we want testing to influence errors, we need to begin testing as soon as we begin making errors (right at the beginning of the development process) and continue testing until we are confident that there will be no serious system failures (right at the end of the development process).

Before we move on, let us just remind ourselves of the importance of what we are considering. Incorrect software can harm:
  • people (e.g. by causing an aircraft crash in which people die, or by causing a hospital life support system to fail);
  • companies (e.g. by causing incorrect billing, which results in the company losing money);
  • the environment (e.g. by releasing chemicals or radiation into the atmosphere).
Software failures can sometimes cause all three at once. Consider a crash involving a train carrying nuclear waste: such scenarios have been examined to help build public confidence in the safety of transporting nuclear waste by rail. A failure of the train's on-board systems, or of the signalling system that controls the train's movements, could have catastrophic results. This may not be likely (we hope it is not), but it is a possibility that could be linked to software failure. Software failures, then, can lead to:
  • Loss of money
  • Loss of time
  • Loss of business reputation
  • Injury
  • Death
You can follow the complete Fundamentals of Testing series here:

Why a Software Fails?
Keeping Software Test Under Control
What Testing is and What Testing Does
Software Testing Principles
Fundamental Software Test Processes
Psychology of Software Testing
Testers Code of Ethics
ISTQB Sample Questions
