For obvious reasons, the word “failure” has negative connotations. But in manufacturing, when something goes wrong—or fails—there can be an upside: the chance to learn and improve.

Breakdowns, accidents, disruptions, and quality issues are expensive, so the purpose of analyzing any failure should be to understand the underlying issue that caused (or could cause) a problem. It is estimated that quality-related costs within a manufacturing facility can amount to 15-20% of total operating costs.

Analyzing a failure and gathering insights makes it possible to remediate the issue or prevent it from happening in the first place.

What is failure analysis?

Failure analysis is the systematic process of finding the root cause of something that has gone wrong and reporting on what needs to be done to prevent it from happening again. However, you don’t have to wait for a problem to strike before taking advantage of failure analysis methodologies. They can also be employed to prevent potential failures, improve product design, ensure compliance, or carry out a liability assessment.

Carrying out a failure analysis

It pays to be prepared. In the case of failure analysis, this means having processes in place so a coherent plan of action can be triggered as soon as something goes wrong. The plan should include the following steps:

  • Organize a group of key stakeholders: The scope of those involved in a failure analysis will depend on the nature of the incident and the size and structure of the organization. Plant and maintenance engineers will often carry out the analysis, although some organizations may have reliability engineers or even specialist failure analysis engineers to assign to the task. If the appropriate expertise isn’t available internally, outside consultants may be hired. The analysis team will report to management—the exact reporting chain will depend on the nature of the incident being investigated.

  • Define the scope of the problem(s): For any failure analysis to be successful, there must be a clear understanding of what went wrong and what the investigating engineers are expected to report on. This should be set out in a problem statement that also specifies which failure analysis techniques the team will use.

  • Identify failure modes and mechanisms: To analyze a failure, it is important to understand what the result or outcome (the failure mode) was. Examples include a breakdown or failure of machinery, or the production of poor-quality products. The team then needs to understand the mechanism(s) that led to the failure, such as faulty material, human error, or machine malfunction.

  • Collect and analyze all relevant data: All relevant quantitative and qualitative data needs to be collected and analyzed. Quantitative data includes maintenance and CMMS (computerized maintenance management system) records, along with details collected through visual inspection and troubleshooting of the machinery involved. Qualitative data is likely to include information collected through interviewing relevant staff (e.g., machine operators and maintenance technicians).

  • Determine corrective actions: The outcome of the investigation will be the preparation of a failure analysis report, setting out what has been discovered and, most importantly, recommending what needs to be done to correct the problem.
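
The data collection and analysis step can be illustrated with a short example. The following is a minimal Python sketch, assuming maintenance records have already been exported from a CMMS as simple rows. The field names (`asset`, `failure_mode`, `downtime_min`) and all sample values are hypothetical and exist only for illustration. It totals downtime per failure mode and ranks the modes, so the team can see which ones account for most of the lost time.

```python
from collections import defaultdict

# Hypothetical CMMS export: field names and values are illustrative only.
records = [
    {"asset": "press-1", "failure_mode": "bearing seizure", "downtime_min": 120},
    {"asset": "press-1", "failure_mode": "sensor fault", "downtime_min": 15},
    {"asset": "press-2", "failure_mode": "bearing seizure", "downtime_min": 90},
    {"asset": "filler-1", "failure_mode": "operator error", "downtime_min": 30},
    {"asset": "press-2", "failure_mode": "sensor fault", "downtime_min": 20},
]


def downtime_by_failure_mode(rows):
    """Rank failure modes by total downtime, largest first.

    Returns a list of (failure_mode, total_minutes, cumulative_share) tuples.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row["failure_mode"]] += row["downtime_min"]

    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    result = []
    running = 0
    for mode, minutes in ranked:
        running += minutes
        result.append((mode, minutes, running / grand_total))
    return result


pareto = downtime_by_failure_mode(records)
for mode, minutes, cumulative_share in pareto:
    print(f"{mode:<16} {minutes:>4} min  cumulative {cumulative_share:.0%}")
```

A ranking like this is only a starting point: it shows where to look first, while the qualitative data from operators and technicians is still needed to explain why those failures occurred.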

Failure analysis strategies and techniques

There are several well-recognized failure analysis methods. Some are more appropriate to use in certain industries, or the choice could depend on the specific circumstances or the experience of the engineers undertaking the analysis:

  • Failure modes and effects analysis (FMEA): This technique highlights failures within a particular system and is applicable to any phase of a process, including planning, design, implementation, or inspection. It consists of two main components: failure modes (the different ways something can fail) and effects analysis (the consequences of each failure mode).

  • Cause and effect analysis: A diagram-based approach to assessing the problem, identifying the root cause(s), and creating a solution. It combines brainstorming and mind mapping techniques to explore the issue, and is a useful method for dealing with complex scenarios by breaking them down into smaller parts.

  • 5 Whys: A method of determining the root cause of a problem by successively asking the question “Why?”. It gets its name from the anecdotal observation that five iterations of asking “Why?” are usually sufficient to reveal the root cause but, depending on the scenario, the question may be asked more or fewer times.

  • Fishbone (Ishikawa) diagram: A visual technique for causal analysis that can be an especially helpful brainstorming tool when little quantitative data is available. It involves drawing a “fishbone” diagram consisting of possible causes of a problem (the bones), connected to a spine leading into the fish’s head, which symbolizes the defect or problem.

  • Fault/logic tree analysis: A method where Boolean logic relationships are used to identify the root cause by modeling how failure propagates through a system. It is commonly used in industries such as aerospace, energy, and defense.

  • Change analysis/Kepner-Tregoe: A structured methodology for gathering, prioritizing, and evaluating information. Its key benefit is its ability to prioritize and focus the analysis by weighing and setting objectives. The Kepner-Tregoe method became well known after NASA reportedly used it to help bring the Apollo 13 crew home.
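
Of the methods listed, fault tree analysis translates most directly into code because it is built on Boolean logic. The sketch below is a minimal Python illustration, not a full fault tree tool. The tree structure and every event name in it (`power_loss`, `pump_a_fails`, and so on) are hypothetical. It checks whether the top-level failure occurs for a given set of failed basic events, then uses a brute-force search to list the smallest combinations of basic events that trigger it (often called minimal cut sets).

```python
from itertools import combinations

# Hypothetical fault tree: event names and structure are illustrative only.
# A node is either a basic event (str) or a gate: ("AND" | "OR", [children]).
TREE = ("OR", [
    "power_loss",
    ("AND", ["pump_a_fails", "pump_b_fails"]),
    ("AND", ["sensor_fault", "operator_misses_alarm"]),
])


def occurs(node, failed):
    """Return True if the event at `node` occurs, given the failed basic events."""
    if isinstance(node, str):
        return node in failed
    gate, children = node
    results = [occurs(child, failed) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate type: {gate}")


def basic_events(node):
    """Collect the names of all basic events below `node`."""
    if isinstance(node, str):
        return {node}
    events = set()
    for child in node[1]:
        events |= basic_events(child)
    return events


def minimal_cut_sets(tree):
    """Brute-force the smallest sets of basic events that cause the top event."""
    events = sorted(basic_events(tree))
    found = []
    # Try small combinations first so that any superset of a known
    # cut set can be skipped as non-minimal.
    for size in range(1, len(events) + 1):
        for combo in combinations(events, size):
            candidate = set(combo)
            if any(cut_set <= candidate for cut_set in found):
                continue
            if occurs(tree, candidate):
                found.append(candidate)
    return found


cut_sets = minimal_cut_sets(TREE)
for cut_set in cut_sets:
    print(sorted(cut_set))
```

The brute-force search is practical only for small trees, since the number of combinations grows exponentially with the number of basic events. It is used here because it keeps the Boolean logic easy to follow.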

Manufacturers should understand failure analysis and be prepared to implement a response when something goes wrong. In doing so, they will find themselves closer to identifying the root cause of their problem, and closer to bouncing back from it.

If you’re interested in learning how Tulip can help automate real-time data collection and visualize metrics for faster, easier failure analysis, reach out to a member of our team today!

Digitize and streamline your failure analyses with Tulip

Learn how you can automate data collection to improve your quality management and continuous improvement processes.
