Managing Maintenance Costs: A Process Variability Approach

By Bill Kelleher, Resultant and Pat Krick, Senior Vice President


Background

Most heavy equipment maintenance organizations have access to large quantities of data, but this data tends to be used primarily for budgeting, accountability, and reliability purposes. In contrast, manufacturing organizations focus a significant portion of their data analysis on methods to reduce process variability and process cost. Reducing process variability to improve performance, lower cost, and increase equipment availability is just as important in a heavy maintenance environment as it is in manufacturing. Many equipment maintenance organizations can significantly improve their overall costs and performance by borrowing a few simple techniques from the manufacturing world. This article gives an overview of how these techniques are used and implemented.

The Maintenance Process

The process under consideration involves maintenance of large capital units, including both repair and major routine maintenance. The costs of the maintenance operation are driven by the cost of labor to perform the maintenance and the cost of material consumed during the maintenance. The process was divided into two categories: scheduled and unscheduled maintenance. Unscheduled maintenance was based on failures. Scheduled maintenance occurred at specific time intervals based on manufacturer’s recommendation for routine maintenance and the reliability of the units. Each maintenance event had a defined checklist of tasks to be performed and parts to be replaced.

Figure 1 shows a simplified version of the process with each maintenance event consisting of multiple tasks.

Analyzing the Data

A simple approach was used to analyze variability in the time required to perform the specified maintenance events. (In this case the time dimension was man-hours rather than elapsed time.) Data existed in the client’s records for the actual man-hours required to perform the scheduled maintenance tasks. Using Excel, the data was easily presented in a histogram format.

Figure 2 shows the distribution of man-hours consumed by a major planned maintenance event. Data was initially gathered over a three-month time period. 
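
The same kind of histogram can be produced outside Excel. The following is a minimal Python sketch, not the authors' method: the man-hour values, the 20-hour bin width, and the function name are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def histogram(man_hours, bin_width):
    """Count events per fixed-width bin, keyed by the lower edge of each bin.

    Empty bins between the lowest and highest values are kept (count 0)
    so that gaps and long tails stay visible.
    """
    counts = Counter(int(h // bin_width) for h in man_hours)
    first, last = min(counts), max(counts)
    return {i * bin_width: counts[i] for i in range(first, last + 1)}

# Hypothetical man-hours charged to one type of planned maintenance event
sample = [42, 45, 47, 51, 53, 55, 58, 61, 64, 72, 88, 95, 131]

# Print a simple text histogram, one row per bin
for lower, count in histogram(sample, bin_width=20).items():
    print(f"{lower:>4} to {lower + 20:<4} {'#' * count}")
```

Even in this small made-up sample, the bunching on the low end and the isolated event far to the right are visible at a glance, which an average alone would hide.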

There is a tendency for operating management to focus on average values to establish the baseline or “as is” process and to subsequently drive process improvement. However, useful process information was obtained by examining a range of data values, such as that presented in Figure 2.

A quick look at the frequency distribution provides a good indication of how well the process is being controlled, something that cannot be determined from looking at the average alone. A tightly clustered distribution is a good indicator of a process that is in control, while the broad distribution in Figure 2 suggests a process that is not well controlled. In the latter case, analyzing the highest and lowest times for performing maintenance tasks provides fertile ground for improvement opportunities focused on removing process barriers.

Starting with a sample of the worst data points, investigative work must then be conducted to identify specific causes in the process that can account for variation relative to the average.

Typical causes might include:

  • Unskilled or junior people doing the tasks (more training may be needed)
  • Critical material not available during the repair or service (or arrived late)
  • Required tools not available on a timely basis
  • Repair procedures either not readily available or out of date
  • Low First Pass Yield on some tasks (First Pass Yield is the percentage of tasks done right the first time, without requiring rework; tracking it is a powerful way to reduce variability, because it targets work that has to be done a second time)
  • More people than needed working a job
  • Supervision on second or third shift (where much of the work is performed) is inexperienced because they are the junior supervisors and haven’t received adequate mentoring and training
  • Defective material or components resulting from poor quality at the original equipment manufacturer (OEM)
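
First Pass Yield, as defined in the list above, is a simple ratio. A minimal sketch in Python follows; the function name and the task counts are hypothetical and not taken from the client data.

```python
def first_pass_yield(tasks_completed, tasks_reworked):
    """Percentage of tasks done right the first time, without rework."""
    if tasks_completed <= 0:
        raise ValueError("tasks_completed must be positive")
    return 100.0 * (tasks_completed - tasks_reworked) / tasks_completed

# Hypothetical month: 200 tasks completed, 30 of which needed rework
print(first_pass_yield(tasks_completed=200, tasks_reworked=30))  # 85.0
```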

The next step is to perform a similar analysis on a sample of the best data points, looking for what was done right in these cases that allowed for significantly better performance than average.

Some examples might include:

  • Visual aids, including guidelines and checklists
  • Best practice techniques for specific operations
  • Improved sequence of tasks
  • Improved teamwork
  • Specific skills or training that could be expanded
  • All materials available as needed
  • A reliability program which identifies and corrects defects through a root cause analysis process, reducing the need to perform certain activities.
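
Pulling the samples for both investigations (the worst and the best data points) amounts to sorting the events by man-hours and taking each end. The sketch below assumes the records are available as (work order, man-hours) pairs; the identifiers and values are hypothetical.

```python
def extremes(work_orders, n):
    """Return the n lowest and the n highest events by man-hours."""
    ranked = sorted(work_orders, key=lambda wo: wo[1])
    return ranked[:n], ranked[-n:]

# Hypothetical (work order id, man-hours) records
work_orders = [
    ("WO-101", 48), ("WO-102", 131), ("WO-103", 55),
    ("WO-104", 42), ("WO-105", 95), ("WO-106", 61),
]

best, worst = extremes(work_orders, n=2)
print("Investigate for good practices:", best)
print("Investigate for process barriers:", worst)
```

The code only selects the cases; the investigative work on each selected work order still has to be done by people who know the process.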

Since a typical histogram of this type will include a long tail on the poor end of the performance curve (see Figures 2, 4 and 5), reducing variability will also reduce the average maintenance time, perhaps significantly. In order to drive variability out, generate more predictable results, and reduce the average time per maintenance event, the organization will need to focus on those factors causing the performance extremes, as identified through investigation.

Once the initial analysis has been completed and process improvements are under way, it’s important to look at both the mean and standard deviation (a measure of the spread) of the data to gain a complete picture. Excel was used to calculate the standard deviation. Plotting the standard deviation over time shows whether the process is becoming better controlled: a declining trend indicates that it is improving. Figure 3 shows such a plot.
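
As an illustration of tracking both measures over time, here is a minimal Python sketch. The monthly figures are hypothetical, and the sample standard deviation is used, which corresponds to Excel's STDEV function.

```python
from statistics import mean, stdev

def summarize_by_period(hours_by_period):
    """Return {period: (mean, sample standard deviation)} of man-hours."""
    return {
        period: (mean(hours), stdev(hours))
        for period, hours in hours_by_period.items()
    }

# Hypothetical man-hours per maintenance event, grouped by month
hours_by_period = {
    "month 1": [40, 50, 60, 90, 120],
    "month 2": [45, 50, 55, 70, 90],
    "month 3": [48, 50, 52, 55, 60],
}

summary = summarize_by_period(hours_by_period)
for period, (avg, spread) in summary.items():
    print(f"{period}: mean = {avg:.1f} h, std dev = {spread:.1f} h")
```

In this made-up series both numbers fall from month to month, which is the pattern a plot of the standard deviation over time would show for a process that is coming under control.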

Applicability to Smaller Maintenance Events

A comparable analysis was performed on a variety of shorter, individual repair tasks, and showed a similar degree of variability, as can be seen in Figure 4. This histogram shows a bunching of the data on the left side but a very wide distribution and long tail to the right, indicating significant opportunity to improve the process by attacking the barriers that cause the wide differences in hours charged.

Variability and Cost

Histograms like Figures 2 and 4 were used to project the cost reduction that should be realized by improving the process. Each curve implies a total labor cost for maintenance (or for whatever process is under analysis): the man-hours in each bin multiplied by the number of events in that bin, summed across the curve and priced at the labor rate. Reducing the variability of the process will generate a new curve, reflecting significantly lower labor cost; the difference between the old curve and the new one gives the specific cost improvement.
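
The projection can be sketched in a few lines of Python. Everything numeric here is hypothetical: the two histograms, the use of bin midpoints to represent each bin, and the labor rate of $60 per man-hour are assumptions for illustration, not figures from the study.

```python
def labor_cost(histogram, rate_per_hour):
    """Total labor cost implied by a histogram.

    histogram maps a representative man-hour value for each bin
    (here, the bin midpoint) to the number of events in that bin.
    """
    total_hours = sum(hours * count for hours, count in histogram.items())
    return total_hours * rate_per_hour

# Hypothetical curves for the same 13 maintenance events
current = {50: 7, 70: 3, 90: 2, 130: 1}   # long tail to the right
improved = {50: 10, 70: 3}                # tail removed

RATE = 60  # hypothetical loaded labor rate, dollars per man-hour

savings = labor_cost(current, RATE) - labor_cost(improved, RATE)
print(f"Current cost:  ${labor_cost(current, RATE):,}")
print(f"Improved cost: ${labor_cost(improved, RATE):,}")
print(f"Projected labor savings: ${savings:,}")
```

Because the event count is held constant, the difference between the two totals comes entirely from the change in the shape of the curve.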

Summary

Figure 5 summarizes where to look in a typical histogram to get the benefits of the variability analysis. Note that this distribution is bimodal, a common phenomenon also seen in the actual data presented in Figures 2 and 4. Understanding what drives such a bimodal distribution in any particular maintenance process is part of the investigative work that needs to be conducted.

Fairly simple analysis techniques, like those presented here, can identify significant cost savings opportunities in heavy equipment maintenance or other service-type environments. More often than not, the data to do the analysis resides within an existing work order or labor reporting system, yet the extra step of extracting it to Excel and generating the histograms has never been taken. Frequently, when charts like these are developed, management is surprised by the high degree of process variability. After the initial shock, the hard data serves as a call to action. Analysis of variability through this type of approach can be the launch platform for a major performance improvement and cost reduction effort.

However, the direct impact of reduced variability on maintenance costs, through improved cycle times and first pass yields, is not the only benefit achievable through such improvements. Perhaps even more importantly, increased maintenance efficiency and increased mean time between failures (which follows directly from a maintenance process that is well-controlled) will enable the transportation service provider to stretch the availability of equipment. As a result, depending on their market and operational circumstances, transportation network managers are able to:

  • Increase revenue or utility with the same fleet, or
  • Maintain revenue or utility while taking a reduction in the fleet, or
  • Some combination of these two

In applications where a publicly subsidized service (urban mass transit, intercity rail passenger service) is at stake, improved availability can be leveraged to reduce the cost and size of the fleet, while maintaining or improving service levels, thus reducing the subsidy burden such systems represent to communities. More often, however, in the current transportation environment of ever-increasing demand and tight capacity, network managers see increased availability as a means to ramp up operations and service levels.

The major railroads in the US provide a good example of this phenomenon. In 2005 the seven largest systems (Class I railroads) in the US spent about $3.4 Billion on the maintenance of locomotives, so every 1% improvement in locomotive maintenance efficiency generates direct savings of roughly $34 Million. In addition, if we assume that incremental revenue can be earned with any excess locomotives, a 1% increase in fleet availability would yield $49 Million in increased earnings. (In 2005, Class I earnings were roughly $4.9 Billion from a 23,000-unit locomotive fleet.) Given the freight transportation industry’s characteristics of more recent years, i.e., tight capacity and ever-increasing demand for rail service, that assumption may not be much of a stretch.
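
The arithmetic behind these figures is shown below as a short Python sketch. It uses only the 2005 numbers quoted above and carries the same assumption made in the text, namely that earnings scale in proportion to the number of available locomotives; the function and variable names are illustrative.

```python
def one_percent_effects(maintenance_spend, earnings, fleet_size, pct=1):
    """Effect of a pct% gain in maintenance efficiency and fleet availability.

    Assumes, as the text does, that extra available locomotives earn
    at the same average rate as the existing fleet.
    """
    maintenance_savings = maintenance_spend * pct / 100
    extra_units = fleet_size * pct / 100
    extra_earnings = earnings * pct / 100
    return maintenance_savings, extra_units, extra_earnings

savings, units, added_earnings = one_percent_effects(
    maintenance_spend=3_400_000_000,  # 2005 Class I locomotive maintenance
    earnings=4_900_000_000,           # 2005 Class I earnings
    fleet_size=23_000,                # locomotives
)

print(f"Maintenance savings: ${savings:,.0f}")
print(f"Equivalent locomotives made available: {units:,.0f}")
print(f"Added earnings: ${added_earnings:,.0f}")
```

A 1% availability gain on a 23,000-unit fleet is equivalent to about 230 additional locomotives, which is where the $49 Million estimate comes from.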

Airline Maintenance, Repair and Overhaul (MRO) operations are another excellent example where detailed analysis can save the commercial aviation industry hundreds of millions of dollars in costs each year. In 2005 the MRO market was worth $38.3 Billion for aircraft, engine and component repair and encompassed more than 17,000 commercial aircraft. Although many companies are pursuing Lean and Six Sigma initiatives, much work remains to be done to change cultures and streamline maintenance and repair operations. When employed as point solutions, Lean and Six Sigma do not provide the comprehensive change management framework needed for the systematic process improvement that quickly drives better corporate performance.

Finally, the benefits of an improvement program based on this type of analysis are not automatic. Maintenance process improvements will not reach the bottom line unless excess labor cost (and all of the other costs associated with substitute processes) is removed, although this, at least, is generally within the control of the equipment maintenance group. Taking advantage of increased equipment availability will require network managers to develop explicit plans for using the additional capacity. Developing and executing such plans will require close coordination between different functional groups within the organization.

Presented at the International Society of Logistics Engineers 2006 Conference “The Next Generation of Logistics” panel on “Embedding Logistics into Disaster Planning, Prevention, and Preparation”
