
Data Analysis and Statistics

Overcoming problems associated with the statistical reporting and analysis of ultratrace data

Thomas J. Bzik, Air Products and Chemicals

The application of standard tools such as control charting and process capability indices to ultratrace data has not proven as beneficial in the semiconductor industry as in other industries. Additionally, nonstatistically based rules for accepting or rejecting product based on ultratrace specifications have not tended to work as well as past experience with nonultratrace processes has suggested. This article describes the problems associated with the analysis and use of ultratrace data. The reasons for the relatively persistent misapplication of data analysis tools to such data and the costs involved are also examined. Additionally, possible solutions, both statistical and nonstatistical, are developed.

Problems of Analyzing Ultratrace Data

Both consumers and producers of ultrapure products are constantly seeking materials that meet tight purity specifications. Such products command a premium and are relatively more difficult to manufacture and measure than less-pure materials. However, measurement systems often serve as the limiting constraint. In typical product specification negotiations, the focus is appropriately on manufacturing capability, while analytical capability is often underemphasized or even ignored. Consequently, ultratrace analytical methods are routinely used close to, at, or beyond their capability. Overestimating ultratrace process capability leads to the phenomenon known as statistical ratchet: each time process capability is overestimated, a lower impurity specification is requested, and repeated cycles drive specifications tighter than either the process or the measurement system can support.

The unrelenting pressure on specifications without an understanding of measurement-system limitations causes predictable problems. Substantial business costs accrue from the products for which unrealistically tight specifications have been set. Some of this spending has been beneficial; for example, the costs imposed by the limitations of analytical measurement systems have driven the long-term progress in ultratrace measurement capability. When sufficiently reliable, newly available ultratrace data can help guide process improvement programs. However, ongoing improvements in manufacturing or analytical capabilities are typically swamped by the relentless pressure for better specifications, so the same problems recur.

In practice, ultratrace data are routinely misinterpreted in production situations. There is a strong systematic bias to conclude wrongly that statistical control has been lost. False indications of quality issues typically outnumber genuine problems with material acceptability. Producers of ultrapure materials conduct numerous root-cause analyses that in fact amount to searches for nonexistent problems. When a root-cause analysis identifies a nonexistent problem, further resources are wasted trying to fix it. Producers may be pressured into modifying processes because of customer misperceptions that statistical control is frequently lost. Much effort is consumed by the internal and external debates that originate with quality misperceptions and the subsequent misguided efforts at fixing process or measurement protocols.

Materials falsely classified as out of specification are part of a producer's cost, but producer quality control costs ultimately lead to higher material costs for the customer. Other consumer quality control costs include the expenses incurred for remeasurements of product, quality audits, issue documentation and resolution, lost production time resulting from waiting for "good" materials, production risk assessment of "marginal material" use, and implementing quality improvement initiatives. These are only some of the costs associated with having a high false-alarm rate, which creates an atmosphere of suspicion. Business pressures and the pervasive use of inappropriate statistical methods fuel this costly situation.

Statistical Issues

Many problems related to the statistical analysis of ultratrace data originate from the perception that there is too much measurement uncertainty and from the traditional response to that perception: data censoring. Statistical quality control/statistical process control (SQC/SPC) practices compound the situation by routinely treating censored data as real data for statistical analysis, not appropriately accounting for sample size in control-limit estimations, ignoring the increased testing risks that arise from the use of multiple statistical tests, setting testing risk on a per-measurement basis, using flawed distributional assumptions, and underestimating variability.

A Shewhart control chart for individual points, such as the one shown in Figure 1a, is often used in ultratrace SQC/SPC analysis. Control limits (represented by the dashed lines) are calculated as either x̄ ± 2.66·MR̄ or x̄ ± 3s (from in-control data, where x̄ = the process average, MR̄ = the average absolute range between neighboring points, and s = the pooled short-term standard deviation estimated from neighboring points). Either calculation provides a nominal 3σ-equivalent control chart.1,2 The nominal risk of finding an out-of-control data point for an in-control process is 0.27% (under the implicit assumption that the data are normally distributed). However, the true level of risk depends on the sample size and increases rapidly with decreased charting data. Consequently, it is advisable to use at least 25–30 points in chart construction.

Figure 1: Examples of control charts used in SQC/SPC analysis: (a) a Shewhart individual control chart, and (b) a moving-range chart. Both indicate upper and lower control limits (dashed rules). In addition, chart 1a indicates the process average and chart 1b shows the average absolute range between neighboring points (solid rules).
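As a minimal sketch of the limit calculation, assuming numpy and illustrative in-control data (the 2.66 multiplier is the standard 3/d2 constant for a moving range of two points):

```python
import numpy as np

def individuals_chart_limits(x):
    """Nominal 3-sigma-equivalent Shewhart individuals chart limits
    from in-control training data (2.66 = 3/d2, with d2 = 1.128 for
    a moving range of two points)."""
    x = np.asarray(x, dtype=float)
    center = x.mean()
    mr_bar = np.abs(np.diff(x)).mean()  # average absolute range between neighbors
    half_width = 2.66 * mr_bar
    return center - half_width, center, center + half_width

# Illustrative in-control data (hypothetical ppt values), for demonstration only
rng = np.random.default_rng(0)
data = rng.normal(loc=100.0, scale=50.0, size=30)
lcl, center, ucl = individuals_chart_limits(data)
print(f"LCL = {lcl:.1f}, center = {center:.1f}, UCL = {ucl:.1f}")
```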

Not correcting for sample-size risk is a common failing. The use of 30 points in control-limit chart construction provides a 0.77% chance that the next in-control measurement will be determined to be out of control (about three times the perceived nominal risk). When a process is in control and normally distributed, there is approximately a 7% risk that a chart will indicate a loss of statistical control within the 30 points used to build the chart and about a 17% risk that there will be a loss-of-control result in the next 30 points tested. Typical distributional characteristics of ultratrace data often exacerbate these risks, which are compounded when multiple parameters are being charted.

Statistical risk is traditionally controlled at too low a level (the single-property measurement) while testing frequency is ignored. If 30 points are used in constructing Shewhart individual control charts, the wider in-control ranges represented by x̄ ± 3.07·MR̄ or x̄ ± 3.47s should be used to achieve the true normal 3σ risk level of 0.27%. With that change, the 7 and 17% risks described above will be reduced to 1.6 and 6.3%, respectively. However, such a change is only a partial fix because quality control risks must be controlled at the product level rather than at the product-property measurement level.

The risk of finding the next in-control point out of statistical control is 2.4% when x̄ ± 2.66·MR̄ limits are constructed from a training data set of n = 10 points and 1.1% when n = 20 points are used. Sequential control-limit recalculation with large-sample-size rules will not resolve this issue; if limits are simply recalculated starting at n = 10 until n = 30, then 24.3% of the in-control sequences will be misidentified as out of statistical control. Control charting with limited data requires the use of appropriate statistical multipliers.
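These sample-size effects are easy to verify by simulation. The following sketch assumes normally distributed in-control data and the nominal x̄ ± 2.66·MR̄ limits:

```python
import numpy as np

def false_alarm_rate(n_train, n_sims=100_000, seed=1):
    """Chance that the next in-control, normally distributed point
    violates individuals-chart limits built from n_train points."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_sims, n_train + 1))
    train, nxt = x[:, :-1], x[:, -1]
    center = train.mean(axis=1)
    half = 2.66 * np.abs(np.diff(train, axis=1)).mean(axis=1)
    return np.mean((nxt > center + half) | (nxt < center - half))

for n in (10, 20, 30):
    print(f"n = {n}: {false_alarm_rate(n):.2%}")
# Expect roughly 2.4%, 1.1%, and 0.8%, in line with the risks quoted above.
```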

Control charts are sometimes used in conjunction with the Western Electric rules (tests for special causes). These rules focus on the identification of data patterns that indicate a systematic loss of control. For example, one version of the "run rule" says that if eight contiguous points are consistently above or consistently below the average, the process is not in control. If 30 in-control points define the average, then applying the run rule to 30 additional in-control points provides a 13.1% chance that the process will be found out of control. Other Western Electric rules are typically applied in a systematically uncontrolled manner and are sometimes statistically inefficient. Statistical methodologies that control all statistical risks simultaneously, along with more-efficient tests, are needed to replace the Western Electric rules.3,4
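A sketch of the run rule, with a Monte Carlo check of the 13.1% figure, assuming normal in-control data and an average estimated from 30 training points:

```python
import numpy as np

def run_rule_violated(points, center, run_len=8):
    """True if any run_len contiguous points lie entirely above or
    entirely below the center line (the Western Electric run rule)."""
    above = points > center
    run = 1
    for i in range(1, len(points)):
        run = run + 1 if above[i] == above[i - 1] else 1
        if run >= run_len:
            return True
    return False

rng = np.random.default_rng(2)
n_sims = 50_000
hits = sum(
    run_rule_violated(rng.standard_normal(30), rng.standard_normal(30).mean())
    for _ in range(n_sims)
)
print(f"run-rule false-alarm rate over 30 points: {hits / n_sims:.1%}")  # ~13%
```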

A moving-range chart (Figure 1b) is often used for process control when point-to-point variation is unusually large. If such a chart is used in conjunction with a Shewhart-type chart, its risks also need systematic integration. A calculated ultratrace lower control limit (LCL) is effectively irrelevant when it is at or below the method detection limit (MDL). Additionally, some practitioners ignore the LCL, since too little impurity is perceived as better than too much. In such cases, the risk of a false out-of-control finding is one-half of that previously stated. Because the purpose of using control charting is to achieve product consistency, the use of above-MDL LCLs is an important consideration.

High-purity products often must meet multiple ultratrace-impurity specifications, which compounds the aforementioned statistical problems. Analysis of 30 trace metals, for example, results in 30 simultaneous sets of data. If each trace-metal analysis is based on using a 30-point in-control training set to calculate a nominal 3σ-equivalent control chart, there is a 90% risk that one or more of the 30 charts will show that the process violates its control limits. If these limits are applied to the next in-control product tested, there will be a 20.7% risk that one or more of the control limits will be found to have been exceeded. When there is such a repeated high risk of obtaining out-of-control statistical results from in-control processes, statistical mayhem—with all its business consequences—will result.

The more product properties are specified, the worse the situation becomes, and smaller training data sets present greater risks. Even when 30-point in-control training sets and the recommended wider x̄ ± 3.07·MR̄ or x̄ ± 3.47s control limits are used to obtain the true 3σ risk level, there is a 38.4% risk that one or more of the charts for 30 trace metals will show that an in-control process violates its control limits. When these limits are applied to the next in-control product tested, there will be a 7.5% risk that one or more control limits will appear to have been exceeded. Controlling each chart's risk level is helpful, but simultaneous control of risks for all of the charts is required. Such control is not current industry practice.
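The multiple-chart figures quoted in the last two paragraphs follow from independence arithmetic: if each of k charts carries a per-chart false-alarm risk p, the chance that at least one chart alarms is 1 − (1 − p)^k. A quick check, assuming independent charts:

```python
def family_risk(p, k=30):
    """Chance that at least one of k independent charts alarms when
    each chart has per-chart false-alarm risk p."""
    return 1 - (1 - p) ** k

print(f"{family_risk(0.07):.0%}")    # ~90%: in-training alarms, nominal limits
print(f"{family_risk(0.0077):.1%}")  # 20.7%: next point, 30 charts, nominal limits
print(f"{family_risk(0.016):.1%}")   # 38.4%: in-training alarms, widened limits
```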

In addition, although Shewhart individual control chart performance degrades when in-control training data are not normally distributed, the normality assumption is rarely checked in practice. Some ultratrace data are expected to be nonnormal in distribution. The c-chart (which assumes a Poisson distribution), while often suggested for use with particle count data, is a flawed model for most ultratrace SQC/SPC. A single-parameter Poisson distribution incorporates only sampling error into the control limits. It does not effectively incorporate random process or random measurement variation. The effect, once again, is a greater-than-nominal risk of indicating that an in-control process is out of control.

Data-Handling Issues

The perception that there is too much measurement uncertainty, and the ways that uncertainty is handled, are critical to the successful statistical analysis of ultratrace data. If an ultratrace purity specification were 100 ppt and the measurement standard deviation were 50 ppt, that 50% relative standard deviation would typically be considered unacceptably large and the measured results not very trustworthy. The four common responses to this situation are data censoring (using rounding, physical limits, or detection limits), signal averaging, improving the measurement method, and leveraging data characteristics (e.g., image enhancement). Of these four techniques, data censoring is perhaps the most problematic, because it actively degrades data resolution and damages the reliability of subsequent statistical analysis.

While statisticians are aware that the combined use of rounding, physical limits, and detection limits is problematic, they are often hesitant to question standard data-handling practices. Consequently, most statistical repair efforts focus on simple substitutions, nonnormal distributions, or nonparametric methods. These efforts have had limited success, since damage caused by data censoring can, at most, be only partially undone by subsequent manipulations. The optimal time to apply advanced statistical routines is after everything possible has been done to clean up the data.

Rounding. Data perceived to be too imprecise might be subjected to "comfort" rounding so that measurement reliability is not overrepresented. Practitioners of this method generally do what feels right based on the situation. For example, data values might be rounded (censored) to the nearest 0.1 ppb if the standard deviation is 0.06 ppb. Rounding to significant figures or digits is a more formalized technique than comfort rounding, but its rules are subject to interpretation. Published significant-figure/digit rules differ, for example, in the approximations they provide for propagation of error. Issues such as which uncertainty sources to include and how to calculate the uncertainties to be propagated are typically left undefined.

Intended to communicate both a measured value and an implicit measure of its uncertainty via a single result, the use of significant figures or digits always reduces the reliability of estimated statistics; the degree of damage to SQC/SPC statistics is determined by the relative aggressiveness of the rounding. In the case of an in-control process with high relative measurement uncertainty, for example, very few unique values may be left after rounding. Indeed, having very few different measured values is the typical data signature of overaggressive rounding. The subsequent use of a continuous distribution methodology, such as a Shewhart control chart, on such artificially overly discrete data will be problematic.
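The "few unique values" signature is easy to demonstrate. A minimal sketch, assuming hypothetical data with a standard deviation of 0.06 ppb comfort-rounded to the nearest 0.1 ppb:

```python
import numpy as np

rng = np.random.default_rng(3)
raw = rng.normal(loc=0.25, scale=0.06, size=30)  # hypothetical ppb measurements
rounded = np.round(raw, 1)                       # comfort-round to nearest 0.1 ppb

print("unique raw values:    ", np.unique(raw).size)      # 30
print("unique rounded values:", np.unique(rounded).size)  # only a handful remain
```

Feeding the rounded column into a continuous-distribution tool such as an individuals chart then operates on a handful of artificially discrete levels rather than on the underlying measurements.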

Physical Limits. Two of the physical limits used in the analysis of ultratrace data are zero (indicating that a specific measured contaminant is not present) and 100% (indicating the highest assay level). However, measurement systems sometimes indicate contamination levels of less than zero or an assay level higher than 100%. The causes of such unachievable results include measurement error, flawed calibration, or the subtraction of one result from another, each of which is subject to measurement error. Traditionally, such results are censored to physical limits. While that form of censoring seemingly improves the data, it actually underestimates uncertainty and over- or underestimates process averages. Figure 2 illustrates the effects of adopting a physical limit of zero. Censoring caused the misrepresentation of the data distribution's shape (in the graph at left), overestimation of the process average (0.12 censored versus 0.00 uncensored), and underestimation of the standard deviation (0.17 censored versus 0.31 uncensored). Such physical-limit censoring can have a significant effect on control charting and may obscure calibration and other measurement system issues as well.

Figure 2: Graphs showing the impact of applying a physical limit to ultratrace measurement data. A physical limit of zero was used in creating the graph at left, while the graph at right depicts uncensored data.
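The effect shown in Figure 2 is straightforward to reproduce. A minimal sketch, assuming a true process average of 0.0 and a measurement standard deviation of 0.31 (the uncensored values reported above):

```python
import numpy as np

rng = np.random.default_rng(4)
true = rng.normal(loc=0.0, scale=0.31, size=100_000)  # true average 0, sd 0.31
censored = np.maximum(true, 0.0)                      # clip at the physical limit

print(f"uncensored: mean = {true.mean():.2f}, sd = {true.std(ddof=1):.2f}")
print(f"censored:   mean = {censored.mean():.2f}, sd = {censored.std(ddof=1):.2f}")
# Censoring shifts the mean to about +0.12 and shrinks the sd to about 0.18,
# close to the 0.12 and 0.17 reported for Figure 2.
```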

Detection Limits. A detection limit is a figure of merit for a detector, measurement instrument, or measurement process. Results below a specified detection limit are considered too unreliable for use in most statistical or practical applications. An older, questionable practice, now generally abandoned, was to record zero or "nondetectable" for below-detection-limit measurements without explicitly stating what the limit was. This practice created an incentive for producers of high-purity products either to use less-capable measurement methods than were then available or to claim artificially high detection limits. The misleading data that resulted led to the development of standards for using detection limits with ultratrace data. A current SEMI standard requires that measurement methods have an MDL at or below the specification being measured.5,6 (Other information on determining detection limits is also available from SEMI and elsewhere.7,8)

There are many definitions of the term detection limit, and limits computed under different definitions can vary by an order of magnitude.9–16 Many acronyms for MDL (e.g., DL, LOD, LLD, LLOD, LOGD, and LOQ) are also used. However, the following discussion follows SEMI's use of MDL, which includes all major sources of measurement method variability.

MDLs are useful as measurement system figures of merit relative to specifications, but they are problematic when used to censor measurement data. Although that practice is not traceable to the SEMI MDL definition, it seems so intuitively logical that it goes unquestioned. Such censored data are either shown as the value of the MDL or as "<MDL." The latter form makes the censoring more apparent. During statistical analysis, <MDL data are typically treated as the MDL or as one-half the MDL. Either version leads to subsequent statistical mayhem.

Data censoring using the MDL also affects statisticians' ability to utilize the process capability measures Cp and Cpk (Cp = the number of 6σ intervals fitting into the specification interval and Cpk = the number of 3σ intervals fitting between the process average and nearest specification). Cp can only be defined when there are both upper and lower specifications. Therefore, in ultratrace data situations where the MDL is treated as the lower control limit, only Cpk is applicable. However, determining Cpk for ultratrace data can lead to a potentially gross overestimation of process capability, as the following example indicates.
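In symbols, Cp = (USL − LSL)/(6σ) and, against an upper specification alone, Cpk = (USL − μ)/(3σ). A quick check using the uncensored statistics from Table II reproduces the uncensored Cpk of 1.1 that appears in Table III for spec = 50:

```python
def cpk_upper(mean, sigma, usl):
    """One-sided process capability against an upper specification."""
    return (usl - mean) / (3 * sigma)

print(f"{cpk_upper(20.2, 9.2, 50):.1f}")  # 1.1, matching Table III
```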

Figure 3: Control charts showing the effects of replacing below-MDL data with the MDL for statistical analysis: (a) a Shewhart chart, and (b) a moving-range chart.

Table I presents actual measurements for 26 data points and the comparable results when a detection limit of 30 has been applied. Based on these data, Figures 3–5 graphically show the effects of different types of data substitutions (detection limit, one-half detection limit, and zero) on Shewhart and moving-range control charts, while Figure 6 shows charts for the uncensored data. The SQC summary statistics for the four data sets are given in Table II. As this table indicates, averages ranged from 2.4 to 30.1 depending on the type of substitution. Only the average for the one-half-MDL data was relatively close to that for the uncensored data for these particular ultratrace data points. The standard deviation and the upper control limit of the censored data were universally lower than those of the uncensored data, and some substitutions even resulted in an upper control limit that was less than the MDL. In addition, when three hypothetical specifications were evaluated, process capability when based on censored data was judged to be higher than when based on uncensored data, as shown in Table III.

Figure 4: Control charts showing the effects of replacing below-MDL data with one-half the MDL for statistical analysis: (a) a Shewhart chart, and (b) a moving-range chart.
Figure 5: Control charts showing the effects of replacing below-MDL data with zero for statistical analysis: (a) a Shewhart chart, and (b) a moving-range chart.
Figure 6: Control charts for uncensored below-MDL data: (a) a Shewhart chart, and (b) a moving-range chart.

The uncensored data in this example have a relative standard deviation of 46%. While each uncensored measurement in the collection of 26 data points is relatively unreliable, the results from these points can be reasonably statistically analyzed. In contrast, the use of a single-number MDL substitution in statistical analysis always results in biased averages (both underestimations and overestimations), biased standard deviations (typically underestimations), biased upper control limits (typically underestimations), and a biased Cpk (typically overestimations). Once data have been MDL censored, there is no graceful recovery. The use of advanced statistical methodologies for censored data (e.g., maximum likelihood estimation or correction factor tables) can only recover some fraction of the original statistical information.17

Data Type                   Average   Standard Deviation   UCL    Points Out of Control
Detection limit             30.1      0.2                  30.6   2
One-half detection limit    16.2      2.3                  23.1   2
Zero                        2.4       4.4                  15.7   2
Uncensored data             20.2      9.2                  47.7   0

Table II: Comparative statistical analysis scenarios using data from Table I and three types of data substitutions.
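The Table II comparison can be sketched in code. Because Table I's measurements are not reproduced here, the data below are illustrative stand-ins; the substitution rules and summary statistics are the ones discussed above:

```python
import numpy as np

def substitute(x, mdl, rule):
    """Replace below-MDL values according to a censoring rule."""
    x = np.asarray(x, dtype=float)
    if rule == "mdl":
        return np.where(x < mdl, mdl, x)
    if rule == "half":
        return np.where(x < mdl, mdl / 2.0, x)
    if rule == "zero":
        return np.where(x < mdl, 0.0, x)
    return x  # "none": uncensored

def sqc_summary(x):
    """Average, standard deviation, and individuals-chart UCL."""
    ucl = x.mean() + 2.66 * np.abs(np.diff(x)).mean()
    return x.mean(), x.std(ddof=1), ucl

rng = np.random.default_rng(5)
data = rng.normal(loc=20.0, scale=9.0, size=26)  # illustrative, not Table I
for rule in ("mdl", "half", "zero", "none"):
    mean, sd, ucl = sqc_summary(substitute(data, mdl=30.0, rule=rule))
    print(f"{rule:>4}: average = {mean:5.1f}, sd = {sd:4.1f}, UCL = {ucl:5.1f}")
```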

The MDL is the most common substitution made for the purpose of statistical analysis. Its use often results in over- or underestimations, as indicated in Tables II and III. The Cpk of 39 (spec = 50) in Table III misleadingly suggests that the specification for a process that has a true Cpk of 1.1 can be tightened. And in Table II, the standard deviation of 0.2 for the censored data is much lower than that for the uncensored data (9.2). In conjunction with insufficient statistical controls for small sample sizes and multiple-property testing, such flawed analyses lead to predictable misinterpretations in ultratrace data analysis. The failure to grasp the nature and scope of these routine misinterpretations has proven costly to both producers and consumers of ultratrace materials.

Data Type                   Cpk (Spec = 50)   Cpk (Spec = 40)   Cpk (Spec = 35)
Detection limit             39.0              17.4              9.6
One-half detection limit    4.9               3.6               2.7
Zero                        3.6               2.8               2.5
Uncensored data             1.1               0.7               0.5

Table III: Comparative Cpk statistical analysis scenarios using data from Table I, three types of data substitutions, and three hypothetical specifications.

Problems related to MDL data censoring are exacerbated by another traditional practice: MDL rounding. Many reported MDLs are rounded up, sometimes aggressively. For example, an MDL estimated at 63 ppt is likely to be reported as 100 ppt. High MDLs that are rarely exceeded are sometimes used as a crude technique for masking detection limit–induced statistical issues.

Another common statistical practice involves using the standard deviation of all the charted data in control-chart construction rather than basing limits on a short-term estimate of variability. This error allows process variability (if any) to widen the control limits, and it is not a sound mechanism for offsetting the risks incurred by data-censoring practices.
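The difference matters because drift or other process variation inflates the overall standard deviation while leaving the short-term, neighboring-point estimate largely untouched. A sketch with illustrative drifting data:

```python
import numpy as np

rng = np.random.default_rng(6)
drift = np.linspace(0.0, 20.0, 60)          # slow process drift (illustrative)
x = drift + rng.normal(scale=2.0, size=60)  # short-term noise sd = 2

overall_sd = x.std(ddof=1)                          # inflated by the drift
short_term_sd = np.abs(np.diff(x)).mean() / 1.128   # moving-range estimate (d2)
print(f"overall sd = {overall_sd:.1f}, short-term sd = {short_term_sd:.1f}")
```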

In other cases, ultratrace specifications may be set lower than what is known about process or measurement variability and the MDL. Possible fixes for this problem include requiring that specifications be no lower than an MDL multiple (e.g., 2, 3, 5, or 10) and requiring a specific gauge capability (e.g., a measurement procedure variation of <10% of the product variation or, in the absolute worst case, <30% of the product variation). However, such practices do not resolve all problems that arise in handling censored data.
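Guards of this kind are simple to state in code. In the following sketch, the MDL multiple and gauge-capability ratio are the negotiable parameters named above, not fixed industry values:

```python
def spec_is_defensible(spec, mdl, mdl_multiple=2.0):
    """Require the specification to be no lower than a multiple of the MDL."""
    return spec >= mdl_multiple * mdl

def gauge_is_capable(meas_var, product_var, max_ratio=0.10):
    """Require measurement variation below a fraction of product variation."""
    return meas_var / product_var < max_ratio

print(spec_is_defensible(spec=100.0, mdl=63.0))           # False: spec too tight
print(gauge_is_capable(meas_var=4.0, product_var=100.0))  # True: 4% < 10%
```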

Another issue arises when a specification is equal to a high, but honestly estimated, MDL. Very substantial problems result, and a solution would require additional fundamental changes in how such tenuous-quality data are interpreted. To avoid having to use different protocols to interpret such data safely, specifications should be set to an absolute minimum of twice the relevant MDL.

Finally, although quantifiable results are available for most ultratrace data currently reported as being below the MDL, it should be remembered that some detection limits are real. For example, it may be impossible to discern a peak because of instrument white noise. Advanced statistical methods are required for ultratrace data having a mix of quantifiable measurements and true nondetects.

Additional Issues and Solutions

The statistical methodologies used to analyze ultratrace data must function robustly in a nonideal environment. One way to address that challenge is the use of a simulation methodology that allows many statistical issues to be anticipated and planned for by determining the likely impact of an SQC/SPC strategy. Simulation requires the use of data distributions that are similar to real distributions and the use of all applicable data-handling rules.
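A skeleton of such a simulation is sketched below. The data distribution, censoring rule, and charting rule are all illustrative placeholders to be swapped for the real data-handling rules under study:

```python
import numpy as np

def simulate_strategy(gen, censor, n_train=30, n_test=30, n_sims=20_000, seed=7):
    """Monte Carlo false-alarm rate of an SQC/SPC strategy: generate
    in-control data, apply the data-handling rules, build individuals-
    chart limits, and score alarms on fresh in-control data."""
    rng = np.random.default_rng(seed)
    alarms = 0
    for _ in range(n_sims):
        train = censor(gen(rng, n_train))
        test = censor(gen(rng, n_test))
        center = train.mean()
        half = 2.66 * np.abs(np.diff(train)).mean()
        if np.any((test > center + half) | (test < center - half)):
            alarms += 1
    return alarms / n_sims

# Illustrative: skewed ultratrace-like data with MDL substitution at 30
gen = lambda rng, n: rng.lognormal(mean=3.0, sigma=0.5, size=n)
censor = lambda x: np.where(x < 30.0, 30.0, x)
print(f"false-alarm rate: {simulate_strategy(gen, censor):.1%}")
```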

Under some circumstances, ultratrace data can be in control and nonnormally distributed. When the upper tail of the distribution is more dispersed than the lower tail, the upper control limit of a Shewhart control chart may seem to have been violated. Such high nonrepresentative process measurements can result when outside contaminants have infiltrated the sampling or measurement environment. A mixture of distributions, often asymmetric, will result, perhaps with outliers in the upper tail. Therefore, it is premature to fit a nonnormal distribution until it has been verified, by remeasurement and resampling, that the nonnormality is inherent rather than an artifact of spurious contamination sources. If the offending sampling or measurement system is difficult to fix, quality-decision rules must be designed to compensate for what is known about the system's shortcomings.
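A small illustration of that data signature, with hypothetical in-control data contaminated by occasional spurious high readings:

```python
import numpy as np

rng = np.random.default_rng(8)
clean = rng.normal(loc=10.0, scale=2.0, size=570)
spikes = rng.normal(loc=25.0, scale=5.0, size=30)  # spurious contamination
mixed = np.concatenate([clean, spikes])

# The mixture is asymmetric: the upper tail is far more dispersed.
for name, d in (("clean", clean), ("mixed", mixed)):
    p5, med, p95 = np.percentile(d, [5, 50, 95])
    print(f"{name}: upper-tail spread = {p95 - med:.1f}, lower = {med - p5:.1f}")
```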

Members of the electronics industry have been attempting to define statistical specifications (e.g., only in-control product is automatically acceptable). In light of the many issues discussed in this article and the numerous underlying and unaddressed SQC standardization issues (e.g., different companies requesting different statistical practices), these initial efforts are premature.

Conclusion

Data practitioners have a strong systematic bias to falsely conclude that process control has been lost, to overestimate process capability, and—when a specification and MDL are too close—to misclassify product quality. Producers and consumers of ultrapure materials incur substantial business costs as a result of such flawed data handling and analysis. Much effort is wasted attempting to fix process problems that are in reality predictable analysis artifacts.

Meeting this challenge requires major changes in the treatment and handling of ultratrace data. Data should not be censored, but if they must be, rational bounds should be built into the SQC/SPC procedures to provide a margin of error. Detection limits should be treated only as figures of merit for measurement processes. Specifications should not be set too close to a detection limit, nor should physical limits be applied. Ultratrace data should not be rounded aggressively, and all statistical risks in SQC/SPC testing should be evaluated and balanced simultaneously. To account for small sample sizes, statistical multipliers should be modified. Advanced statistical methods should be applied only to data that have not been censored needlessly.

Such changes will be strongly opposed. Fear of data overinterpretation has led the industry to systematically misinterpret ultratrace data while simultaneously corrupting statistical tools and surrendering vital information on process states. Customer education and the development of robust, standardized ultratrace statistical and data-handling methodology will be required. The Statistical Methods Task Force in SEMI Standards Process Chemicals is working to develop standard practices that are relevant to the analysis of ultratrace data and the development of statistical specifications. Although serious effort will be required to overturn accepted, but inappropriate, practices, the rewards will be worth it.

References

1. DC Montgomery, Introduction to Statistical Quality Control, 2nd ed. (New York: Wiley, 1991).

2. EL Grant and RS Leavenworth, Statistical Quality Control, 5th ed. (New York: McGraw-Hill, 1980).

3. TJ Bzik and SN Kamat, "Control Charting with Limited Data," in Proceedings of the Section on Quality and Productivity—American Statistical Association (Alexandria, VA: American Statistical Association, 1995), 25–30.

4. TJ Bzik and SN Kamat, "Benchmarking of a New Short Run Control Charting Methodology" (paper presented at the American Statistical Association Annual Technical Meeting, Chicago, August 7, 1996).

5. TJ Bzik, "Statistical Reporting and Analysis of Ultratrace Measurement Data: Issues and Solutions" (paper presented at Semicon West, San Francisco, July 22–24, 2002).

6. SEMI C1—Specifications for Reagents—Method Validation (San Jose: SEMI, 2003).

7. SEMI C10-0305—Guide for Determination of Method Detection Limits (San Jose: SEMI, 2005).

8. Air Products and Chemicals, MDL Estimator Software; available from Internet: www.airproducts.com/products/specialtygases/northamerica/download.htm.

9. PC Meier and RE Zund, Statistical Methods in Analytical Chemistry (New York: Wiley, 1993).

10. TJ Bzik, "Method Detection Limit Estimation Theory and Practice," in Proceedings of the Annual Technical Meeting of the Institute of Environmental Sciences and Technology (Mount Prospect, IL: IEST, 2000), 1–10.

11. "Limit of Detection," chap. 8 in Specialty Gas Analysis: A Practical Guidebook, ed. J Hogan (New York: Wiley, 1997).

12. SN Ketkar and TJ Bzik, "Calibration of Analytical Instruments—Impact of Nonconstant Variance in Calibration Data," Analytical Chemistry 72, no. 19 (2000): 4762–4765.

13. DW McCormack Jr., "Analysis with Data Beyond the Detection Limit" (paper presented at the Semicon West Statistical Methods Workshop, San Francisco, July 13, 2004).

14. TJ Bzik, "Method Detection Limits and the Statistical Treatment of Censored Trace Data" (paper presented at the Semicon Europa Analytical Methods Workshop, Munich, April 13, 2005).

15. ISO 11843-1, Capability of Detection, Part 1: Terms and Definitions (Geneva: International Organization for Standardization, 1997).

16. ISO 11843-2, Capability of Detection, Part 2: Methodology in the Linear Calibration Case (Geneva: International Organization for Standardization, 2000).

17. TJ Bzik, "The Statistical Treatment of Censored Trace Data" (paper presented at the Semicon West Statistical Methods Workshop, San Francisco, July 13, 2004).


Thomas J. Bzik is a research associate in statistical sciences. From 1980 to the present, he has worked at Air Products and Chemicals (Allentown, PA), and from 1978 to 1979, he was at the Center for the Environment and Man. His areas of expertise include limits of detection, calibration, SQC, experimental design, regression analysis, and environmental statistics. Bzik has chaired the task force devoted to the development of SEMI standards for method detection limits, precision reporting, and method validation in statistical methods, as well as the task force in the area of method validation in the process chemicals division. He has also contributed to the statistical components of FED-STD-209D, FED-STD-209E, and their subsequent international successor, ISO/TC 209. Bzik has worked with International Sematech, has contributed to the International Technology Roadmap for Semiconductors, and has produced more than 60 publications. In 1975 he received a BS in mathematics from Kutztown University in Kutztown, PA, and in 1977 he received an MS in statistics from the University of Connecticut in Storrs. (Bzik can be reached at 610/481-6650 or bziktj@airproducts.com.)

