MicroMagazine.com

Process Equipment Control

Using real-time process control to enhance performance and improve yield learning

Raymond J. Bunkofske, IBM

Process variations can be detected after a single run by using multivariate statistics to analyze raw process data.

Because a consistent process is much easier to tune than an inconsistent one, the stability of semiconductor manufacturing processes is critical to achieving high device yields. There are many ways to measure process performance. Unfortunately, since the one that really matters--yield--is often many days and wafers away from the actual process run, it is important to implement an intermediate measure to minimize the mean time to detect (MTTD) of any process variations. In-line methods such as photo-limited yield, scanning electron microscopy, and kerf electrical measurements can be useful, but they have an MTTD on the order of days or weeks. Although they may indicate that a problem exists, they do not provide sufficient insight into why. After a defect excursion is discovered through these methods, a process engineer must then examine the tool or maintenance logs or conduct experiments to determine its root cause.

Real-time process control (RTPC) offers a way to decrease the MTTD to one run of a process tool. It also gives a fab's mechanical engineering staff a way to quickly access second-by-second process-related data for either the wafer currently being run or one that ran six months ago. Based on experiences with RTPC at the IBM-Burlington facility (Essex Junction, VT), this article discusses how this technique can improve process consistency and yield learning.

RTPC is achieved by using data acquisition software to collect data from a process tool and any auxiliary sensors. These data are then analyzed using multivariate statistics to detect any differences between the response of the process and a historically derived response model, which provides a fingerprint for the process. Calculated using responses gathered across many wafers, this model describes a region in n-space that results in acceptable process performance. If the process response data match the model reasonably well, the tool continues processing wafers. If not, a Pareto analysis of the likely sources of the differences is generated to simplify the identification and correction of the problem.

Implementing the RTPC System

The RTPC system under investigation operates as shown in Figure 1. After the data acquisition system collects process-related data from the tool and its related sensors, the data are synchronized, subjected to any required signal processing, and summarized into conventional statistics (mean, standard deviation, etc.). A Hotelling T² figure of merit (FOM) is then calculated for carefully chosen groups of statistics. The data are saved twice during the data acquisition and summarization process: first, the raw time-series data from the process equipment are saved for possible use later, and second, the summary statistics are saved in an SAS dataset for access through the manufacturing information warehouse.

 
Figure 1: Overall outline of information flow in the RTPC system.

While the data acquisition process is straightforward, the analyses usually are not. There are tens or hundreds of parameters available, and determining which of these to monitor is a challenge. The two multivariate statistical techniques currently used to assist in the analysis are principal component analysis and partial least squares. The former concentrates on determining which parameters are responsible for variations within the independent variables, while the latter aids in determining which independent variables are driving variations in such dependent parameters as in-line electrical test parameters and functional yield. Once parameters have been selected for analysis, a process response model (PRM) is generated. Using historical data, univariate means and standard deviations are calculated to scale the incoming data to a mean of zero and a standard deviation of one. This is done in two passes in order to eliminate outliers that might unduly influence the PRM. The flow of the analysis can be summarized as follows:

1. The time-series data are normalized through various signal-processing techniques such as logarithmic transformations and high-pass-frequency filters.

2. The normalized data are summarized, or collapsed in time, by finding averages and standard deviations.

3. A matrix of means and standard deviations is calculated for each parameter of the collapsed data.

4. Any points more than three standard deviations from the mean are eliminated.

5. Another matrix of means and standard deviations is calculated.

6. A covariance matrix is calculated for use during the multivariate calculations to correct for any cross-correlation in the data.

7. Finally, a figure of merit, the Hotelling T² statistic, is calculated for each group of parameters defined by the user.
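Steps 2 through 5 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the pressure values, and the single deliberately bad run are hypothetical, and the article does not say how the production system implements these steps.

```python
from statistics import mean, stdev

def collapse(trace):
    """Step 2: summarize one wafer's time series as (mean, std)."""
    return mean(trace), stdev(trace)

def two_pass_scaling(values, n_sigma=3.0):
    """Steps 3-5: compute mean/std, drop outliers, then recompute."""
    m1, s1 = mean(values), stdev(values)              # first pass
    kept = [v for v in values if abs(v - m1) <= n_sigma * s1]
    return mean(kept), stdev(kept)                    # second pass

def scale(value, center, spread):
    """Scale an incoming value to mean 0, standard deviation 1."""
    return (value - center) / spread

# Hypothetical history: per-wafer mean chamber pressure with one bad run (25.0).
base = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0]
history = base * 2 + [25.0]

center, spread = two_pass_scaling(history)
print("collapsed trace:", collapse([9.9, 10.0, 10.1]))
print("model center and spread:", round(center, 3), round(spread, 3))
print("scaled value for 10.3:", round(scale(10.3, center, spread), 2))
```

Without the second pass, the single bad run would inflate the standard deviation by more than an order of magnitude, and the scaled data would make real excursions look small.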

 

The Hotelling T² statistic is calculated using the equation

$$T^2 = (x_i - \bar{x})^{T} \, S^{-1} \, (x_i - \bar{x})$$

where $x_i$ is a particular summary statistic, $\bar{x}$ is the mean of that summary statistic, and $S^{-1}$ is the inverse covariance matrix. Most easily understood as the "distance from center" of the normal operating space of the process, the Hotelling T² scales and centers the data and accounts for any correlation.
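As a rough illustration of the calculation, the sketch below estimates a mean vector and covariance matrix from a handful of historical runs and then computes the Hotelling statistic for a new run. The data, the function names, and the small Gaussian-elimination solver (used so that no inverse has to be formed explicitly) are assumptions made for this example, not details taken from the RTPC system.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def mean_and_covariance(rows):
    """Sample means and sample covariance matrix of the historical runs."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return means, cov

def hotelling_t2(x, means, cov):
    """T^2 = d' S^-1 d, where d is the deviation from the model center."""
    d = [xi - mi for xi, mi in zip(x, means)]
    w = solve(cov, d)                      # w = S^-1 d
    return sum(di * wi for di, wi in zip(d, w))

# Hypothetical history: two positively correlated summary statistics.
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
means, cov = mean_and_covariance(rows)
t2 = hotelling_t2([4.0, 1.0], means, cov)
print(means, cov, t2)
```

A production model would be built from many more runs than the four used here.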

Once the Hotelling T² figure of merit is calculated, it is normalized by its statistically derived χ² limit:

$$\mathrm{FOM} = \frac{T^2}{\chi^2_{\text{limit}}}$$


This normalization allows all the data from a single process chamber, regardless of product or recipe, to be plotted on a single statistical process control (SPC) chart. It also means that the statistical limit for every RTPC multivariate chart on the manufacturing line is 1.0, making it relatively easy to compare performance across process tools or recipes. At this point in the analysis process, the results are sent to the SPC system, which decides if the process run was out of control and, if so, issues a command to stop the tool from further processing.
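The effect of this normalization can be shown with a small sketch. The recipe names, limits, and T² values below are hypothetical; the point is only that dividing each T² value by its own limit puts every run on one chart with a control limit of 1.0.

```python
# Hypothetical upper limits for two recipes run in the same chamber.
limits = {"recipe_A": 9.21, "recipe_B": 13.28}

# Hypothetical (recipe, T^2) pairs in run order.
runs = [("recipe_A", 4.6), ("recipe_B", 6.1), ("recipe_A", 11.5)]

# Normalized figure of merit: one chart, one limit (1.0) for all recipes.
chart = [(recipe, t2 / limits[recipe]) for recipe, t2 in runs]

# The SPC decision reduces to a comparison against 1.0.
stop_tool = [fom > 1.0 for _, fom in chart]

for (recipe, fom), stop in zip(chart, stop_tool):
    print(f"{recipe}: FOM={fom:.2f} stop={stop}")
```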

Although the Hotelling T² statistic provides a robust means of detecting a process variation, it provides no insight into which parameters are responsible for the variation or how much each parameter contributes to the figure of merit. To solve this problem, a software algorithm was developed and a Pareto for out-of-control points was included in the on-line SPC charts. The operator simply presses a function key to bring up the Pareto. A typical multivariate control chart with an upper control limit of 1 and an average value around 0.5 is shown in Figure 2. There are two points on this chart above the control limit. To determine which process parameters are responsible for these out-of-control conditions, the operator presses the appropriate function key and a Pareto, such as that in Figure 3, is displayed. The logistics information is shown at the top and a list of the offending parameters below. In this example, the range of the ozone mass-flow controller (MFC) and the standard deviation of the chamber pressure are the largest contributors. Armed with this information, maintenance technicians can fix the problems expeditiously. In addition, because the technicians know immediately what to fix, maintenance driven by out-of-control events can be minimized. At the fab under investigation, it has been reduced by 10 to 15%.

 
Figure 2: Typical multivariate control chart.

--APPENDED BY ETBSIDE ON 04-14-99 AT 01:05:05--
LOT XOU16001JX, SLOT 20, WAFER: IJX3920, RCP: A_4.5B 4.2P 10K

CONTRIBUTION (%)    PARAMETER
56.75               RANGE OZONE MFC
31.44               STANDARD DEVIATION CHAMBER PRESSURE
 7.42               RANGE OF TEOS MFC
 4.34               STD. SUSCEPTOR TEMPERATURE
 0.02               RANGE TEB1 MFC
 0.02               RANGE He-HIGH MFC
Figure 3: Example of an out-of-control Pareto displayed by an on-line SPC system.
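The article does not describe the algorithm behind this Pareto. One simple way to produce such a ranking, shown here purely as an assumption, is to split the Hotelling statistic into per-parameter terms: each parameter's scaled deviation multiplied by the matching element of the inverse covariance matrix times the deviation vector. The terms sum to T² by construction. The parameter names echo Figure 3, but the deviations and the inverse covariance matrix are hypothetical.

```python
def pareto(names, deviations, inv_cov):
    """Rank parameters by their share of T^2 (one possible decomposition)."""
    p = len(deviations)
    # w = S^-1 d
    w = [sum(inv_cov[j][k] * deviations[k] for k in range(p)) for j in range(p)]
    # Per-parameter terms d_j * w_j; their sum is T^2 = d' S^-1 d.
    terms = [deviations[j] * w[j] for j in range(p)]
    t2 = sum(terms)
    ranked = sorted(zip(names, terms), key=lambda item: item[1], reverse=True)
    return t2, [(name, 100.0 * term / t2) for name, term in ranked]

# Hypothetical scaled deviations for one out-of-control wafer.
names = ["RANGE OZONE MFC", "STD CHAMBER PRESSURE", "RANGE TEOS MFC"]
deviations = [3.0, 2.0, 0.5]

# Hypothetical inverse covariance: the first two parameters are correlated.
inv_cov = [[1.5625, -0.9375, 0.0],
           [-0.9375, 1.5625, 0.0],
           [0.0, 0.0, 1.0]]

t2, ranking = pareto(names, deviations, inv_cov)
for name, percent in ranking:
    print(f"{percent:6.2f}  {name}")
```

With strongly correlated parameters, individual terms in this decomposition can be negative, so a production implementation would need a rule for handling that case.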

Before this multivariate analysis method was adopted, the fab tried to implement SPC charts for individual process parameters. However, that practice posed several difficulties: the number of SPC charts was unwieldy, the lack of cross-correlation severely restricted the robustness of the system, control limits were not kept up-to-date, and making comparisons across recipes and tools was difficult because of their varying set points and normal process variation.

The problem of having too many SPC charts can be illustrated by considering the case of a single process tool. A particular oxide etcher has 4 chambers, 8 frequently used recipes per chamber, and 15 parameters per recipe that must be monitored. Performing conventional univariate SPC analysis would require 480 SPC charts (4 x 8 x 15). Monitoring so many charts is not practical. Using a multivariate process response model, the same tool can be monitored with four charts, one per chamber, with one point on each chart for each wafer/recipe step processed.

It is easy to see why using multivariate methods makes monitoring and control easier, but why is this technique more robust than univariate analysis? When multiple parameters are monitored through univariate analysis, it is necessary to correct for two factors--the number of parameters and cross-correlation--because even with independent variables the "in-control" region for the parameters taken jointly is not simply the rectangle formed by the separate univariate limits. This concept is illustrated in Figure 4, where the dashed-rule box represents traditional SPC limits. When the two variables are considered simultaneously, however, the actual in-control region is that within the oval. Although points inside the rectangle but outside the oval are actually out of control, a univariate SPC system would not interpret them as such, yielding false positives (bad runs accepted as good). The limits could be tightened so that the SPC-limit rectangle fits inside the oval, but then any points outside the rectangle and inside the oval would be false negatives (good runs flagged as bad).
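The rectangle-versus-oval argument can be made concrete with two scaled, correlated parameters. In this sketch the correlation of 0.9, the three-sigma univariate limits, and the joint significance level of 0.0027 are all hypothetical choices. The joint limit uses the closed-form chi-square quantile for two degrees of freedom, which is -2 ln(alpha).

```python
import math

def within_univariate_limits(z, k=3.0):
    """Rectangle test: every scaled parameter inside +/- k sigma."""
    return all(abs(v) <= k for v in z)

def t2_two_vars(z1, z2, rho):
    """Hotelling T^2 for two unit-variance variables with correlation rho."""
    return (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)

rho = 0.9                               # hypothetical correlation
joint_limit = -2.0 * math.log(0.0027)   # chi-square quantile, 2 dof

points = {
    "inside rectangle, outside oval": (2.0, -2.0),
    "outside rectangle, inside oval": (3.2, 3.0),
}
for label, (z1, z2) in points.items():
    univariate_ok = within_univariate_limits((z1, z2))
    joint_ok = t2_two_vars(z1, z2, rho) <= joint_limit
    print(f"{label}: univariate ok={univariate_ok}, joint ok={joint_ok}")
```

The first point moves against the normal correlation, so each chart looks fine on its own while the joint statistic is far out of control. The second point follows the correlation, so one univariate chart alarms even though the joint behavior is normal.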

A simple formula for estimating the probability of a false call is

$$P(\text{false call}) = 1 - (1 - \alpha)^n$$

 
Figure 4: Chart showing univariate (rectangular region) and joint (oval region) probability control limits.

where $n$ is the number of parameters monitored and $\alpha$ is the chosen significance level, usually 0.05 or 0.01. The upper line in Figure 5, representing a significance level of 0.05, indicates that the probability of a false negative (a false alarm, which would lead to a tool shutdown when nothing is wrong) approaches 40% when 10 parameters are being monitored using univariate analysis. The likelihood of a false call rises with an increase in the number of monitored parameters. The multivariate method is more robust than the univariate method because it accounts for many parameters at once and corrects for any cross-correlation, thus helping to prevent false interpretations and to minimize downtime.
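The 40% figure follows directly from the formula. A short check, with the function name chosen only for this example:

```python
def false_call_probability(n, alpha):
    """Chance that at least one of n independent charts gives a false call."""
    return 1.0 - (1.0 - alpha) ** n

for alpha in (0.05, 0.01):
    for n in (1, 5, 10, 15):
        p = false_call_probability(n, alpha)
        print(f"alpha={alpha:.2f} n={n:2d} probability={p:.3f}")
```

For 10 parameters at a significance level of 0.05, the result is 1 - 0.95^10, or about 0.401. At a significance level of 0.01 the same 10 parameters give about 0.096.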

 
Figure 5: Probability of a false negative as a function of the number of monitored parameters.

The RTPC System in Practice

The RTPC system has been in use at IBM-Burlington for approximately four years and is currently used for monitoring more than 275 process tools. In this period more than $10 million have been saved on a total investment of less than $3 million. The following case studies highlight the successes of this system.

Case 1: RF Generator Connected to a Dummy Load. In this case, when maintenance was being performed on the radio-frequency (RF) subsystem of an oxide etch tool, the technician failed to inform the next shift that the RF generator was plugged into a dummy load for calibration. As a result, manufacturing operators started processing wafers on the tool. Because it was a timed process, there were no endpoint signals to indicate that something was amiss. Moreover, because the RF generator saw a perfect 50-Ω load, no alarm was triggered. The RTPC system, however, detected that several parameters, such as reflected RF power and dc bias, did not match the PRM. Based on these data, the tool was stopped after one lot, and only that lot had to be scrapped. If RTPC had not been available, the next inspection would have occurred only after 20 lots had been processed. Therefore, RTPC was responsible for saving 19 lots.

Case 2: Loose Throttle-Valve Grounding Wire. In the second case, one or two wafers per lot at contact etch were not endpointing correctly. The problem was intermittent and the endpoint traces provided no real clues as to what was going awry. Therefore, an experiment was proposed in which the process would be run under differing conditions to determine the source of the problem. At that time the RTPC system was just being implemented and no real controls were in place. However, the data were available and it was decided to analyze them before undertaking the costly experiment.

The ensuing analysis showed a sinusoidal variation in pressure at 0.5 Hz, which was significant because this particular process uses a magnetic field that rotates at 0.5 Hz. When the tool was inspected, it was discovered that the grounding wire on the throttle valve had an intermittent open. The wire was acting like an antenna, picking up the signal from the magnetic field coils, and oscillating. This in turn caused pressure perturbations that prevented some wafers from endpointing properly. By using the RTPC system to determine the source of the endpointing problem, it was possible to forgo the proposed experiment, which translated into a cost savings in terms of wafers, tool time, and engineering analysis.
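The article does not say how the 0.5-Hz component was found. One straightforward way to look for a periodic disturbance at a known frequency in a saved pressure trace is to evaluate a single Fourier component, as in the sketch below. The sampling rate, trace length, and amplitudes are hypothetical, and the trace is synthetic.

```python
import math

def amplitude_at(signal, sample_rate_hz, freq_hz):
    """Amplitude of the sinusoidal component of `signal` at freq_hz."""
    n = len(signal)
    offset = sum(signal) / n               # remove the steady-state level
    re = im = 0.0
    for i, value in enumerate(signal):
        angle = 2.0 * math.pi * freq_hz * i / sample_rate_hz
        re += (value - offset) * math.cos(angle)
        im += (value - offset) * math.sin(angle)
    return 2.0 * math.hypot(re, im) / n

# Synthetic trace: 60 s at 10 samples/s, steady pressure of 50 units
# plus a 0.3-unit oscillation at 0.5 Hz.
sample_rate_hz = 10.0
trace = [50.0 + 0.3 * math.sin(2.0 * math.pi * 0.5 * i / sample_rate_hz)
         for i in range(600)]

at_field_frequency = amplitude_at(trace, sample_rate_hz, 0.5)
at_other_frequency = amplitude_at(trace, sample_rate_hz, 1.3)
print(at_field_frequency, at_other_frequency)
```

Comparing the amplitude at the field rotation frequency with the amplitude at unrelated frequencies gives a quick indication of whether the pressure is coupled to the magnetic field.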

Case 3: Monitoring RF Parameters. On some of the etch tools used in the fab, the gas distribution plate and electrostatic chuck must be replaced periodically. They can be replaced either at specific time intervals or, using RTPC data, when they are about to fail. Monitoring the electrostatic chuck leakage current, backside cooling flow, and clamp pressure provides an indication of when the chuck must be replaced; monitoring the TCP load cap position, wafer area pressure, and temperature indicates when the gas distribution plate must be changed. Performing maintenance as needed rather than periodically increases tool availability and reduces costs for spare parts while increasing confidence in the process itself.

Improving Yield Learning

The main premise behind using RTPC is that process variation is bad. The goal is to provide a consistent, stable platform for all processes. To that end, every wafer processed at the fab discussed here has a figure of merit calculated to indicate if it received processing within normal variations from the PRMs. The multivariate statistical technique known as projection to latent structures (PLS, also called partial least squares) is used to correlate process variation to defects, in-line electrical tests, and yield measurements. PLS relates many independent variables to many dependent variables. Unlike ordinary regression, which relates many independent variables to one dependent variable, PLS corrects for correlation among the various parameters and reveals to the user which ones are important to the model. It also reveals when incoming data do not fit the model, so that the results may be reviewed with the appropriate level of confidence. PLS analysis can be used to refine the process and lead to the generation of new PRMs. This multistep procedure, which can be repeated for all tools and processes, offers several opportunities for improving yield learning or time to target.

Process Technical Transfer. In addition to providing manufacturing with a new process recipe, the technical transfer team can provide a response model for the process. This PRM permits operators to react immediately if the process is not performing as it had in development and to make the necessary adjustments. When the tools used for development and manufacturing differ from each other, the PRM can at least provide some approximate targets and simplify the transition.

Tool Matching. One-of-a-kind tools are not prevalent in semiconductor manufacturing. The different components, maintenance, and loading of individual tools can cause tool mismatches that contribute to process variation. Using a consistent PRM across multiple tools can highlight sources of variation, assist in tool matching, and promote consistent processing.

Process Performance Tracking. Collecting raw process data and calculating figures of merit provides other important advantages. Such data provide process engineers with accurate information about which wafers went through which tools and tool chambers and about the process conditions. This information, in turn, makes it possible to accurately identify tool slides immediately after the process is completed. It also aids engineers in correlating anomalies at test to specific process variations so that sporadic events can be monitored and investigated, enabling the engineers to trace the source of large anomalies quickly.

Line/Module/Lot Figures of Merit. Figure 6 shows how figures of merit can be used to track tool performance, regardless of process or product. Because the statistical limit for all RTPC charts is 1, the average of 1.7 for tool 206V indicates that it operated outside statistical limits at least 50% of the time. The other tools included on this chart all have averages of less than 0.4, which shows that they were operating consistently.

 
Figure 6: Average figures of merit by tool number. Since the statistical control limit is always 1, the percentages of out-of-control operations can be calculated easily.
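A summary like the one in Figure 6 amounts to grouping normalized figures of merit by tool. The sketch below computes both the average and the fraction of runs above the limit of 1. Tool 206V is named in the article, but the second tool name and all of the values are hypothetical.

```python
from collections import defaultdict

def summarize_by_tool(records):
    """Return {tool: (average FOM, fraction of runs with FOM > 1)}."""
    by_tool = defaultdict(list)
    for tool, fom in records:
        by_tool[tool].append(fom)
    return {
        tool: (sum(values) / len(values),
               sum(1 for v in values if v > 1.0) / len(values))
        for tool, values in by_tool.items()
    }

# Hypothetical (tool, normalized figure of merit) records.
records = [
    ("206V", 2.1), ("206V", 1.6), ("206V", 0.9), ("206V", 2.2),
    ("101A", 0.3), ("101A", 0.4), ("101A", 0.2), ("101A", 0.3),
]

summary = summarize_by_tool(records)
for tool, (average, fraction_out) in sorted(summary.items()):
    print(f"{tool}: average={average:.2f} out-of-control={fraction_out:.0%}")
```

Reporting the out-of-control fraction alongside the average is useful because a few very large excursions can raise an average without most runs being out of control.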

Another way to track process performance across various process tools is illustrated in Figure 7, which plots the performance of a polysilicon etch process on multiple tools. In this example, the lower line represents parameters that are fixed by the operator, typically recipe settings, while the upper one represents parameters that vary unexpectedly (pressure variations, RF tuning, etc.). This type of plot can be very helpful for tool matching.

Figure 7: Average figures of merit for two distinct multivariate statistics plotted by tool number.

Figures of merit can also be used to track process performance by individual or group operation codes, as shown in Figure 8. Three of the operation codes in this figure exhibit very large amounts of variation. When such results occur, the next step would be to investigate the lots that went through these levels and look for anomalies. Plots such as this make it possible to summarize data from any combination of processes and monitor their performance. For example, an engineer might take all of the operation codes for gate conductor processing and monitor them as a function of speed sort for logic products.

Figure 8: Average multivariate figures of merit plotted by operation code.

Finally, Figure 9 shows the monitoring of figures of merit by product lot, which helps to predict whether the various lots will succeed at final test. If a lot has a high variance, as the first seven lots in this chart do, it is less likely to meet customer requirements than a lot with a low variance. For low-volume parts, such a chart could be used to determine whether it is advisable to scrap a lot and start a new one.

Figure 9: Average multivariate figures of merit plotted by lot number.

Conclusion

By reducing process variation, implementation of RTPC at IBM-Burlington has decreased manufacturing costs through reduced scrap and increased tool availability. The RTPC system collects data from process tools and their auxiliary sensors, compares those data to process response models, and sends the resulting calculated figures of merit back to on-line SPC systems, which then immediately shut down tools that are out of control. The technique can also improve yield learning and characterize how individual process steps, or process perturbations, affect final electrical test yields and device performance. This information, in turn, can be used to optimize processes and process response models to minimize yield impacts and maximize device performance.

Acknowledgments

The author would like to thank John Colt, Valerie Congdon, Nancy Pascoe, Gregg Reynolds, Dan Snider, and David Wortheim for their contributions to this article.


Raymond J. Bunkofske, PhD, is an advisory engineer/scientist at IBM Microelectronics' plant in Essex Junction, VT, where he is responsible for the development and installation of a fabwide real-time process control system and for the development of in situ monitoring techniques and standards for process monitoring infrastructure. He also has eight years of experience in the design and certification of Class 1 cleanrooms and holds a patent for an advanced photoresist coater bowl design. Bunkofske received his BA in physics from Carleton College in Northfield, MN, his MS in mechanical engineering from the University of Illinois at Urbana-Champaign, and his PhD in mechanical engineering from the University of Minnesota in Minneapolis. (Bunkofske can be reached at 802/769-2808 or rbunkofs@us.ibm.com.)




Questions/comments about MICRO Magazine? E-mail us at cheynman@gmail.com.

© 2007 Tom Cheyney
All rights reserved.