Process Equipment Control
Using real-time process control to enhance performance and improve
yield learning
Raymond J. Bunkofske, IBM
Process variations can be detected after a single run
by using multivariate statistics to analyze raw process data.
Because a consistent process is much easier to tune than an inconsistent
one, the stability of semiconductor manufacturing processes is critical
to achieving high device yields. There are many ways to measure process
performance. Unfortunately, since the one that really matters--yield--is
often many days and wafers away from the actual process run, it is important
to implement an intermediate measure to minimize the mean time to detect
(MTTD) of any process variations. In-line methods such as photo-limited
yield, scanning electron microscopy, and kerf electrical measurements
can be useful, but they have an MTTD on the order of days or weeks. Although
they may indicate that a problem exists, they do not provide sufficient
insight into why. After a defect excursion is discovered through these
methods, a process engineer must then examine the tool or maintenance
logs or conduct experiments to determine its root cause.
Real-time process control (RTPC) offers a way to decrease the
MTTD to one run of a process tool. It also gives a fab's mechanical engineering
staff a way to quickly access second-by-second process-related data for
either the wafer currently being run or one that ran six months ago. Based
on experiences with RTPC at the IBM-Burlington facility (Essex Junction,
VT), this article discusses how this technique can improve process consistency
and yield learning.
RTPC is achieved by using data acquisition software to collect
data from a process tool and any auxiliary sensors. These data are then
analyzed using multivariate statistics to detect any differences between
the response of the process and a historically derived response model,
which provides a fingerprint for the process. Calculated using responses
gathered across many wafers, this model describes a region in n-space
that results in acceptable process performance. If the process response
data match the model reasonably well, the tool continues processing wafers.
If not, a Pareto analysis of the likely sources of the differences is
generated to simplify the identification and correction of the problem.
Implementing the RTPC System
The RTPC system under investigation operates as shown in Figure
1. After the data acquisition system collects process-related data from
the tool and its related sensors, the data are synchronized, subjected
to any required signal processing, and summarized into conventional statistics
(mean, standard deviation, etc.). A Hotelling T² figure of
merit (FOM) is then calculated for carefully chosen groups of statistics.
The data are saved twice during the data acquisition and summarization
process: First, the raw time-series data from the process equipment are
saved for possible use later, and second, the summary statistics are saved
in an SAS dataset for access through the manufacturing information warehouse.
Figure 1: Overall outline of information flow in the RTPC system.
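The summarization step can be illustrated with a minimal Python sketch. It assumes a single trace that has already been synchronized and filtered. The function name `summarize_trace`, the pressure values, and the one-sample-per-second rate are hypothetical and are not taken from the IBM system.

```python
import statistics

def summarize_trace(samples):
    """Collapse one time-series trace into conventional summary statistics."""
    return {
        "mean": statistics.fmean(samples),
        "std": statistics.stdev(samples),          # sample standard deviation
        "range": max(samples) - min(samples),
    }

# Hypothetical chamber-pressure trace, one sample per second (illustrative values).
pressure = [4.50, 4.52, 4.49, 4.51, 4.48, 4.50]
summary = summarize_trace(pressure)
print(summary)
```

In a setup like the one described, one such summary per parameter and wafer would be stored alongside the raw trace.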
While the data acquisition process is straightforward, the analyses usually
are not. There are tens or hundreds of parameters available, and determining
which of these to monitor is a challenge. The two multivariate statistical
techniques currently used to assist in the analysis are principal component
analysis and partial least squares. The former concentrates on determining
which parameters are responsible for variations within the independent
variables, while the latter aids in determining which independent variables
are driving variations in such dependent parameters as in-line electrical
test parameters and functional yield. Once parameters have been selected
for analysis, a process response model (PRM) is generated. Using historical
data, univariate means and standard deviations are calculated to scale
the incoming data to a mean of zero and a standard deviation of one. This
is done in two passes in order to eliminate outliers that might unduly
influence the PRM. The flow of the analysis can be summarized as follows:
1. The time-series data are normalized through various signal-processing
techniques such as logarithmic transformations and high-pass-frequency
filters.
2. The normalized data are summarized, or collapsed in time, by
finding averages and standard deviations.
3. A matrix of means and standard deviations is calculated for
each parameter of the collapsed data.
4. Any points more than three standard deviations from the mean
are eliminated.
5. Another matrix of means and standard deviations is calculated.
6. A covariance matrix is calculated for use during the multivariate
calculations to correct for any cross-correlation in the data.
7. Finally, a figure of merit--the Hotelling T² statistic--is
calculated for each group of parameters defined by the user.
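Steps 3 through 5 of this flow, the two-pass calculation of scaling constants, can be sketched for a single parameter as follows. This is an illustration under stated assumptions rather than the production algorithm: the helper names and the synthetic history are hypothetical, and the three-standard-deviation cutoff is the only detail taken from the text.

```python
import statistics

def two_pass_scaling(values, n_sigma=3.0):
    """Return (mean, std) recomputed after dropping points beyond n_sigma."""
    mean1 = statistics.fmean(values)
    std1 = statistics.stdev(values)
    # Pass 1 limits are used only to decide which points to keep.
    kept = [v for v in values if abs(v - mean1) <= n_sigma * std1]
    # Pass 2 statistics are the ones used for scaling.
    return statistics.fmean(kept), statistics.stdev(kept)

def scale(value, mean, std):
    """Scale a new observation to zero mean and unit standard deviation."""
    return (value - mean) / std

# Hypothetical history: 19 well-behaved runs plus one gross outlier.
history = [100.0 + 0.1 * (i % 5 - 2) for i in range(19)] + [150.0]
mean, std = two_pass_scaling(history)
print(round(mean, 3), round(std, 3), round(scale(100.3, mean, std), 2))
```

Without the second pass, the single outlier would inflate the standard deviation by nearly two orders of magnitude and make the scaled values insensitive to real shifts.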
The Hotelling T² statistic is calculated using the
equation

$$T^2 = (\mathbf{x} - \bar{\mathbf{x}})^{\mathsf{T}} \, \Sigma^{-1} \, (\mathbf{x} - \bar{\mathbf{x}})$$

where each element $x_i$ of $\mathbf{x}$ is a particular summary statistic,
$\bar{x}_i$ is the mean of that summary statistic, and $\Sigma^{-1}$
is the inverse covariance matrix. Most easily understood as the "distance
from center" of the normal operating space of the process, the Hotelling
T² scales and centers the data and accounts for any correlation.
Once the Hotelling T² figure of merit is calculated,
it is normalized by its statistically derived χ²
limit:

$$\mathrm{FOM} = \frac{T^2}{\chi^2_{\mathrm{limit}}}$$

This normalization allows all the data from a single process chamber,
regardless of product or recipe, to be plotted on a single statistical
process control (SPC) chart. It also means that the statistical limit
for every RTPC multivariate chart on the manufacturing line is 1.0, making
it relatively easy to compare performance across process tools or recipes.
At this point in the analysis process, the results are sent to the SPC
system, which decides if the process run was out of control and, if so,
issues a command to stop the tool from further processing.
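The calculation and normalization can be sketched in plain Python as shown here. Several things are assumptions made for illustration: the function names, the two-statistic group, its covariance matrix, and the use of the 95% chi-squared quantile for two degrees of freedom (5.991) as the limit. The article does not state which significance level the fab used.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def hotelling_t2(x, mean, cov):
    """T^2 = (x - mean)^T cov^-1 (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    w = solve(cov, d)                  # w = cov^-1 d, without forming the inverse
    return sum(di * wi for di, wi in zip(d, w))

def figure_of_merit(t2, limit):
    """Normalize T^2 so that the control limit is always 1.0."""
    return t2 / limit

CHI2_95_2DOF = 5.991                   # assumed limit: 95% quantile, 2 degrees of freedom

mean = [0.0, 0.0]                      # scaled statistics have zero mean
cov = [[1.0, 0.5], [0.5, 1.0]]         # hypothetical covariance of the group
t2 = hotelling_t2([1.0, 1.0], mean, cov)
fom = figure_of_merit(t2, CHI2_95_2DOF)
print(round(t2, 4), round(fom, 4))
```

Because the limit depends on the number of statistics in the group, dividing by it is what lets groups of different sizes share a chart with a limit of 1.0.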
Although the Hotelling T² statistic provides a robust
means of detecting a process variation, it provides no insight into which
parameters are responsible for the variation and how much each parameter
contributes to the figure of merit. To solve this problem, a software
algorithm was developed and a Pareto for out-of-control points was included
in the on-line SPC charts. The operator simply presses a function key
to bring up the Pareto. A typical multivariate control chart with an upper
control limit of 1 and an average value around 0.5 is shown in Figure
2. There are two points on this chart above the control limit. To determine
which process parameters are responsible for these out-of-control conditions,
the operator presses the appropriate function key and a Pareto, such as
that in Figure 3, is displayed. The logistics information is shown at
the top and a list of the offending parameters below. In this example,
the range of the ozone mass-flow controller (MFC) and the standard deviation
of the chamber pressure are the most outstanding contributors. Armed with
this information, maintenance technicians can fix the problems expeditiously.
In addition, because the technicians know immediately what to fix, maintenance
driven by out-of-control events can be minimized. At the fab under investigation,
it has been reduced by 10 to 15%.
Figure 2: Typical multivariate control chart.
--APPENDED BY ETBSIDE ON 04-14-99 AT 01:05:05--
LOT XOU16001JX, SLOT 20, WAFER: IJX3920, RCP: A_4.5B 4.2P 10K

CONTRIBUTION (%)    PARAMETER
56.75               RANGE OZONE MFC
31.44               STANDARD DEVIATION CHAMBER PRESSURE
 7.42               RANGE OF TEOS MFC
 4.34               STD. SUSCEPTOR TEMPERATURE
 0.02               RANGE TEB1 MFC
 0.02               RANGE He-HIGH MFC

Figure 3: Example of an out-of-control Pareto displayed by an on-line SPC system.
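The article does not describe the algorithm behind this Pareto, so the following is only one simple way such a ranking could be produced. It assumes the scaled statistics are uncorrelated, in which case T² is the sum of squared scaled deviations and each parameter's share is its squared deviation divided by that sum. The function name and the deviation values are hypothetical.

```python
def pareto_contributions(z_scores):
    """Rank parameters by percentage share of T^2 (uncorrelated case only)."""
    t2 = sum(z * z for z in z_scores.values())
    shares = {name: 100.0 * z * z / t2 for name, z in z_scores.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)

# Hypothetical scaled deviations (in standard deviations) for one wafer.
z = {
    "RANGE OZONE MFC": 6.0,
    "STD CHAMBER PRESSURE": -4.0,
    "RANGE TEOS MFC": 2.0,
}
ranked = pareto_contributions(z)
for name, pct in ranked:
    print(f"{pct:6.2f}  {name}")
```

A production version would also have to apportion the cross-correlation terms that this simplification ignores.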
Before this multivariate analysis method was adopted, the fab tried to
implement SPC charts for individual process parameters. However, that
practice posed several difficulties: the number of SPC charts was unwieldy,
the lack of cross-correlation severely restricted the robustness of the
system, control limits were not kept up-to-date, and making comparisons
across recipes and tools was difficult because of their varying set points
and normal process variation.
The problem of having too many SPC charts can be illustrated by
considering the case of a single process tool. A particular oxide etcher
has 4 chambers, 8 frequently used recipes per chamber, and 15 parameters
per recipe that must be monitored. Performing conventional univariate
SPC analysis would require 480 SPC charts (4 x 8 x 15). Monitoring so
many charts is not practical. Using a multivariate process response model,
the same tool can be monitored with four charts, one per chamber, with
one point on each chart for each wafer/recipe step processed.
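The chart counts in this example follow directly from the tool configuration, as this short calculation shows (the variable names are illustrative):

```python
chambers = 4
recipes_per_chamber = 8
parameters_per_recipe = 15

# One univariate chart per chamber, recipe, and parameter combination.
univariate_charts = chambers * recipes_per_chamber * parameters_per_recipe
# One multivariate chart per chamber, shared by all recipes and parameters.
multivariate_charts = chambers

print(univariate_charts, multivariate_charts)
```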
It is easy to see why using multivariate methods makes monitoring
and control easier, but why is this technique more robust than univariate
analysis? When multiple parameters are monitored through univariate analysis,
it is necessary to correct for two factors--the number of parameters and
cross-correlation--because even with independent variables the "in-control"
region for the parameters taken jointly is not the union of the separate
regions, but rather something smaller. This concept is illustrated in
Figure 4, where the dashed-rule box represents traditional SPC limits.
When the existence of two simultaneous variables is taken into account,
however, the actual in-control region is that within the oval. Although
points inside the rectangle but outside the oval are actually out of control,
a univariate SPC system would not interpret them as such, yielding false
positives. The limits could be tightened and the SPC-limit rectangle could
be placed inside the oval, but then any points outside the rectangle and
inside the oval would be false negatives.
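The difference between the two regions can be made concrete for two standardized parameters. The sketch below assumes a correlation of 0.9, a univariate box of plus or minus three standard deviations, and a joint limit at the matching tail area of 0.0027. All of these values, and the function names, are chosen for illustration. For two variables the chi-squared quantile has the closed form -2 ln(alpha), so no statistics library is needed.

```python
import math

def joint_t2(z1, z2, rho):
    """Hotelling T^2 for two standardized variables with correlation rho."""
    return (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)

def inside_box(z1, z2, limit=3.0):
    """True if both variables are within their separate +/- limit bands."""
    return abs(z1) <= limit and abs(z2) <= limit

ALPHA = 0.0027                          # roughly the two-sided 3-sigma tail area
JOINT_LIMIT = -2.0 * math.log(ALPHA)    # chi-squared quantile, 2 degrees of freedom
RHO = 0.9                               # assumed correlation between the parameters

for point in [(2.0, -2.0), (3.2, 3.0)]:
    t2 = joint_t2(*point, RHO)
    print(point, "in box:", inside_box(*point), "in oval:", t2 <= JOINT_LIMIT)
```

The first point passes both univariate limits yet lies far outside the oval, because the two parameters moved in opposite directions when they normally move together. The second point trips a univariate limit although it is jointly in control.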
Figure 4: Chart showing univariate (rectangular region) and joint (oval region) probability control limits.
A simple formula for estimating the probability of a false call is

$$P(\text{false call}) = 1 - (1 - \alpha)^{n}$$

where n is the number of parameters monitored and α
is the chosen significance level, usually 0.05 or 0.01. The upper line
in Figure 5, representing a significance level of 0.05, indicates that
the probability of a false negative, which would lead to a tool shutdown
when nothing is wrong, approaches 40% when 10 parameters are being monitored
using univariate analysis. The likelihood of a false call rises with an
increase in the number of monitored parameters. The multivariate method
is more robust than the univariate method because it accounts for many
parameters at once and corrects for any cross-correlation, thus helping
to prevent false interpretations and to minimize downtime.
Figure 5: Probability of a false negative as a function of the number of monitored parameters.
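The false-call formula is easy to evaluate directly. The short sketch below tabulates it for the two significance levels mentioned. The function name is illustrative, and the formula itself treats the monitored parameters as independent.

```python
def false_call_probability(alpha, n):
    """Chance that at least one of n independent charts signals by chance alone."""
    return 1.0 - (1.0 - alpha) ** n

for n in (1, 5, 10, 15):
    p05 = false_call_probability(0.05, n)
    p01 = false_call_probability(0.01, n)
    print(f"n={n:2d}  alpha=0.05: {p05:.3f}  alpha=0.01: {p01:.3f}")
```

With a significance level of 0.05 and 10 parameters, the result is about 0.40, the value quoted in the text.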
The RTPC System in Practice
The RTPC system has been in use at IBM-Burlington for approximately
four years and is currently used for monitoring more than 275 process
tools. In this period more than $10 million has been saved on a total
investment of less than $3 million. The following case studies highlight
the successes of this system.
Case 1: RF Generator Connected to a Dummy Load. In this case,
when maintenance was being performed on the radio-frequency (RF) subsystem
of an oxide etch tool, the technician failed to inform the next shift
that the RF generator was plugged into a dummy load for calibration. As
a result, manufacturing operators started processing wafers on the tool.
Because it was a timed process, there were no endpoint signals to indicate
that something was amiss. Moreover, because the RF generator carried a
perfect 50-Ω load, no alarm was
triggered. The RTPC system, however, detected that several parameters,
such as reflected RF power and dc bias, did not match the PRM. Based on
these data, the tool was stopped after one lot, and only that lot had
to be scrapped. If RTPC had not been available, the next inspection would
have occurred only after 20 lots had been processed. Therefore, RTPC was
responsible for saving 19 lots.
Case 2: Loose Throttle-Valve Grounding Wire. In the second case,
one or two wafers per lot at contact etch were not endpointing correctly.
The problem was intermittent and the endpoint traces provided no real
clues as to what was going awry. Therefore, an experiment was proposed
in which the process would be run under differing conditions to determine
the source of the problem. At that time the RTPC system was just being
implemented and no real controls were in place. However, the data were
available and it was decided to analyze them before undertaking the costly
experiment.
The ensuing analysis showed a sinusoidal variation in pressure
at 0.5 Hz, which was significant because this particular process uses
a magnetic field that rotates at 0.5 Hz. When the tool was inspected,
it was discovered that the grounding wire on the throttle valve had an
intermittent open. The wire was acting like an antenna, picking up the
signal from the magnetic field coils, and oscillating. This in turn caused
pressure perturbations that prevented some wafers from endpointing properly.
By using the RTPC system to determine the source of the endpointing problem,
it was possible to forgo the proposed experiment, which translated into
a cost savings in terms of wafers, tool time, and engineering analysis.
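A periodic disturbance of this kind can be found in stored second-by-second data with a discrete Fourier transform. The sketch below is a hedged illustration, not the analysis the fab performed. The trace is synthetic, and the 10-Hz sample rate, the oscillation amplitude, and the function name are all assumptions.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-DC component of a trace."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]          # remove the DC level
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate_hz / n

SAMPLE_RATE_HZ = 10.0                               # assumed acquisition rate
# Synthetic 20-second pressure trace with a small 0.5-Hz oscillation.
pressure = [100.0 + 0.5 * math.sin(2 * math.pi * 0.5 * i / SAMPLE_RATE_HZ)
            for i in range(200)]
print(dominant_frequency(pressure, SAMPLE_RATE_HZ))
```

Matching the recovered frequency against known periodic sources on the tool, such as the rotating magnetic field in this case, is what points to a likely cause.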
Case 3: Monitoring RF Parameters. On some of the etch tools used
in the fab, the gas distribution plate and electrostatic chuck must be
replaced periodically. They can be replaced either at specific time intervals
or, using RTPC data, when they are about to fail. Monitoring the electrostatic
chuck leakage current, backside cooling flow, and clamp pressure provides
an indication of when the chuck must be replaced; monitoring the TCP load
cap position, wafer area pressure, and temperature indicates when the
gas distribution plate must be changed. Performing maintenance as needed
rather than periodically increases tool availability and reduces costs
for spare parts while increasing confidence in the process itself.
Improving Yield Learning
The main premise behind using RTPC is that process variation is
bad. The goal is to provide a consistent, stable platform for all processes.
To that end, every wafer processed at the fab discussed here has a figure
of merit calculated to indicate if it received processing within normal
variations from the PRMs. The multivariate statistical technique known
as projection to latent structures (PLS, also called partial least squares)
is used to correlate process variation
to defects, in-line electrical tests, and yield measurements. PLS relates
many independent variables to many dependent variables. Unlike regression,
which relates many independent variables to one dependent variable,
PLS corrects for correlation among the various parameters and reveals
to the user which ones are important to the model. It also reveals when
incoming data do not fit the model, so that the results may be reviewed
with the appropriate level of confidence. PLS analysis can be used to
refine the process and lead to the generation of new PRMs. This multistep
procedure, which can be repeated for all tools and processes, offers several
opportunities for improving yield learning or time to target.
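As a small illustration of how PLS indicates which parameters matter, the sketch below computes only the first PLS weight vector for a single response. For centered data this vector is proportional to the product of the transposed data matrix and the response. A full PLS model extracts several components and handles many responses at once, so this is a deliberately reduced example. The data and the column meanings are hypothetical.

```python
import math

def first_pls_weights(x_rows, y):
    """First PLS weight vector, X^T y scaled to unit length (centered data)."""
    n_cols = len(x_rows[0])
    xty = [sum(row[j] * yi for row, yi in zip(x_rows, y)) for j in range(n_cols)]
    norm = math.sqrt(sum(v * v for v in xty))
    return [v / norm for v in xty]

# Hypothetical centered summary statistics for four runs.
# Columns: pressure std, reflected RF power mean, temperature range.
x_rows = [
    [-1.0, -1.0,  1.0],
    [ 1.0, -1.0, -1.0],
    [-1.0,  1.0, -1.0],
    [ 1.0,  1.0,  1.0],
]
# Hypothetical centered response, e.g. an in-line electrical test deviation.
y = [-2.5, 2.5, -3.5, 3.5]

w = first_pls_weights(x_rows, y)
print([round(v, 3) for v in w])
```

The large first weight and the zero second weight reflect how this response was constructed: it depends strongly on the first column, weakly on the third, and not at all on the second.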
Process Technical Transfer. In addition to providing manufacturing
with a new process recipe, the technical transfer team can provide a response
model for the process. This PRM permits operators to react immediately
if the process is not performing as it had in development and to make
the necessary adjustments. When the tools used for development and manufacturing
differ from each other, the PRM can at least provide some approximate
targets and simplify the transition.
Tool Matching. One-of-a-kind tools are not prevalent in semiconductor
manufacturing. The different components, maintenance, and loading of individual
tools can cause tool mismatches that contribute to process variation.
Using a consistent PRM across multiple tools can highlight sources of
variation, assist in tool matching, and promote consistent processing.
Process Performance Tracking. Collecting raw process data and
calculating figures of merit provides other important advantages. Such
data provide process engineers with accurate information about which wafers
went through which tools and tool chambers and about the process conditions.
This information, in turn, makes it possible to accurately identify tool
slides immediately after the process is completed. It also aids engineers
in correlating anomalies at test to specific process variations so that
sporadic events can be monitored and investigated, enabling the engineers
to trace the source of large anomalies quickly.
Line/Module/Lot Figures of Merit. Figure 6 shows how figures
of merit can be used to track tool performance, regardless of process
or product. Because the statistical limit for all RTPC charts is 1, the
average of 1.7 for tool 206V indicates that it operated outside statistical
limits at least 50% of the time. The other tools included on this chart
all have averages of less than 0.4, which shows that they were operating
consistently.
Figure 6: Average figures of merit by tool number. Since the statistical control limit is always 1, the percentages of out-of-control operations can be calculated easily.
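A per-tool summary of this kind can be computed from stored per-wafer figures of merit. The sketch below is illustrative only: the tool labels, the values, and the function name are hypothetical. Only the common limit of 1.0 comes from the text.

```python
from collections import defaultdict

def summarize_by_tool(records, limit=1.0):
    """Average figure of merit and fraction of runs above the limit, per tool."""
    grouped = defaultdict(list)
    for tool, fom in records:
        grouped[tool].append(fom)
    return {
        tool: {
            "average": sum(values) / len(values),
            "fraction_out": sum(1 for f in values if f > limit) / len(values),
        }
        for tool, values in grouped.items()
    }

# Hypothetical (tool, figure of merit) pairs, one per wafer.
records = [
    ("A", 1.8), ("A", 2.1), ("A", 0.6), ("A", 2.3),
    ("B", 0.3), ("B", 0.4), ("B", 0.2), ("B", 0.3),
]
result = summarize_by_tool(records)
for tool, stats in result.items():
    print(tool, round(stats["average"], 2), stats["fraction_out"])
```

Counting the points above 1.0 gives the out-of-control percentage directly, whatever product or recipe each wafer ran.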
Another way to track process performance across various process tools
is illustrated in Figure 7, which plots the performance of a polysilicon
etch process on multiple tools. In this example, the lower line represents
parameters that are fixed by the operator, typically recipe settings,
while the upper one represents parameters that vary unexpectedly (pressure
variations, RF tuning, etc.). This type of plot can be very helpful for
tool matching.
Figure 7: Average figures of merit for two distinct multivariate statistics plotted by tool number.
Figures of merit can also be used to track process performance by individual
or group operation codes, as shown in Figure 8. Three of the operation
codes in this figure exhibit very large amounts of variation. When such
results occur, the next step would be to investigate the lots that went
through these levels and look for anomalies. Plots such as this make it
possible to summarize data from any combination of processes and monitor
their performance. For example, an engineer might take all of the operation
codes for gate conductor processing and monitor them as a function of
speed sort for logic products.
Figure 8: Average multivariate figures of merit plotted by operation code.
Finally, Figure 9 shows the monitoring of figures of merit by product
lot, which helps to predict whether the various lots will succeed at final
test. If a lot has a high variance, as do the first seven lots
in this chart, it is less likely to meet customer requirements than one
with a low variance. For low-volume parts, such a chart could be used
to determine whether it is advisable to scrap a lot and start a new one.
Figure 9: Average multivariate figures of merit plotted by lot number.
Conclusion
By reducing process variation, implementation of RTPC at IBM-Burlington
has decreased manufacturing costs through reduced scrap and increased
tool availability. The RTPC system collects data from process tools and
their auxiliary sensors, compares those data to process response models,
and sends the resulting calculated figures of merit back to on-line SPC
systems, which then immediately shut down tools that are out of control.
The technique can also improve yield learning and characterize how individual
process steps, or process perturbations, affect final electrical test
yields and device performance. This information, in turn, can be used
to optimize processes and process response models to minimize yield impacts
and maximize device performance.
Acknowledgments
The author would like to thank John Colt, Valerie Congdon, Nancy
Pascoe, Gregg Reynolds, Dan Snider, and David Wortheim for their contributions
to this article.
Raymond J. Bunkofske, PhD, is an advisory engineer/scientist
at IBM Microelectronics' plant in Essex Junction, VT, where he is responsible
for the development and installation of a fabwide real-time process control
system and for the development of in situ monitoring techniques and standards
for process monitoring infrastructure. He also has eight years of experience
in the design and certification of Class 1 cleanrooms and holds a patent
for an advanced photoresist coater bowl design. Bunkofske received his
BA in physics from Carleton College in Northfield, MN, his MS in mechanical
engineering from the University of Illinois at Urbana-Champaign, and his
PhD in mechanical engineering from the University of Minnesota in Minneapolis.
(Bunkofske can be reached at 802/769-2808 or rbunkofs@us.ibm.com.)

Questions/comments about MICRO Magazine? E-mail us at cheynman@gmail.com.
© 2007 Tom Cheyney
All rights reserved.