Effect of Irrelevant Variables on Faulty Wafer Detection in Semiconductor Manufacturing

Machine learning has been applied successfully to faulty wafer detection tasks in semiconductor manufacturing. For these tasks, prediction models are built with prior data to predict the quality of future wafers as a function of their preceding process parameters and measurements. In real-world problems, it is common for the data to have a portion of input variables that are irrelevant to the prediction of an output variable. The inclusion of many irrelevant variables negatively affects the performance of prediction models. Typically, prediction models learned by different learning algorithms exhibit different sensitivities to irrelevant variables. Algorithms with low sensitivities are preferred as a first trial for building prediction models, whereas a variable selection procedure must be considered for highly sensitive algorithms. In this study, we investigate the effect of irrelevant variables on three well-known representative learning algorithms that can be applied to both classification and regression tasks: artificial neural network, decision tree (DT), and k-nearest neighbors (k-NN). We analyze the characteristics of these learning algorithms in the presence of irrelevant variables with different model complexity settings. An empirical analysis is performed using real-world datasets collected from a semiconductor manufacturer to examine how the number of irrelevant variables affects the behavior of prediction models trained with different learning algorithms and model complexity settings. The results indicate that the prediction accuracy of k-NN is highly degraded, whereas DT demonstrates the highest robustness in the presence of many irrelevant variables. In addition, a higher model complexity of learning algorithms leads to a higher sensitivity to irrelevant variables.
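
The sensitivity of k-NN to irrelevant variables can be illustrated with a small synthetic experiment. The sketch below is not the study's setup: the data generator, the sample sizes, k = 5, and the function names are all assumptions made for illustration. One input variable determines the label and the remaining ones are pure noise, so any loss of accuracy comes from the noise variables dominating the distance computation.

```python
import random

def make_data(n, n_irrelevant, rng):
    """Build n samples: feature 0 decides the label, the rest is noise."""
    rows = []
    for _ in range(n):
        x0 = rng.uniform(-1, 1)
        noise = [rng.uniform(-1, 1) for _ in range(n_irrelevant)]
        rows.append(([x0] + noise, int(x0 > 0)))
    return rows

def knn_predict(train, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)

def knn_accuracy(n_irrelevant, seed=0):
    """Train/test accuracy of k-NN for a given number of noise variables."""
    rng = random.Random(seed)
    train = make_data(200, n_irrelevant, rng)
    test = make_data(100, n_irrelevant, rng)
    hits = sum(knn_predict(train, x) == y for x, y in test)
    return hits / len(test)

acc_clean = knn_accuracy(n_irrelevant=0)   # only the relevant variable
acc_noisy = knn_accuracy(n_irrelevant=30)  # relevant variable plus 30 noise variables
print(acc_clean, acc_noisy)
```

A decision tree would split on the single informative variable and largely ignore the rest, which is consistent with the robustness the abstract reports for DT.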


Predicting hospitalization following psychiatric crisis care using machine learning

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01361-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Prediction Models ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Ensemble Model ◽

K Nearest Neighbors ◽

Crisis Care

Background: Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods: Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis and the five best performing algorithms were combined in an ensemble model using stacking. Results: All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed about average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.
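
The AUC values compared in this abstract can be read as the probability that a randomly chosen hospitalized patient receives a higher predicted score than a randomly chosen non-hospitalized one. The following minimal sketch computes AUC that way by pairwise comparison; the function name and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy example: 1 = hospitalized, 0 = not hospitalized (made-up values).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
result = auc(labels, scores)
print(result)  # 3 of the 4 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; library implementations typically use a rank-based computation instead.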


Inference from Non-Probability Surveys with Statistical Matching and Propensity Score Adjustment Using Modern Prediction Techniques

Mathematics ◽

10.3390/math8060879 ◽

2020 ◽

Vol 8 (6) ◽

pp. 879 ◽

Cited By ~ 1

Author(s):

Luis Castro-Martín ◽

Maria del Mar Rueda ◽

Ramón Ferri-García

Keyword(s):

Propensity Score ◽

Linear Models ◽

Prediction Models ◽

Selection Procedure ◽

Bias Reduction ◽

Online Surveys ◽

K Nearest Neighbors ◽

Weighted Estimates ◽

Statistical Matching ◽

Propensity Score Adjustment

Online surveys are increasingly common in social and health studies, as they provide fast and inexpensive results in comparison to traditional ones. However, these surveys often work with biased samples, as the data collection is often non-probabilistic because of the lack of internet coverage in certain population groups and the self-selection procedure that many online surveys rely on. Some procedures have been proposed to mitigate the bias, such as propensity score adjustment (PSA) and statistical matching. In PSA, propensity to participate in a nonprobability survey is estimated using a probability reference survey, and then used to obtain weighted estimates. In statistical matching, the nonprobability sample is used to train models to predict the values of the target variable, and the predictions of the models for the probability sample can be used to estimate population values. In this study, both methods are compared using three datasets to simulate pseudopopulations from which nonprobability and probability samples are drawn and used to estimate population parameters. In addition, the study compares the use of linear models and Machine Learning prediction algorithms in propensity estimation in PSA and predictive modeling in Statistical Matching. The results show that statistical matching outperforms PSA in terms of bias reduction and Root Mean Square Error (RMSE), and that simpler prediction models, such as linear and k-Nearest Neighbors, provide better outcomes than bagging algorithms.
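
The PSA idea described above can be sketched with a deliberately simplified propensity model. In the example below, propensity is estimated within groups of a single categorical covariate rather than with the linear or machine learning models the study compares, and the odds-type weight (1 - p) / p is one of several weighting choices found in the PSA literature. The function names and the toy samples are assumptions, and the reference sample is assumed to be an equal-probability sample.

```python
from collections import Counter

def psa_weights(nonprob_groups, reference_groups):
    """Weight each nonprobability unit by (1 - p) / p.

    p is the estimated propensity to be in the nonprobability sample,
    computed per group from the pooled nonprobability + reference data.
    """
    n_np = Counter(nonprob_groups)
    n_ref = Counter(reference_groups)
    weights = []
    for g in nonprob_groups:
        p = n_np[g] / (n_np[g] + n_ref[g])
        weights.append((1 - p) / p)
    return weights

def weighted_mean(values, weights):
    """Weighted estimate of the population mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Toy data: the online sample over-represents young respondents.
nonprob_groups = ["young"] * 80 + ["old"] * 20
nonprob_y = [1] * 80 + [0] * 20            # target variable in the online sample
reference_groups = ["young"] * 50 + ["old"] * 50

weights = psa_weights(nonprob_groups, reference_groups)
naive = sum(nonprob_y) / len(nonprob_y)    # unadjusted estimate
adjusted = weighted_mean(nonprob_y, weights)
print(naive, adjusted)
```

In this toy case the weights pull the over-represented group back to its share in the reference sample, so the adjusted estimate moves from the biased sample mean toward the population value.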


Identification of Micro-blog Opinion Leaders based on User Features and Outbreak Nodes

International Journal of Emerging Technologies in Learning (iJET) ◽

10.3991/ijet.v12i01.6139 ◽

2017 ◽

Vol 12 (01) ◽

pp. 141

Author(s):

Lin Cui ◽

Dechang Pi

Keyword(s):

Real World ◽

Latent Variable ◽

Probability Model ◽

Opinion Leader ◽

Experimental Results ◽

Opinion Leaders ◽

Proposed Model ◽

Input Variables ◽

Real World Datasets ◽

The Ideal

At present, recognition of micro-blog opinion leaders mainly depends on the number of users posting micro-blogs, registration time, the number of good friends and other static attributes. However, it is very difficult to obtain ideal recognition results with these methods. This paper puts forward a new method that identifies opinion leaders according to the change of user features and outbreak nodes. Based on a deep analysis of the various attributes and behaviors of users, the user’s attribute features are regarded as the input variables, and the behavior features of the user and the outbreak nodes are regarded as observed variables. The probability of being an opinion leader is the latent variable between the input variables and the observed variables, and the constructed probability model is used to recognize micro-blog opinion leaders. Experiments are carried out on two real-world datasets from Sina micro-blog and Twitter, and the comparative experimental results show that the proposed model can find micro-blog opinion leaders more precisely.


Efficient and Scalable Multi-Task Regression on Massive Number of Tasks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013763 ◽

2019 ◽

Vol 33 ◽

pp. 3763-3770

Author(s):

Xiao He ◽

Francesco Alesiani ◽

Ammar Shaker

Keyword(s):

Real World ◽

Large Scale ◽

Nearest Neighbor ◽

Prediction Models ◽

Optimization Method ◽

K Nearest Neighbor ◽

Task Learning ◽

Massive Number ◽

Real World Datasets ◽

Convex Clustering

Many real-world large-scale regression problems can be formulated as Multi-task Learning (MTL) problems with a massive number of tasks, as in retail and transportation domains. However, existing MTL methods still fail to offer both the generalization performance and the scalability for such problems. Scaling up MTL methods to problems with a tremendous number of tasks is a big challenge. Here, we propose a novel algorithm, named Convex Clustering Multi-Task regression Learning (CCMTL), which integrates with convex clustering on the k-nearest neighbor graph of the prediction models. Further, CCMTL efficiently solves the underlying convex problem with a newly proposed optimization method. CCMTL is accurate, efficient to train, and empirically scales linearly in the number of tasks. On both synthetic and real-world datasets, the proposed CCMTL outperforms seven state-of-the-art (SoA) multi-task learning methods in terms of prediction accuracy as well as computational efficiency. On a real-world retail dataset with 23,812 tasks, CCMTL requires only around 30 seconds to train on a single thread, while the SoA methods need up to hours or even days.


A Novel Density Peaks Clustering Algorithm Based on K Nearest Neighbors With Adaptive Merging Strategy

10.21203/rs.3.rs-95747/v1 ◽

2020 ◽

Author(s):

Xiaoning Yuan ◽

Hang Yu ◽

Jun Liang ◽

Bing Xu

Keyword(s):

Real World ◽

Clustering Algorithm ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Density Peaks ◽

Density Peaks Clustering ◽

Cutoff Distance ◽

Real World Datasets ◽

Merging Strategy ◽

Selection Of

Recently, the density peaks clustering algorithm (DPC) has attracted a lot of attention. DPC is able to quickly find cluster centers and complete clustering tasks, and it is suitable for many clustering tasks. However, the cutoff distance d_c depends on human experience, which greatly affects the clustering results. In addition, the selection of cluster centers requires manual participation, which affects the clustering efficiency. To solve these problems, we propose a density peaks clustering algorithm based on K nearest neighbors with an adaptive merging strategy (KNN-ADPC). We propose a cluster merging strategy to automatically aggregate over-segmented clusters. Additionally, K nearest neighbors is adopted to assign points more reasonably. KNN-ADPC has only one parameter, and the clustering task can be conducted automatically without human involvement. The experimental results on artificial and real-world datasets show that KNN-ADPC achieves higher accuracy than DBSCAN, K-means++, DPC and DPC-KNN.
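
For context, the two quantities that baseline DPC computes for each point can be sketched as follows: a local density rho (here, the number of neighbors closer than the cutoff distance d_c) and a distance delta to the nearest point of higher density. This is a sketch of plain DPC with a cutoff kernel, not of KNN-ADPC; the function name, the toy points, and the choice d_c = 0.12 are assumptions for illustration.

```python
import math

def density_peaks(points, d_c):
    """Return (rho, delta) for every point, as in baseline DPC."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points within the cutoff distance d_c
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        # distances to all points with strictly higher density
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # the densest point conventionally gets its largest distance
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Two small, well-separated groups (made-up coordinates).
points = [(0, 0), (0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1),
          (5, 5), (5.1, 5), (4.9, 5), (5, 5.1)]
rho, delta = density_peaks(points, d_c=0.12)

# Candidate centers: points where both rho and delta are large.
gamma = [r * d for r, d in zip(rho, delta)]
ranked = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)
centers = sorted(ranked[:2])
print(rho, centers)
```

Changing d_c changes rho and therefore the ranking, and the number of centers to keep (two here) still has to be chosen by hand. These are the two manual steps the abstract identifies as weaknesses of DPC.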


An Attention-Based Multilayer GRU Model for Multistep-Ahead Short-Term Load Forecasting

Sensors ◽

10.3390/s21051639 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1639

Author(s):

Seungmin Jung ◽

Jihoon Moon ◽

Sungwoo Park ◽

Eenjun Hwang

Keyword(s):

Power Consumption ◽

Prediction Models ◽

Short Term Memory ◽

Load Forecasting ◽

Input Sequence ◽

Short Term ◽

Performance Improvements ◽

Short Term Load Forecasting ◽

Significant Performance ◽

Input Variables

Recently, multistep-ahead prediction has attracted much attention in electric load forecasting because it can deal with sudden changes in power consumption, caused by various events such as fires and heat waves, over the day ahead. Meanwhile, recurrent neural networks (RNNs), including long short-term memory and gated recurrent unit (GRU) networks, can reflect previous points well when predicting the current point. Due to this property, they have been widely used for multistep-ahead prediction. The GRU model is simple and easy to implement; however, its prediction performance is limited because it considers all input variables equally. In this paper, we propose a short-term load forecasting model using an attention-based GRU to focus more on the crucial variables and demonstrate that this can achieve significant performance improvements, especially when the input sequence of the RNN is long. Through extensive experiments, we show that the proposed model outperforms other recent multistep-ahead prediction models in building-level power consumption forecasting.


Time-Efficient Ensemble Learning with Sample Exchange for Edge Computing

ACM Transactions on Internet Technology ◽

10.1145/3409265 ◽

2021 ◽

Vol 21 (3) ◽

pp. 1-17

Author(s):

Wu Chen ◽

Yong Yu ◽

Keke Gai ◽

Jiamou Liu ◽

Kim-Kwang Raymond Choo

Keyword(s):

Ensemble Learning ◽

Real World ◽

Interaction Mechanism ◽

Training Model ◽

Edge Computing ◽

Learning Techniques ◽

Multi Agent ◽

Real World Datasets ◽

Entire Dataset ◽

Exchange Data

In existing ensemble learning algorithms (e.g., random forest), each base learner’s model needs the entire dataset for sampling and training. However, this may not be practical in many real-world applications, and it incurs additional computational costs. To achieve better efficiency, we propose a decentralized framework: Multi-Agent Ensemble. The framework leverages edge computing to facilitate ensemble learning techniques by focusing on the balancing of access restrictions (small sub-dataset) and accuracy enhancement. Specifically, network edge nodes (learners) are utilized to model classifications and predictions in our framework. Data is then distributed to multiple base learners who exchange data via an interaction mechanism to achieve improved prediction. The proposed approach relies on a training model rather than conventional centralized learning. Findings from the experimental evaluations using 20 real-world datasets suggest that Multi-Agent Ensemble outperforms other ensemble approaches in terms of accuracy even though the base learners require fewer samples (i.e., significant reduction in computation costs).


OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data ◽

10.3390/data6010001 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Ahmed Elmogy ◽

Hamada Rizk ◽

Amany M. Sarhan

Keyword(s):

Data Mining ◽

Image Processing ◽

Intrusion Detection ◽

Real Time ◽

Outlier Detection ◽

Real World ◽

Medical Data ◽

Experimental Results ◽

Real Time Applications ◽

Real World Datasets

In data mining, outlier detection is a major challenge, as it plays an important role in many applications such as medical data, image processing, fraud detection, intrusion detection, and so forth. An extensive variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers effectively and in a timely manner upon request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records, and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.


Machine Learning Model of Dimensionless Numbers to Predict Flow Patterns and Droplet Characteristics for Two-Phase Digital Flows

Applied Sciences ◽

10.3390/app11094251 ◽

2021 ◽

Vol 11 (9) ◽

pp. 4251

Author(s):

Jinsong Zhang ◽

Shuai Zhang ◽

Jianhua Zhang ◽

Zhiliang Wang

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Digital Microfluidics ◽

Flow Patterns ◽

Machine Learning Algorithms ◽

Dimensionless Numbers ◽

Two Phase ◽

The Difference ◽

Input Variables ◽

Digital Microfluidic

In digital microfluidic experiments, droplet characteristics and flow patterns are generally identified and predicted by empirical methods, which struggle to process large amounts of data. In addition, because human intervention is unavoidable, inconsistent judgment standards make comparison between different experiments cumbersome and almost impossible. In this paper, we use machine learning to build algorithms that automatically identify, judge, and predict flow patterns and droplet characteristics, so that empirical judgment is turned into an intelligent process. Unlike the usual machine learning algorithms, a generalized variable system was introduced to describe the different geometry configurations of the digital microfluidics. Specifically, Buckingham’s theorem was adopted to obtain multiple groups of dimensionless numbers as the input variables of the machine learning algorithms. In verification, the SVM and BPNN algorithms successfully classified and predicted the different flow patterns and droplet characteristics (length and frequency). Compared with the primitive parameter system, the dimensionless number system was superior in predictive capability. The traditional dimensionless numbers selected for the machine learning algorithms should carry strong physical meaning rather than merely mathematical meaning. The machine learning algorithms using the dimensionless numbers reduced the dimensionality of the system and the amount of computation without losing the information in the primitive parameters.
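
Buckingham's theorem rests on the fact that a product of physical quantities is dimensionless when the exponents of mass, length, and time cancel. The sketch below checks this for three groups that are common in two-phase flow (Reynolds, capillary, and Weber numbers). The abstract does not say which groups the authors used, so these three, the variable names, and the helper functions are assumptions for illustration.

```python
# Dimension exponents of each quantity as (mass, length, time).
DIMENSIONS = {
    "density": (1, -3, 0),          # kg / m^3
    "velocity": (0, 1, -1),         # m / s
    "length": (0, 1, 0),            # m
    "viscosity": (1, -1, -1),       # Pa * s = kg / (m * s)
    "surface_tension": (1, 0, -2),  # N / m = kg / s^2
}

def group_dimension(exponents):
    """Net (mass, length, time) exponents of a product of powers."""
    total = [0, 0, 0]
    for name, power in exponents.items():
        for k, d in enumerate(DIMENSIONS[name]):
            total[k] += power * d
    return tuple(total)

def is_dimensionless(exponents):
    """True when all base-dimension exponents cancel."""
    return group_dimension(exponents) == (0, 0, 0)

# Each group maps a quantity to the power it is raised to.
reynolds = {"density": 1, "velocity": 1, "length": 1, "viscosity": -1}
capillary = {"viscosity": 1, "velocity": 1, "surface_tension": -1}
weber = {"density": 1, "velocity": 2, "length": 1, "surface_tension": -1}
momentum_flux = {"density": 1, "velocity": 1}  # not dimensionless

for name, group in [("Re", reynolds), ("Ca", capillary), ("We", weber)]:
    print(name, is_dimensionless(group))
```

Five quantities over three base dimensions leave two independent groups, which is why the third group here can be written in terms of the other two (We = Re × Ca).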


Modeling the Effect of Subcutaneous Lixisenatide on Glucoregulatory Endocrine Secretions and Gastric Emptying in Type 2 Diabetes to Simulate the Effect of iGlarLixi Administration Timing on Blood Sugar Profiles

Journal of Diabetes Science and Technology ◽

10.1177/19322968211015671 ◽

2021 ◽

pp. 193229682110156

Author(s):

Thibault Gautier ◽

Rupesh Silwal ◽

Aramesh Saremi ◽

Anders Boss ◽

Marc D. Breton

Keyword(s):

Type 2 Diabetes ◽

Blood Glucose ◽

Blood Sugar ◽

Blood Glucose Concentration ◽

Fixed Ratio ◽

Selection Procedure ◽

Model Complexity ◽

Evening Meal ◽

Administration Timing

Background: As type 2 diabetes (T2D) progresses, intensification to combination therapies, such as iGlarLixi (a fixed-ratio GLP-1 RA and basal insulin combination), may be required. Here a simulation study was used to assess the effect of iGlarLixi administration timing (am vs pm) on blood sugar profiles. Methods: Models of lixisenatide were built with a selection procedure, optimizing measurement fits and model complexity, and were included in a pre-existing T2D simulation platform containing glargine models. With the resulting tool, a simulated trial was conducted with 100 in-silico participants with T2D. Individuals were given iGlarLixi either before breakfast or before an evening meal for 2 weeks and daily glycemic profiles were analyzed. In the model, breakfast was considered the largest meal of the day. Results: A similar percentage of time within 24 hours was spent with blood sugar levels between 70 and 180 mg/dL when iGlarLixi was administered pre-breakfast or pre-evening meal (73% vs 71%, respectively). Overall percent of time with blood glucose levels above 180 mg/dL within a 24-hour period was similar when iGlarLixi was administered pre-breakfast or pre-evening meal (26% vs 28%, respectively). Rates of hypoglycemia were low in both regimens, with a blood glucose concentration below 70 mg/dL only observed for 1% of the 24-hour time period for either timing of administration. Conclusions: Good efficacy was observed when iGlarLixi was administered pre-breakfast; however, administration of iGlarLixi pre-evening meal was also deemed to be effective, even though in the model the size of the evening meal was smaller than that of the breakfast.
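
The percentages reported in the Results (time between 70 and 180 mg/dL, above 180 mg/dL, and below 70 mg/dL) can be computed from a glucose trace as in the sketch below. It assumes equally spaced readings and treats the 70 and 180 mg/dL bounds as inclusive; the function name and the example readings are illustrative, not values from the simulation.

```python
def glycemic_shares(readings_mg_dl, low=70, high=180):
    """Percent of readings below, within, and above the target range.

    Assumes equally spaced readings, so shares of readings equal
    shares of time. The bounds low and high count as in range.
    """
    n = len(readings_mg_dl)
    if n == 0:
        raise ValueError("at least one reading is required")
    below = sum(1 for g in readings_mg_dl if g < low)
    above = sum(1 for g in readings_mg_dl if g > high)
    in_range = n - below - above
    return {
        "below": 100 * below / n,
        "in_range": 100 * in_range / n,
        "above": 100 * above / n,
    }

# Made-up trace for illustration only.
shares = glycemic_shares([60, 70, 100, 180, 200])
print(shares)
```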
