Effective data sampling techniques for machine learning OPC in full chip production

Author(s):  
Hesham Abdelghany ◽  
Kevin Hooker
2019 ◽  
Vol 8 (4) ◽  
pp. 627
Author(s):  
Waleed Albattah ◽  
Saleh Albahli

Today, data are generated in large volumes, on the order of terabytes or even petabytes, and processing data at this scale efficiently and effectively is extremely challenging. However, existing research studies apply machine learning algorithms by loading the entire training dataset into the computer’s main memory (RAM). This becomes a problem as the data grow over time and can no longer be held within a single machine’s memory or handled by most conventional models and hardware. Inspired by current research studies, this paper discusses the benefits of two sampling techniques that can be used with machine learning models: (1) sampling with replacement and (2) reservoir sampling. In this study, 40 experiments were performed in which random sampling reduced the number of data instances by 50% of the original data, a video dataset more than 40 GB in size. The results indicate that SVM and random forest are very competitive classifiers in terms of accuracy; the importance scores are given for all ten repeated rounds of the process for each of the four combinations of sampling technique and machine learning classifier.
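The two sampling techniques named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed seed, and the use of a simple integer range in place of the video dataset are all assumptions made for illustration. Reservoir sampling is shown in its classic single-pass form (often called Algorithm R), which keeps only k items in memory regardless of the stream length.

```python
import random
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Draw k items uniformly, without replacement, from a stream of unknown length.

    Only the k-item reservoir is held in memory, so the full dataset
    never has to be loaded at once.
    """
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def sample_with_replacement(data: Sequence[T], k: int, rng: random.Random) -> List[T]:
    """Draw k items uniformly with replacement; the same item may appear more than once."""
    return [rng.choice(data) for _ in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)        # hypothetical seed, for repeatability only
    instances = range(1000)       # stand-in for the real data instances
    half = reservoir_sample(instances, 500, rng)
    boot = sample_with_replacement(list(instances), 500, rng)
    print(len(half), len(set(half)), len(boot))
```

Note the trade-off: sampling with replacement needs random access to the data (or at least its length), whereas the reservoir version reads the data once in order, which is why it suits datasets that do not fit in RAM.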


2009 ◽  
Vol 19 (02) ◽  
pp. 67-89 ◽  
Author(s):  
M. A. H. AKHAND ◽  
MD. MONIRUL ISLAM ◽  
KAZUYUKI MURASE

Ensembles with several classifiers (such as neural networks or decision trees) are widely used to improve generalization performance over a single classifier. Proper diversity among component classifiers is considered an important parameter for ensemble construction, so that the failure of one may be compensated by others. Among various approaches, data sampling, i.e., using different data sets for different classifiers, has been found more effective than other approaches. A number of ensemble methods have been proposed under the umbrella of data sampling, some of which are constrained to neural networks or decision trees while others are applicable to both types of classifiers. We studied prominent data sampling techniques for neural network ensembles, and then experimentally evaluated their effectiveness on a common test ground. Based on overlap and uncover, the relation between generalization and diversity is presented. Eight ensemble methods were tested on 30 benchmark classification problems. We found that bagging and boosting, the pioneer ensemble methods, are still better than most of the other proposed methods. However, negative correlation learning, which implicitly encourages different networks toward different training spaces, is shown to be better than or at least comparable to bagging and boosting, which explicitly create different training spaces.


2021 ◽  
Vol 18 (2) ◽  
pp. 61-71
Author(s):  
Soemaryatmi Soemaryatmi ◽  
Mukhlas Alkaf Mukhlas Alkaf ◽  
Suharji Suharji ◽  
Supriyanto Supriyanto

Angguk Warga Setuju is a dance with an Islamic theme performed for the bersih desa (village cleansing) ritual in Bandungrejo Village. This research aims to describe the performance of the Angguk dance as used in the local village ritual. The research uses a qualitative method; the data collected cover activities such as customs and the elements supporting the performance. Data were collected through observation, interviews, and documentation, and were analyzed in terms of form, function, and meaning. The results are as follows. First, the people of Bandungrejo Village are in general a traditional society still influenced by the traditional values of their ancestors. Most of them are Muslim, but remnants of animist, dynamist, and totemist beliefs, blended with Hindu and Buddhist beliefs, are still felt; this is reflected in the offerings (sesaji) and prayers presented. The activities in the ceremony are customary practices based on the teachings of the ancestors and are carried out to attain safety. Second, the performers and the audience of the Angguk dance are an integral part of the ritual performance, whose aim is not aesthetic value but religious value directed to the Creator, so that the dance brings peace, fertile farmland, and happiness. The Angguk dance is always associated with sympathetic magic, which attracts the audience’s interest. The dance movements are energetic and are accompanied by vocals containing prayers and by terbangan music. Keywords: angguk, ritual, dance, bersih desa


2022 ◽  
Vol 2161 (1) ◽  
pp. 012072
Author(s):  
Konduri Praveen Mahesh ◽  
Shaik Ashar Afrouz ◽  
Anu Shaju Areeckal

Every year, a growing amount of money is lost to fraudulent credit card transactions. Recently, attention has focused on using machine learning algorithms to identify fraudulent transactions. The ratio of fraud cases to non-fraud transactions is very low. This creates skewed, or unbalanced, data, which poses a challenge for training machine learning models. Public datasets for this research problem are scarce. The dataset used for this work was obtained from Kaggle. In this paper, we explore different sampling techniques, such as under-sampling, Synthetic Minority Oversampling Technique (SMOTE) and SMOTE-Tomek, to work on the unbalanced data. Classification models, such as k-Nearest Neighbour (KNN), logistic regression, random forest and Support Vector Machine (SVM), are trained on the sampled data to detect fraudulent credit card transactions. The performance of the various machine learning approaches is evaluated in terms of precision, recall and F1-score. The classification results obtained are promising and can be used for credit card fraud detection.
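Of the sampling techniques listed in this abstract, random under-sampling is the simplest to sketch. The example below is an illustration under stated assumptions, not the paper's pipeline: the function name `random_undersample`, the toy labels, and the choice to balance every class down to the size of the smallest one are all hypothetical. SMOTE and SMOTE-Tomek generate or clean synthetic points and are not shown here.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def random_undersample(
    X: Sequence, y: Sequence[Hashable], rng: random.Random
) -> Tuple[List, List]:
    """Balance classes by randomly discarding rows from the larger classes.

    Every class is reduced to the size of the smallest class; rows of the
    smallest class are all kept.
    """
    # Group row indices by class label.
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    n_min = min(len(indices) for indices in by_class.values())

    kept: List[int] = []
    for indices in by_class.values():
        # rng.sample draws without replacement.
        kept.extend(rng.sample(indices, n_min))

    kept.sort()  # preserve the original row order
    return [X[i] for i in kept], [y[i] for i in kept]


if __name__ == "__main__":
    rng = random.Random(0)             # hypothetical seed
    X = list(range(100))               # stand-in feature rows
    y = [1] * 5 + [0] * 95             # 5 "fraud" rows, 95 "non-fraud" rows
    X_bal, y_bal = random_undersample(X, y, rng)
    print(len(X_bal), y_bal.count(1), y_bal.count(0))
```

Under-sampling discards most of the majority class, which is cheap but loses information; that loss is the usual motivation for oversampling approaches such as SMOTE.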


2021 ◽  
Author(s):  
Kiran Aftab ◽  
Hafiza Sundus Fatima ◽  
Namrah Aziz ◽  
Erum Baig ◽  
Muhammad Khurram ◽  
...  

PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0259972
Author(s):  
Christopher Barrie ◽  
Arun Frey

Who goes to protests? To answer this question, existing research has relied either on retrospective surveys of populations or in-protest surveys of participants. Both techniques are prohibitively costly and face logistical and methodological constraints. In this article, we investigate the possibility of surveying protests using Twitter. We propose two techniques for sampling protestors on the ground from digital traces and estimate the demographic and ideological composition of ten protestor crowds using multidimensional scaling and machine-learning techniques. We test the accuracy of our estimates by comparing to two in-protest surveys from the 2017 Women’s March in Washington, D.C. Results show that our Twitter sampling techniques are superior to hashtag sampling alone. They also approximate the ideology and gender distributions derived from on-the-ground surveys, albeit with some bias, but fail to retrieve accurate age group estimates. We conclude that online samples are yet unable to provide reliable representative samples of offline protest.

