An allotment of H1B work visa in USA using machine learning

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.27.12642 ◽

2018 ◽

Vol 7 (2.27) ◽

pp. 93

Author(s):

Pooja Thakur ◽

Mandeep Singh ◽

Harpreet Singh ◽

Prashant Singh Rana

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Foreign Workers ◽

Classification Models ◽

Single Model ◽

Proposed Model ◽

The Status ◽

Accuracy Parameter ◽

Lottery System ◽

Fold Cross Validation

H1B work visas are used to hire highly skilled foreign workers at low wages in America, which helps firms but affects the U.S. economy unfavorably. More than 100,000 people apply for the visa each year, for higher studies as well as for work, and the number increases every year. Foreign applicants are selected by a lottery system, which does not follow any foolproof method, and the results create a gap between US-based and foreign workers. We examine petitions filed from 2015 to 2017 in order to develop a better prediction model using machine learning, one that forecasts the outcome of a petition in advance and thereby indicates whether the petition is worthy or not. In this work, we use seven classification models, Decision tree, C5.0, Random Forest, Naïve Bayes, Neural Network and SVM, which predict the status of a petition as certified, denied, withdrawn, or certified-withdrawn. The predictions of these models are evaluated on accuracy. C5.0 performs best as a single model, with an accuracy of 94.62, but the proposed model, built with an ensemble method, gives a better accuracy of 95.4, validated by 10-fold cross-validation.
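
The 10-fold cross-validation mentioned in this abstract can be illustrated with a minimal, library-free sketch. The helper names `k_fold_indices` and `cross_validated_accuracy`, and the `fit`/`predict` callables, are hypothetical and chosen only for illustration; the abstract does not describe the authors' actual code, classifiers' settings, or ensemble procedure.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validated_accuracy(features, labels, fit, predict, k=10):
    """Average per-fold accuracy of a model given as fit/predict callables.

    `fit(X, y)` returns a model; `predict(model, x)` returns one label.
    Both are placeholders for whatever classifier is being evaluated.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i]
                      for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

In practice the samples would normally be shuffled (or stratified by class) before splitting; the contiguous folds here keep the sketch short.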

Analysis and Prediction of Instagram Users Popularity using Regression Techniques based on Metadata, Media and Hashtags Analysis

10.31219/osf.io/uezyk ◽

2020 ◽

Author(s):

Kristo Radion Purba ◽

David Asirvatham ◽

Raja Kumar Murugesan

Keyword(s):

Machine Learning ◽

Social Media ◽

Statistical Analysis ◽

Random Forest ◽

Regression Models ◽

Cross Validation ◽

The Past ◽

Proposed Model ◽

Regression Techniques ◽

Fold Cross Validation

In recent years, social media has been growing at an unprecedented rate, and more people have become influencers. Understanding popularity helps ordinary users to boost their popularity and business users to choose better influencers. Previous studies have predicted the popularity of images posted on social media, but none has addressed a user's popularity as a whole. Furthermore, existing studies have not taken hashtag analysis into consideration, although hashtags are one of the most useful social media features. This research aims to create a model to predict a user's popularity, which is defined by a combination of engagement rate and follower growth. Six machine learning regression models were tested. The proposed model successfully predicted the users’ popularity, with R² up to 0.852, using Random Forest with 10-fold cross-validation. The additional statistical analysis and feature analysis revealed factors that can boost popularity, such as actively posting and following users, completing the user's metadata, and using 11 hashtags. In contrast, it was also found that having a large number of posts and followings in the past does not help in growing popularity, nor does the use of popular hashtags.

Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Current Pharmaceutical Design ◽

10.2174/1381612825666191107092214 ◽

2020 ◽

Vol 25 (40) ◽

pp. 4296-4302 ◽

Cited By ~ 2

Author(s):

Yuan Zhang ◽

Zhenyan Han ◽

Qian Gao ◽

Xiaoyi Bai ◽

Chi Zhang ◽

...

Keyword(s):

Machine Learning ◽

Inclusion Bodies ◽

Cross Validation ◽

Independent Set ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Learning Approaches ◽

Validation Test ◽

Excess Number ◽

Fold Cross Validation

Background: β thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease is caused by the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain and results in a relative excess of α-chains. The formation of inclusion bodies deposited on the cell membrane decreases the ability of red blood cells to deform and results in a group of hereditary haemolytic diseases caused by massive destruction in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicated that Adaboost could be applied to build a learning model for the prediction of inhibitors against K562 cells.

Traffic Flow Anomaly Detection Based on Robust Ridge Regression with Particle Swarm Optimization Algorithm

Mathematical Problems in Engineering ◽

10.1155/2020/3673085 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Mingzhu Tang ◽

Xiangwan Fu ◽

Huawei Wu ◽

Qi Huang ◽

Qi Zhao

Keyword(s):

Anomaly Detection ◽

Traffic Flow ◽

Ridge Regression ◽

Cross Validation ◽

Sliding Window ◽

Pso Algorithm ◽

Swarm Optimization ◽

Feature Sets ◽

Proposed Model ◽

Fold Cross Validation

Traffic flow anomaly detection helps to improve the efficiency and reliability of detecting fault behavior and the overall effectiveness of the traffic operation. The data collected by traffic flow sensors contains a lot of noise due to equipment failure, environmental interference, and other factors. For traffic flow data with heavy noise, a traffic flow anomaly detection method based on robust ridge regression with the particle swarm optimization (PSO) algorithm is proposed. Feature sets are constructed that contain historical characteristics with a strong linear correlation and statistical characteristics computed using the optimal sliding window. The feature sets are then provided as inputs to the PSO-Huber-Ridge model, which outputs the traffic flow. The Huber loss function is used to reduce noise interference in the traffic flow. The L2 regularization term of the ridge regression is employed to reduce overfitting during model training. A fitness function is constructed that balances the relative size of the k-fold cross-validation root mean square error and the k-fold cross-validation mean absolute error through the control parameter η, in order to improve the efficiency of the optimization algorithm and the generalization ability of the proposed model. The hyperparameters of the robust ridge regression forecast model are optimized by the PSO algorithm to obtain the optimal hyperparameters. The traffic flow data set is used to train and validate the proposed model. Compared with other optimization methods, the proposed model has the lowest RMSE, MAE, and MAPE. Finally, the traffic flow forecast by the proposed model is used to perform anomaly detection. An abnormal error between the forecast value and the actual value is detected with an abnormal traffic flow threshold based on the sliding window. The experimental results verify the validity of the proposed anomaly detection model.
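
As a rough illustration of two ingredients named in this abstract, the sketch below shows the standard Huber loss and one possible fitness function that weighs cross-validated RMSE against cross-validated MAE with a control parameter η. The abstract does not give the exact formula, so the convex combination `eta * RMSE + (1 - eta) * MAE`, the function names, and the default values are all assumptions made here for illustration.

```python
import math


def huber_loss(residual, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear for large residuals."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def rmse(errors):
    """Root mean square of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    """Mean absolute value of a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)


def fitness(fold_errors, eta=0.5):
    """Blend mean k-fold RMSE and mean k-fold MAE (assumed form, see lead-in).

    `fold_errors` holds one list of validation residuals per fold.
    eta = 1 scores by RMSE only; eta = 0 scores by MAE only.
    """
    mean_rmse = sum(rmse(f) for f in fold_errors) / len(fold_errors)
    mean_mae = sum(mae(f) for f in fold_errors) / len(fold_errors)
    return eta * mean_rmse + (1.0 - eta) * mean_mae
```

An optimizer such as PSO would then search for the hyperparameters that minimize `fitness`; the optimizer itself is omitted here.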

Evaluation and Identification of the Neuroprotective Compounds of Xiaoxuming Decoction by Machine Learning: A Novel Mode to Explore the Combination Rules in Traditional Chinese Medicine Prescription

BioMed Research International ◽

10.1155/2019/6847685 ◽

2019 ◽

Vol 2019 ◽

pp. 1-14

Author(s):

Shilun Yang ◽

Yanjia Shen ◽

Wendan Lu ◽

Yinglin Yang ◽

Haigang Wang ◽

...

Keyword(s):

Machine Learning ◽

Chinese Medicine ◽

Traditional Chinese Medicine ◽

Cross Validation ◽

Bayesian Models ◽

Machine Learning Algorithms ◽

Therapeutic Effects ◽

Test Set ◽

Screening Experiments ◽

Fold Cross Validation

Xiaoxuming decoction (XXMD), a classic traditional Chinese medicine (TCM) prescription, has been used as a therapeutic in the treatment of stroke in clinical practice for over 1200 years. However, the pharmacological mechanisms of XXMD have not yet been elucidated. The purpose of this study was to develop neuroprotective models for identifying neuroprotective compounds in XXMD against hypoxia-induced and H2O2-induced brain cell damage. In this study, a phenotype-based classification method was designed by machine learning to identify neuroprotective compounds and to clarify the compatibility of XXMD components. Four different single classifiers (AB, kNN, CT, and RF) and molecular fingerprint descriptors were used to construct stacked naïve Bayesian models. Among them, the RF algorithm had a better performance with an average MCC value of 0.725±0.014 and 0.774±0.042 from 5-fold cross-validation and test set, respectively. The probability values calculated by four models were then integrated into a stacked Bayesian model. In total, two optimal models, s-NB-1-LPFP6 and s-NB-2-LPFP6, were obtained. The two validated optimal models revealed Matthews correlation coefficients (MCC) of 0.968 and 0.993 for 5-fold cross-validation and of 0.874 and 0.959 for the test set, respectively. Furthermore, the two models were used for virtual screening experiments to identify neuroprotective compounds in XXMD. Ten representative compounds with potential therapeutic effects against the two phenotypes were selected for further cell-based assays. Among the selected compounds, two compounds significantly inhibited H2O2-induced and Na2S2O4-induced neurotoxicity simultaneously. Together, our findings suggested that machine learning algorithms such as combination Bayesian models were feasible to predict neuroprotective compounds and to preliminarily demonstrate the pharmacological mechanisms of TCM.
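
For readers unfamiliar with the Matthews correlation coefficient (MCC) reported in this abstract, the following minimal sketch computes it from the four cells of a binary confusion matrix using the standard definition. The function name and the convention of returning 0.0 when the denominator vanishes are choices made for this illustration, not details taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction).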

North American Hardwoods Identification Using Machine-Learning

Forests ◽

10.3390/f11030298 ◽

2020 ◽

Vol 11 (3) ◽

pp. 298 ◽

Cited By ~ 2

Author(s):

Dercilio Junior Verly Lopes ◽

Greg W. Burgreen ◽

Edward D. Entsminger

Keyword(s):

Machine Learning ◽

North American ◽

Mobile Application ◽

Cross Validation ◽

Data Augmentation ◽

Technical Note ◽

Machine Learning Method ◽

Training Set ◽

Hardwood Species ◽

Fold Cross Validation

This technical note determines the feasibility of using an InceptionV4_ResNetV2 convolutional neural network (CNN) to correctly identify hardwood species from macroscopic images. The method uses a commodity smartphone fitted with a 14× macro lens for photography. The end-grains of ten different North American hardwood species were photographed to create a dataset of 1869 images. The stratified 5-fold cross-validation machine-learning method was used, in which the number of testing samples varied from 341 to 342. Data augmentation was performed on-the-fly for each training set by rotating, zooming, and flipping images. It was found that the CNN could correctly identify hardwood species based on macroscopic images of their end-grain with an adjusted accuracy of 92.60%. With the current growth of the machine-learning field, this model can be readily deployed in a mobile application for field wood identification.

Classification of Brain Tumors from MRI Images Using a Convolutional Neural Network

Applied Sciences ◽

10.3390/app10061999 ◽

2020 ◽

Vol 10 (6) ◽

pp. 1999 ◽

Cited By ~ 7

Author(s):

Milica M. Badža ◽

Marko Č. Barjaktarović

Keyword(s):

Neural Network ◽

Machine Learning ◽

Brain Tumors ◽

Convolutional Neural Network ◽

Cross Validation ◽

Magnetic Resonance Images ◽

Generalization Capability ◽

Data Set ◽

Fold Cross Validation

The classification of brain tumors is performed by biopsy, which is not usually conducted before definitive brain surgery. Improvements in technology and machine learning can help radiologists in tumor diagnostics without invasive measures. A machine-learning algorithm that has achieved substantial results in image segmentation and classification is the convolutional neural network (CNN). We present a new CNN architecture for brain tumor classification of three tumor types. The developed network is simpler than already-existing pre-trained networks, and it was tested on T1-weighted contrast-enhanced magnetic resonance images. The performance of the network was evaluated using four approaches: combinations of two 10-fold cross-validation methods and two databases. The generalization capability of the network was tested with one of the 10-fold methods, subject-wise cross-validation, and the improvement was tested by using an augmented image database. The best result for the 10-fold cross-validation method was obtained for the record-wise cross-validation on the augmented data set, and, in that case, the accuracy was 96.56%. With good generalization capability and good execution speed, the newly developed CNN architecture could be used as an effective decision-support tool for radiologists in medical diagnostics.

Issues in performance evaluation for host–pathogen protein interaction prediction

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016500116 ◽

2016 ◽

Vol 14 (03) ◽

pp. 1650011 ◽

Cited By ~ 9

Author(s):

Wajid Arshad Abbasi ◽

Fayyaz Ul Amir Afsar Minhas

Keyword(s):

Machine Learning ◽

Protein Interactions ◽

Cross Validation ◽

Protein Protein Interactions ◽

Evaluation Scheme ◽

Host Pathogen ◽

Pathogen Protein ◽

Protein Interaction Prediction ◽

Underlying Mechanisms ◽

Fold Cross Validation

The study of interactions between host and pathogen proteins is important for understanding the underlying mechanisms of infectious diseases and for developing novel therapeutic solutions. Wet-lab techniques for detecting protein–protein interactions (PPIs) can benefit from computational predictions. Machine learning is one of the computational approaches that can assist biologists by predicting promising PPIs. A number of machine learning based methods for predicting host–pathogen interactions (HPI) have been proposed in the literature. The techniques used for assessing the accuracy of such predictors are of critical importance in this domain. In this paper, we question the effectiveness of K-fold cross-validation for estimating the generalization ability of HPI prediction for proteins with no known interactions. K-fold cross-validation does not model this scenario, and we demonstrate a sizable difference between its performance and the performance of an alternative evaluation scheme called leave one pathogen protein out (LOPO) cross-validation. LOPO is more effective in modeling the real world use of HPI predictors, specifically for cases in which no information about the interacting partners of a pathogen protein is available during training. We also point out that currently used metrics such as areas under the precision-recall or receiver operating characteristic curves are not intuitive to biologists and propose simpler and more directly interpretable metrics for this purpose.
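
The difference between ordinary K-fold splitting and a leave-one-pathogen-protein-out (LOPO) scheme can be sketched as a grouped split: every example involving one pathogen protein is held out together, so the model is always tested on a protein it never saw during training. The function below is a hypothetical, library-free illustration of that idea and is not the authors' implementation; the group labels in the usage note are made up.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), one split per group.

    `groups[i]` is the group label of example i, for instance the
    pathogen protein that interaction pair i involves.
    """
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    for group, test_idx in by_group.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(groups)) if i not in held_out]
        yield group, train_idx, test_idx
```

For example, with `groups = ["pA", "pA", "pB", "pC", "pB"]` the generator produces three splits, and in each one no training example shares a group with the test examples. A plain K-fold split gives no such guarantee, which is the scenario the abstract argues K-fold fails to model.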

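Analisis Sentimen Pada Maskapai Penerbangan di Platform Twitter Menggunakan Algoritma Support Vector Machine (SVM) [Sentiment Analysis of Airlines on the Twitter Platform Using the Support Vector Machine (SVM) Algorithm]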

Teknika ◽

10.34148/teknika.v10i1.311 ◽

2021 ◽

Vol 10 (1) ◽

pp. 18-26

Author(s):

Hendry Cipta Husada ◽

Adi Suryaputra Paramita

Keyword(s):

Machine Learning ◽

Social Media ◽

Support Vector Machine ◽

Cross Validation ◽

Support Vector ◽

Learning Approach ◽

Social Media Platform ◽

Machine Learning Approach ◽

Media Platform ◽

Fold Cross Validation

Current technological developments have made it easier for many people to obtain and share information on various social media platforms. Twitter is one of the media frequently used to express opinions as a person's reaction to something. Opinions found on Twitter can be used by airline companies as a key parameter for gauging the level of public satisfaction, as well as evaluation material for the company. Accordingly, a method is needed that can automatically classify opinions into positive, negative, or neutral categories through sentiment analysis. The sentiment analysis process consists of data preprocessing, term weighting using the TF-IDF method, application of the algorithm, and discussion of the classification results. Opinion classification is carried out with a machine learning approach using the multi-class Support Vector Machine (SVM) algorithm. The data used in this study are English-language opinions from Twitter users about airlines. Based on the tests conducted, the best classification results were obtained using an SVM with an RBF kernel at parameter values C (complexity) = 10 and γ (gamma) = 1, with an accuracy of 84.37%, and 80.41% when using 10-fold cross validation.

The Polypharmacology Browser PPB2: Target Prediction Combining Nearest Neighbors with Machine Learning

10.26434/chemrxiv.6895646.v1 ◽

2018 ◽

Cited By ~ 1

Author(s):

Mahendra Awale ◽

Jean-Louis Reymond

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Nearest Neighbor ◽

Target Prediction ◽

Molecular Shape ◽

Public Access ◽

Molecular Fingerprints ◽

Small Molecule Drug ◽

Fold Cross Validation

Here we report PPB2 as a target prediction tool assigning targets to a query molecule based on ChEMBL data. PPB2 computes ligand similarities using molecular fingerprints encoding composition (MQN), molecular shape and pharmacophores (Xfp), and substructures (ECfp4), and features an unprecedented combination of nearest neighbor (NN) searches and Naïve Bayes (NB) machine learning, together with simple NN searches, NB and Deep Neural Network (DNN) machine learning models as further options. Although NN(ECfp4) gives the best results in terms of recall in a 10-fold cross-validation study, combining NN searches with NB machine learning provides superior precision statistics, as well as better results in a case study predicting off-targets of a recently reported TRPV6 calcium channel inhibitor, illustrating the value of this combined approach. PPB2 is available to assess possible off-targets of small molecule drug-like compounds by public access at ppb2.gdb.tools.

Prediction of Tumor Shrinkage Pattern to Neoadjuvant Chemotherapy Using a Multiparametric MRI-Based Machine Learning Model in Patients With Breast Cancer

Frontiers in Bioengineering and Biotechnology ◽

10.3389/fbioe.2021.662749 ◽

2021 ◽

Vol 9 ◽

Author(s):

Yuhong Huang ◽

Wenben Chen ◽

Xiaoling Zhang ◽

Shaofu He ◽

Nan Shao ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Cross Validation ◽

Learning Model ◽

Training Dataset ◽

Tumor Shrinkage ◽

Clinicopathologic Characteristics ◽

Testing Dataset ◽

Machine Learning Model ◽

Fold Cross Validation

Aim: After neoadjuvant chemotherapy (NACT), tumor shrinkage pattern is a more reasonable outcome to decide a possible breast-conserving surgery (BCS) than pathological complete response (pCR). The aim of this article was to establish a machine learning model combining radiomics features from multiparametric MRI (mpMRI) and clinicopathologic characteristics, for early prediction of tumor shrinkage pattern prior to NACT in breast cancer. Materials and Methods: This study included 199 patients with breast cancer who successfully completed NACT and subsequently underwent breast surgery. For each patient, 4,198 radiomics features were extracted from the segmented 3D regions of interest (ROI) in mpMRI sequences such as T1-weighted dynamic contrast-enhanced imaging (T1-DCE), fat-suppressed T2-weighted imaging (T2WI), and apparent diffusion coefficient (ADC) map. Feature selection and supervised machine learning algorithms were used to identify the predictors correlated with tumor shrinkage pattern as follows: (1) reducing the feature dimension by using ANOVA and the least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation, (2) splitting the dataset into a training dataset and testing dataset, and constructing prediction models using 12 classification algorithms, and (3) assessing the model performance through the area under the curve (AUC), accuracy, sensitivity, and specificity. We also compared the most discriminative model in different molecular subtypes of breast cancer. Results: The Multilayer Perceptron (MLP) neural network achieved higher AUC and accuracy than other classifiers. The radiomics model achieved a mean AUC of 0.975 (accuracy = 0.912) on the training dataset and 0.900 (accuracy = 0.828) on the testing dataset with 30-round 6-fold cross-validation. When incorporating clinicopathologic characteristics, the mean AUC was 0.985 (accuracy = 0.930) on the training dataset and 0.939 (accuracy = 0.870) on the testing dataset. The model further achieved good AUC on the testing dataset with 30-round 5-fold cross-validation in three molecular subtypes of breast cancer as follows: (1) HR+/HER2–: 0.901 (accuracy = 0.816), (2) HER2+: 0.940 (accuracy = 0.865), and (3) TN: 0.837 (accuracy = 0.811). Conclusions: It is feasible that our machine learning model combining radiomics features and clinical characteristics could provide a potential tool to predict tumor shrinkage patterns prior to NACT. Our prediction model will be valuable in guiding NACT and surgical treatment in breast cancer.
