A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus

Author(s):  
Jia Luo ◽  
Dongwen Yu ◽  
Zong Dai

It is scarcely feasible to process the huge volume of structured and semi-structured data with manual methods. This study aims to solve that problem through machine learning algorithms. We collected text data on company public opinion with web crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and used fuzzy clustering to group the keywords into topics. The topic keywords then serve as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, and Word2vec were compared on the new word discovery task. The experimental results show that the Word2vec-based machine learning model achieves the highest accuracy, recall, and F-value.
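As a minimal sketch of one of the compared baselines, the PMI criterion scores adjacent token pairs by how much more often they co-occur than chance would predict; high-scoring pairs are candidate multi-token "new words". The toy corpus below is invented for illustration and is not from the study.

```python
import math
from collections import Counter

def pmi_scores(tokens, min_count=2):
    """Score adjacent token pairs by pointwise mutual information (PMI).

    High-PMI pairs co-occur far more often than chance, which makes them
    candidate multi-token 'new words' for a seed dictionary.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue
        p_pair = c / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores

# Toy corpus: "machine learning" recurs as a unit, so its PMI is high.
corpus = ("machine learning model uses machine learning and "
          "the model fits data").split()
scores = pmi_scores(corpus)
best_pair = max(scores, key=scores.get)
```

The same scoring idea extends to longer candidate spans; Word2vec replaces the count ratio with similarity in a learned embedding space.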

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Eyal Klang ◽  
Benjamin R. Kummer ◽  
Neha S. Dangayach ◽  
Amy Zhong ◽  
M. Arash Kia ◽  
...  

Abstract Early admission to the neurosciences intensive care unit (NSICU) is associated with improved patient outcomes. Natural language processing offers new possibilities for mining free text in electronic health record data. We sought to develop a machine learning model using both tabular and free text data to identify patients requiring NSICU admission shortly after arrival to the emergency department (ED). We conducted a single-center, retrospective cohort study of adult patients at the Mount Sinai Hospital, an academic medical center in New York City. All patients presenting to our institutional ED between January 2014 and December 2018 were included. Structured (tabular) demographic, clinical, and bed movement record data, together with free text data from triage notes, were extracted from our institutional data warehouse. A machine learning model was trained to predict the likelihood of NSICU admission at 30 min from arrival to the ED. We identified 412,858 patients presenting to the ED over the study period, of whom 1900 (0.5%) were admitted to the NSICU. The daily median number of ED presentations was 231 (IQR 200–256) and the median time from ED presentation to the decision for NSICU admission was 169 min (IQR 80–324). A model trained only with text data had an area under the receiver operating characteristic curve (AUC) of 0.90 (95% confidence interval (CI) 0.87–0.91). A structured data-only model had an AUC of 0.92 (95% CI 0.91–0.94). A combined model trained on structured and text data had an AUC of 0.93 (95% CI 0.92–0.95). At a false positive rate of 1:100 (99% specificity), the combined model was 58% sensitive for identifying NSICU admission. A machine learning model using structured and free text data can predict NSICU admission soon after ED arrival. This may potentially improve ED and NSICU resource allocation. Further studies should validate our findings.
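A common way to combine tabular fields with free text in one classifier is to vectorize the note with TF-IDF and concatenate the result with the numeric columns. The sketch below is not the authors' pipeline; the column names and toy data are invented for illustration.

```python
# Minimal sketch: tabular features + TF-IDF of a free-text triage note
# feeding one logistic regression. All data here are invented.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

df = pd.DataFrame({
    "age": [74, 35, 81, 29, 66, 44],
    "heart_rate": [58, 92, 49, 88, 61, 95],
    "triage_note": [
        "sudden weakness left side slurred speech",
        "ankle sprain after fall",
        "unresponsive possible stroke",
        "sore throat two days",
        "worst headache of life vomiting",
        "cough and mild fever",
    ],
    "nsicu_admit": [1, 0, 1, 0, 1, 0],
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "triage_note"),       # free text -> sparse TF-IDF
    ("tabular", "passthrough", ["age", "heart_rate"]),
])
model = Pipeline([("features", features),
                  ("clf", LogisticRegression(max_iter=1000))])
model.fit(df[["age", "heart_rate", "triage_note"]], df["nsicu_admit"])
probs = model.predict_proba(df[["age", "heart_rate", "triage_note"]])[:, 1]
```

In practice the probability threshold would be tuned on held-out data to hit an operating point such as the 99% specificity reported above.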


Author(s):  
George W Clark ◽  
Todd R Andel ◽  
J Todd McDonald ◽  
Tom Johnsten ◽  
Tom Thomas

Robotic systems are no longer simply built and designed to perform sequential repetitive tasks, primarily in a static manufacturing environment. Systems such as autonomous vehicles make use of intricate machine learning algorithms to adapt their behavior to dynamic conditions in their operating environment. These machine learning algorithms provide an additional attack surface for an adversary to exploit in order to perform a cyberattack. Since attacks on robotic systems such as autonomous vehicles have the potential to cause great damage and harm to humans, it is essential that detection of and defenses against these attacks be explored. This paper discusses the plausibility of direct and indirect cyberattacks on a machine learning model through the use of a virtual autonomous vehicle operating in a simulation environment under machine learning control. Using this vehicle, the paper proposes various methods of detecting cyberattacks on its machine learning model and discusses possible defense mechanisms to prevent such attacks.
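To make the notion of a direct attack concrete, the illustrative sketch below (not taken from the paper) applies a gradient-sign perturbation to a toy linear "steering" model: a small, bounded change to the sensor input pushes the model's output in the attacker's chosen direction.

```python
# Illustrative only: FGSM-style perturbation of a toy linear steering model.
# The model, weights, and inputs are invented for this sketch.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8)                 # toy linear "steering" model weights
x = rng.normal(size=8)                 # clean sensor feature vector

def steer(v):
    return float(w @ v)                # positive output -> steer right

# For a linear model, the gradient of the output w.r.t. the input is w;
# stepping against its sign drives the output the opposite way.
eps = 0.5                              # perturbation budget per feature
x_adv = x - eps * np.sign(w) * np.sign(steer(x))
```

Detection schemes of the kind the paper explores look for exactly this signature: outputs that shift implausibly fast relative to the physical change in the input.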


2021 ◽  
Vol 2070 (1) ◽  
pp. 012243
Author(s):  
A Varun ◽  
Mechiri Sandeep Kumar ◽  
Karthik Murumulla ◽  
Tatiparthi Sathvik

Abstract Lathe turning is one of the manufacturing sector’s most basic and important operations. From small businesses to large corporations, optimising machining operations is a key priority. Cooling systems in machining play an important role in determining surface roughness. The machine learning model under discussion assesses the surface roughness of lathe-turned surfaces for a variety of materials. To forecast surface roughness, the model is trained on machining parameters, material characteristics, tool properties, and cooling conditions such as dry, MQL, and hybrid nanoparticle-mixed MQL. Mixing in suitable nanoparticles such as copper or aluminium can significantly improve the cooling system’s heat absorption. To create a data set for training and testing the model, results from many standard journals and publications were collected. Surface roughness varies with the combination of working parameters. A Gaussian Process Regression (GPR) model is constructed in MATLAB to predict surface roughness. To improve prediction outcomes and make the model more flexible, data from a variety of publications were included, and some characteristics were omitted in order to minimise data noise. Different statistical factors are explored to predict surface roughness.
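The paper fits its GPR model in MATLAB; as a rough Python analogue, the sketch below fits a Gaussian process to invented stand-ins for machining parameters (speed, feed, depth of cut) against toy roughness values, and returns both a prediction and its uncertainty.

```python
# Sketch only: a Python GPR analogue of the MATLAB modelling step.
# All feature values and roughness readings below are invented.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.array([[900,  0.10, 0.5],
              [1200, 0.15, 0.5],
              [1500, 0.20, 1.0],
              [900,  0.25, 1.0],
              [1200, 0.10, 1.5],
              [1500, 0.15, 1.5]], dtype=float)   # speed, feed, depth of cut
y = np.array([1.8, 2.3, 3.1, 3.6, 1.6, 2.2])     # toy roughness Ra (um)

gpr = GaussianProcessRegressor(
    kernel=RBF(length_scale=[300, 0.1, 0.5]) + WhiteKernel(1e-3),
    normalize_y=True,
).fit(X, y)

# Predict roughness for a new parameter combination, with a std estimate.
ra_pred, ra_std = gpr.predict([[1000, 0.12, 0.8]], return_std=True)
```

The predictive standard deviation is one reason GPR suits this setting: sparse literature-derived data leave large regions of parameter space where the model should admit uncertainty.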


Author(s):  
J. V. D. Prasad ◽  
A. Raghuvira Pratap ◽  
Babu Sallagundla

With the rapid increase in the volume of clinical data, prediction and analysis become very difficult. Machine learning models make it feasible to work with such huge data sets, but they face many challenges, one of which is feature selection. In this research work, we propose a novel feature selection method based on statistical procedures to increase the performance of the machine learning model. Furthermore, we tested the feature selection algorithm on a liver disease classification dataset, and the results obtained show the efficiency of the proposed method.
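The abstract does not give the exact statistical procedure, but a typical statistics-based selection step ranks features by a univariate test and keeps the top k. The sketch below uses ANOVA F-scores on synthetic data as a stand-in.

```python
# Hedged sketch of statistics-based feature selection (the paper's exact
# procedure is not specified here): ANOVA F-scores rank features and the
# top k survive. The data are synthetic.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=120)          # toy liver-disease labels
X = rng.normal(size=(120, 6))             # six toy clinical features
X[:, 0] += 2.0 * y                        # make feature 0 informative

selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
kept = selector.get_support(indices=True) # indices of the retained features
```

The classifier is then trained only on `selector.transform(X)`, which is where the performance gain from discarding noisy features shows up.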


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Max Schneckenburger ◽  
Sven Höfler ◽  
Luis Garcia ◽  
Rui Almeida ◽  
Rainer Börret

Abstract Robot polishing is increasingly being used in the production of high-end glass workpieces such as astronomy mirrors, lithography lenses, laser gyroscopes or high-precision coordinate measuring machines. The quality of optical components such as lenses or mirrors can be described by shape errors and surface roughness. While the trend towards sub-nanometre surface finishes progresses, matching both form and finish coherently in complex parts remains a major challenge. With increasing optic sizes, the stability of the polishing process becomes more and more important. If not known empirically, the optical surface must be measured after each polishing step. One approach is to mount sensors on the polishing head in order to measure process-relevant quantities; machine learning algorithms can then be applied to these data for surface value prediction. Because installing the sensors modified the polishing head and thereby influenced the process, a first machine learning model could make removal predictions only with insufficient accuracy. The aim of this work is to present a polishing head optimised for the sensors, coupled with a machine learning model that predicts material removal and failure of the polishing head during robot polishing. The artificial neural network is developed in the Python programming language using the Keras deep learning library, starting from a simple network architecture and common training parameters and then optimised step by step using different methods. The data collected in a design of experiments with the sensor-integrated glass polishing head are used to train the machine learning model and to validate the results. The neural network achieves a prediction accuracy of 99.22% for the material removal.
Article highlights
- First machine learning model applied to robot polishing of optical glass ceramics.
- The polishing process is influenced by a large number of different process parameters. Machine learning can be used to adjust any process parameter and predict the resulting change in material removal with a certain probability; for a trained model, empirical experiments are no longer necessary.
- Equipping the polishing head with sensors provides the possibility of 100% process control.
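The paper builds its network in Keras; as a library-light sketch of the same regression idea, the code below maps invented sensor features (pressure, rotation speed, temperature) to a toy removal value with a small multilayer perceptron.

```python
# Sketch only: a small MLP regressor on invented polishing-head sensor
# data, standing in for the paper's Keras network.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Toy sensor readings: pressure (bar), rotation speed (rpm), temp (deg C).
X = rng.uniform([0.5, 100, 20], [2.0, 400, 40], size=(200, 3))
# Toy removal: mostly pressure- and speed-driven, plus measurement noise.
y = 0.8 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.05, 200)

model = make_pipeline(
    StandardScaler(),                       # sensor ranges differ widely
    MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=3000,
                 random_state=0),
).fit(X, y)
r2 = model.score(X, y)                      # fit quality on training data
```

Scaling the inputs first matters here for the same reason it would on the real sensor data: the raw feature ranges differ by orders of magnitude.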


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Muhammad Muneeb ◽  
Andreas Henschel

Abstract Background Genotype–phenotype prediction is of great importance in genetics. Such predictions can help to find genetic mutations that cause variation in human beings. Approaches for finding these associations fall broadly into two classes: statistical techniques and machine learning. Statistical techniques are well suited to finding the actual SNPs causing variation, whereas machine learning techniques are better suited to classifying people into different categories. In this article, we examined the eye-colour and type-2 diabetes phenotypes. The proposed technique is a hybrid approach combining elements of statistical techniques with machine learning. Results The main dataset for the eye-colour phenotype consists of 806 people, of whom 404 have blue-green eyes and 402 have brown eyes. After preprocessing we generated eight different datasets, containing different numbers of SNPs, using the mutation difference and thresholding at individual SNPs. We distinguished three types of mutation at each SNP: no mutation, partial mutation, and full mutation. The data were then transformed for the machine learning algorithms. We used nine classifiers (random forest, extreme gradient boosting, ANN, LSTM, GRU, BiLSTM, 1D-CNN, ensembles of ANNs, and ensembles of LSTMs), which gave best accuracies of 0.91, 0.9286, 0.945, 0.94, 0.94, 0.92, 0.95, and 0.96, respectively. Stacked ensembles of LSTMs outperformed the other algorithms for 1560 SNPs, with an overall accuracy of 0.96, AUC = 0.98 for brown eyes, and AUC = 0.97 for blue-green eyes. The main dataset for type-2 diabetes consists of 107 people, of whom 30 are classified as cases and 74 as controls. We used different linear thresholds to find the optimal number of SNPs for classification; the final model gave an accuracy of 0.97. Conclusion Genotype–phenotype predictions are very useful, especially in forensics. These predictions can help to identify SNP variants associated with traits and diseases. Given more data, the predictions of machine learning models can be improved further. Moreover, the non-linearity of the machine learning models and the combination of SNP mutations during training increase the prediction performance. We considered binary classification problems, but the proposed approach can be extended to multi-class classification.
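The three-level mutation coding lends itself to a compact numeric representation: each SNP becomes 0 (no mutation), 1 (partial) or 2 (full), and a classifier maps the vector to a phenotype. The sketch below uses synthetic genotypes and a random forest, far simpler than the stacked LSTM ensembles the study trains.

```python
# Hedged sketch of the 0/1/2 SNP encoding idea on synthetic data; the
# causal SNPs and labels below are invented ground truth.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n_people, n_snps = 200, 50
X = rng.integers(0, 3, size=(n_people, n_snps))   # 0/1/2 mutation codes
# Synthetic phenotype driven by two causal SNPs (e.g. brown vs blue-green).
y = ((X[:, 0] + X[:, 4]) >= 3).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
acc = rf.score(X, y)                              # fit on the training set
```

On real cohorts the held-out accuracy, not the training fit, is what the reported 0.96 refers to, so a train/test split or cross-validation would wrap this step.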


Author(s):  
Carlo Schwarz

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.
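The ldagibbs command runs in Stata; for readers outside Stata, the sketch below shows the same two outputs (a per-document topic distribution and per-topic word distribution) with scikit-learn's LDA on a toy corpus.

```python
# Illustration of LDA's outputs in Python (ldagibbs itself is a Stata
# command); the four-document corpus is invented.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "stock market trading shares",
    "market prices and trading volume",
    "football match goal scored",
    "goal keeper saves the match",
]
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

doc_topics = lda.transform(counts)   # one probability row per document
# lda.components_ holds the (unnormalised) topic-word weights.
```

Each row of `doc_topics` sums to one, which is exactly the "document as a probability distribution over topics" representation described above.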


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Lingxiao He ◽  
Lei Luo ◽  
Xiaoling Hou ◽  
Dengbin Liao ◽  
Ran Liu ◽  
...  

Abstract Background Venous thromboembolism (VTE) is a common complication in hospitalized trauma patients and has an adverse impact on patient outcomes. However, there is still a lack of appropriate tools for effectively predicting VTE in trauma patients. We sought to verify the accuracy of the Caprini score for predicting VTE in trauma patients, and to further improve the prediction through machine learning algorithms. Methods We retrospectively reviewed emergency trauma patients admitted to a trauma center in a tertiary hospital from September 2019 to March 2020. The data in each patient’s electronic health record (EHR) and the Caprini score were extracted and combined with multiple feature screening methods and the random forest (RF) algorithm to construct the VTE prediction model. We compared the prediction performance of (1) the Caprini score alone; (2) a machine learning model built on EHR data; and (3) a machine learning model built on EHR data and the Caprini score. True positive rate (TPR), false positive rate (FPR), area under the curve (AUC), accuracy, and precision were reported. Results The Caprini score shows a good VTE prediction effect in the hospitalized trauma population at a cut-off point of 11 (TPR = 0.667, FPR = 0.227, AUC = 0.773). The best prediction model is a LASSO+RF model combining the Caprini score with five further features extracted from the EHR data (TPR = 0.757, FPR = 0.290, AUC = 0.799). Conclusion The Caprini score has good VTE prediction performance in trauma patients, and machine learning methods can further improve the prediction performance.
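A common shape for a LASSO+RF pipeline is an L1-penalised screening step followed by a random forest on the surviving features. The sketch below uses synthetic data, not the study's EHR features or Caprini scores.

```python
# Hedged sketch of a LASSO-screen-then-RF pipeline on synthetic data
# (the real model's EHR features and labels are not reproduced here).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 10))          # 10 toy features (EHR + score)
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 0.5, 300) > 0).astype(int)

model = make_pipeline(
    StandardScaler(),
    # L1-penalised screen: features with zero coefficients are dropped.
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear",
                                       C=0.1)),
    RandomForestClassifier(n_estimators=200, random_state=0),
).fit(X, y)
train_acc = model.score(X, y)           # quick sanity check on training data
```

The screening step mirrors the paper's feature reduction to the Caprini score plus five EHR features; the forest then supplies the non-linear decision boundary.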


2020 ◽  
Vol 32 ◽  
pp. 03032
Author(s):  
Sahil Parab ◽  
Piyush Rathod ◽  
Durgesh Patil ◽  
Vishwanath Chikkareddi

Diabetes detection has been one of the many challenges faced by the medical as well as technological communities. Machine learning algorithms are used to detect the likelihood that a patient is diabetic based on glucose concentration, insulin levels, and other medically relevant test results. The basic diabetes detection model uses a Bayesian classification algorithm, but even though it is able to detect diabetes, its efficiency is not acceptable at all times because of the drawbacks of relying on a single algorithm. A hybrid machine learning model is used to overcome the drawbacks of a single-algorithm model. The hybrid model is constructed by combining multiple applicable machine learning algorithms, such as an SVM model and a Bayesian classification model, so that each compensates for the drawbacks of the other while contributing its own efficiency. Ideally, the new hybrid machine learning model provides more efficiency than the old Bayesian classification model alone.
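A hybrid of this kind can be sketched as a soft-voting ensemble: an SVM and a naive Bayes classifier each produce class probabilities, which are averaged. The toy glucose/insulin data below are invented, not the study's patient records.

```python
# Minimal sketch of the hybrid idea: SVM + naive Bayes soft voting.
# Features, thresholds, and labels here are synthetic.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(5)
glucose = rng.uniform(70, 200, 150)              # toy glucose readings
insulin = rng.uniform(15, 300, 150)              # toy insulin readings
X = np.column_stack([glucose, insulin])
y = (glucose > 140).astype(int)                  # toy diabetes label

hybrid = VotingClassifier(
    estimators=[("svm", SVC(probability=True)), ("nb", GaussianNB())],
    voting="soft",                               # average class probabilities
).fit(X, y)
acc = hybrid.score(X, y)
```

Soft voting lets a confident classifier outvote an uncertain one, which is the "mutually contributed efficiency" the abstract describes.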


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Saira Aziz ◽  
Sajid Ahmed ◽  
Mohamed-Slim Alouini

Abstract Electrocardiogram (ECG) signals represent the electrical activity of the human heart and consist of several waveforms (P, QRS, and T). The duration and shape of each waveform, and the distances between different peaks, are used to diagnose heart diseases. In this work, to better analyze ECG signals, a new algorithm is proposed that exploits two event-related moving averages (TERMA) and the fractional Fourier transform (FrFT). The TERMA algorithm specifies certain areas of interest in which to locate desired peaks, while the FrFT rotates ECG signals in the time-frequency plane to make the locations of the various peaks manifest. The proposed algorithm’s performance outperforms state-of-the-art algorithms. Moreover, to automatically classify heart disease, the estimated peaks, the durations between different peaks, and other ECG signal features were used to train a machine learning model. Most of the available studies use the MIT-BIH database (only 48 patients). In this work, however, the recently reported Shaoxing People’s Hospital (SPH) database, which covers more than 10,000 patients, was used to train the proposed machine learning model, which is more realistic for classification. Cross-database training and testing with promising results is the distinguishing feature of our proposed machine learning model.
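The TERMA idea can be sketched with two moving averages: a short "event" window that follows sharp deflections and a long "cycle" window that tracks the baseline; wherever the short average exceeds the long one, a block of interest opens and the peak is taken as its local maximum. The synthetic signal below is invented and much simpler than real ECG data.

```python
# Hedged sketch of the two-moving-average (TERMA-style) idea on a
# synthetic signal; window lengths and the signal itself are invented.
import numpy as np

def moving_avg(x, w):
    return np.convolve(x, np.ones(w) / w, mode="same")

fs = 250                                   # toy sampling rate, Hz
t = np.arange(0, 2, 1 / fs)
sig = np.zeros_like(t)
peak_locs = [100, 350]                     # two synthetic "R-peak" positions
for p in peak_locs:
    sig[p - 5:p + 5] = np.hanning(10)      # narrow bump at each position

event_ma = moving_avg(np.abs(sig), 15)     # short window follows events
cycle_ma = moving_avg(np.abs(sig), 90)     # long window tracks the baseline
blocks = event_ma > cycle_ma               # candidate areas of interest

# Take the maximum inside each contiguous block as the detected peak.
starts = np.flatnonzero(np.diff(blocks.astype(int)) == 1) + 1
ends = np.flatnonzero(np.diff(blocks.astype(int)) == -1) + 1
detected = [int(s + np.argmax(sig[s:e])) for s, e in zip(starts, ends)]
```

In the full algorithm the FrFT step precedes this, rotating the signal so that the various peaks (P, QRS, T) stand out before the block search runs.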

