A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus

Author(s):  
Jia Luo ◽  
Dongwen Yu ◽  
Zong Dai

It is scarcely feasible to process the huge volume of structured and semi-structured data with manual methods. This study aims to solve that problem through machine learning algorithms. We collected text data on company public opinion with web crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and used fuzzy clustering to group the keywords into topics. The topic keywords then serve as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, and Word2vec were compared on the new word discovery task. The experimental results show that the Word2vec-based machine learning model achieves the highest accuracy, recall, and F-value.
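As a minimal sketch of one of the compared baselines, the PMI criterion scores adjacent token pairs by how much more often they co-occur than chance would predict; high-scoring pairs are candidate multi-token "new words". The toy corpus below is invented for illustration and is not from the study.

```python
import math
from collections import Counter

def pmi_scores(tokens, min_count=2):
    """Score adjacent token pairs by pointwise mutual information (PMI).

    High-PMI pairs co-occur far more often than chance, which makes them
    candidate multi-token 'new words' for a seed dictionary.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue
        p_pair = c / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores

# Toy corpus: "machine learning" recurs as a unit, so its PMI is high.
corpus = ("machine learning model uses machine learning and "
          "the model fits data").split()
scores = pmi_scores(corpus)
best_pair = max(scores, key=scores.get)
```

The same scoring idea extends to longer candidate spans; Word2vec replaces the count ratio with similarity in a learned embedding space.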

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Eyal Klang ◽  
Benjamin R. Kummer ◽  
Neha S. Dangayach ◽  
Amy Zhong ◽  
M. Arash Kia ◽  
...  

Abstract Early admission to the neurosciences intensive care unit (NSICU) is associated with improved patient outcomes. Natural language processing offers new possibilities for mining free text in electronic health record data. We sought to develop a machine learning model using both tabular and free text data to identify patients requiring NSICU admission shortly after arrival to the emergency department (ED). We conducted a single-center, retrospective cohort study of adult patients at the Mount Sinai Hospital, an academic medical center in New York City. All patients presenting to our institutional ED between January 2014 and December 2018 were included. Structured (tabular) demographic, clinical, and bed movement record data, together with free text data from triage notes, were extracted from our institutional data warehouse. A machine learning model was trained to predict the likelihood of NSICU admission at 30 min from arrival to the ED. We identified 412,858 patients presenting to the ED over the study period, of whom 1900 (0.5%) were admitted to the NSICU. The daily median number of ED presentations was 231 (IQR 200–256) and the median time from ED presentation to the decision for NSICU admission was 169 min (IQR 80–324). A model trained only with text data had an area under the receiver operating characteristic curve (AUC) of 0.90 (95% confidence interval (CI) 0.87–0.91). A structured data-only model had an AUC of 0.92 (95% CI 0.91–0.94). A combined model trained on structured and text data had an AUC of 0.93 (95% CI 0.92–0.95). At a false positive rate of 1:100 (99% specificity), the combined model was 58% sensitive for identifying NSICU admission. A machine learning model using structured and free text data can predict NSICU admission soon after ED arrival. This may potentially improve ED and NSICU resource allocation. Further studies should validate our findings.
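A common way to combine tabular fields with free text in one classifier is to vectorize the note with TF-IDF and concatenate the result with the numeric columns. The sketch below is not the authors' pipeline; the column names and toy data are invented for illustration.

```python
# Minimal sketch: tabular features + TF-IDF of a free-text triage note
# feeding one logistic regression. All data here are invented.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

df = pd.DataFrame({
    "age": [74, 35, 81, 29, 66, 44],
    "heart_rate": [58, 92, 49, 88, 61, 95],
    "triage_note": [
        "sudden weakness left side slurred speech",
        "ankle sprain after fall",
        "unresponsive possible stroke",
        "sore throat two days",
        "worst headache of life vomiting",
        "cough and mild fever",
    ],
    "nsicu_admit": [1, 0, 1, 0, 1, 0],
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "triage_note"),       # free text -> sparse TF-IDF
    ("tabular", "passthrough", ["age", "heart_rate"]),
])
model = Pipeline([("features", features),
                  ("clf", LogisticRegression(max_iter=1000))])
model.fit(df[["age", "heart_rate", "triage_note"]], df["nsicu_admit"])
probs = model.predict_proba(df[["age", "heart_rate", "triage_note"]])[:, 1]
```

In practice the probability threshold would be tuned on held-out data to hit an operating point such as the 99% specificity reported above.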


Author(s):  
George W Clark ◽  
Todd R Andel ◽  
J Todd McDonald ◽  
Tom Johnsten ◽  
Tom Thomas

Robotic systems are no longer simply built and designed to perform sequential repetitive tasks, primarily in a static manufacturing environment. Systems such as autonomous vehicles make use of intricate machine learning algorithms to adapt their behavior to dynamic conditions in their operating environment. These machine learning algorithms provide an additional attack surface for an adversary to exploit in order to perform a cyberattack. Since attacks on robotic systems such as autonomous vehicles have the potential to cause great damage and harm to humans, it is essential that detection of and defenses against these attacks be explored. This paper discusses the plausibility of direct and indirect cyberattacks on a machine learning model through the use of a virtual autonomous vehicle operating in a simulation environment under machine learning control. Using this vehicle, the paper proposes various methods of detecting cyberattacks on its machine learning model and discusses possible defense mechanisms to prevent such attacks.
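To make the notion of a direct attack concrete, the illustrative sketch below (not taken from the paper) applies a gradient-sign perturbation to a toy linear "steering" model: a small, bounded change to the sensor input pushes the model's output in the attacker's chosen direction.

```python
# Illustrative only: FGSM-style perturbation of a toy linear steering model.
# The model, weights, and inputs are invented for this sketch.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8)                 # toy linear "steering" model weights
x = rng.normal(size=8)                 # clean sensor feature vector

def steer(v):
    return float(w @ v)                # positive output -> steer right

# For a linear model, the gradient of the output w.r.t. the input is w;
# stepping against its sign drives the output the opposite way.
eps = 0.5                              # perturbation budget per feature
x_adv = x - eps * np.sign(w) * np.sign(steer(x))
```

Detection schemes of the kind the paper explores look for exactly this signature: outputs that shift implausibly fast relative to the physical change in the input.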


2021 ◽  
Vol 2070 (1) ◽  
pp. 012243
Author(s):  
A Varun ◽  
Mechiri Sandeep Kumar ◽  
Karthik Murumulla ◽  
Tatiparthi Sathvik

Abstract Lathe turning is one of the manufacturing sector’s most basic and important operations. From small businesses to large corporations, optimising machining operations is a key priority. Cooling systems in machining play an important role in determining surface roughness. The machine learning model under discussion assesses the surface roughness of lathe-turned surfaces for a variety of materials. To forecast surface roughness, the model is trained on machining parameters, material characteristics, tool properties, and cooling conditions such as dry, MQL, and hybrid nanoparticle-mixed MQL. Mixing in suitable nanoparticles such as copper or aluminium can significantly improve the cooling system’s heat absorption. To create a data set for training and testing the model, results from many standard journals and publications were collected. Surface roughness varies with the combination of working parameters. A Gaussian Process Regression (GPR) model is constructed in MATLAB to predict surface roughness. To improve prediction outcomes and make the model more flexible, data from a variety of publications were included, and some characteristics were omitted in order to minimise data noise. Different statistical factors are explored to predict surface roughness.
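The paper fits its GPR model in MATLAB; as a rough Python analogue, the sketch below fits a Gaussian process to invented stand-ins for machining parameters (speed, feed, depth of cut) against toy roughness values, and returns both a prediction and its uncertainty.

```python
# Sketch only: a Python GPR analogue of the MATLAB modelling step.
# All feature values and roughness readings below are invented.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.array([[900,  0.10, 0.5],
              [1200, 0.15, 0.5],
              [1500, 0.20, 1.0],
              [900,  0.25, 1.0],
              [1200, 0.10, 1.5],
              [1500, 0.15, 1.5]], dtype=float)   # speed, feed, depth of cut
y = np.array([1.8, 2.3, 3.1, 3.6, 1.6, 2.2])     # toy roughness Ra (um)

gpr = GaussianProcessRegressor(
    kernel=RBF(length_scale=[300, 0.1, 0.5]) + WhiteKernel(1e-3),
    normalize_y=True,
).fit(X, y)

# Predict roughness for a new parameter combination, with a std estimate.
ra_pred, ra_std = gpr.predict([[1000, 0.12, 0.8]], return_std=True)
```

The predictive standard deviation is one reason GPR suits this setting: sparse literature-derived data leave large regions of parameter space where the model should admit uncertainty.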


Author(s):  
J. V. D. Prasad ◽  
A. Raghuvira Pratap ◽  
Babu Sallagundla

With the rapid increase in the volume of clinical data, prediction and analysis become very difficult. Machine learning models make it feasible to work with such huge data sets, but they face many challenges, one of which is feature selection. In this research work, we propose a novel feature selection method based on statistical procedures to increase the performance of the machine learning model. Furthermore, we tested the feature selection algorithm on a liver disease classification dataset, and the results obtained show the efficiency of the proposed method.
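The abstract does not give the exact statistical procedure, but a typical statistics-based selection step ranks features by a univariate test and keeps the top k. The sketch below uses ANOVA F-scores on synthetic data as a stand-in.

```python
# Hedged sketch of statistics-based feature selection (the paper's exact
# procedure is not specified here): ANOVA F-scores rank features and the
# top k survive. The data are synthetic.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=120)          # toy liver-disease labels
X = rng.normal(size=(120, 6))             # six toy clinical features
X[:, 0] += 2.0 * y                        # make feature 0 informative

selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
kept = selector.get_support(indices=True) # indices of the retained features
```

The classifier is then trained only on `selector.transform(X)`, which is where the performance gain from discarding noisy features shows up.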


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Max Schneckenburger ◽  
Sven Höfler ◽  
Luis Garcia ◽  
Rui Almeida ◽  
Rainer Börret

Abstract Robot polishing is increasingly being used in the production of high-end glass workpieces such as astronomy mirrors, lithography lenses, laser gyroscopes or high-precision coordinate measuring machines. The quality of optical components such as lenses or mirrors can be described by shape errors and surface roughness. While the trend towards sub-nanometre surface finishes progresses, matching both form and finish coherently in complex parts remains a major challenge. With increasing optic sizes, the stability of the polishing process becomes more and more important. If not known empirically, the optical surface must be measured after each polishing step. One approach is to mount sensors on the polishing head in order to measure process-relevant quantities; machine learning algorithms can then be applied to these data for surface value prediction. Because installing the sensors modified the polishing head and thereby influenced the process, a first machine learning model could make removal predictions only with insufficient accuracy. The aim of this work is to present a polishing head optimised for the sensors, coupled with a machine learning model that predicts material removal and failure of the polishing head during robot polishing. The artificial neural network is developed in the Python programming language using the Keras deep learning library, starting from a simple network architecture and common training parameters and then optimised step by step using different methods. The data collected in a design of experiments with the sensor-integrated glass polishing head are used to train the machine learning model and to validate the results. The neural network achieves a prediction accuracy of 99.22% for the material removal.
Article highlights
- First machine learning model applied to robot polishing of optical glass ceramics.
- The polishing process is influenced by a large number of different process parameters. Machine learning can be used to adjust any process parameter and predict the resulting change in material removal with a certain probability; for a trained model, empirical experiments are no longer necessary.
- Equipping the polishing head with sensors provides the possibility of 100% process control.
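The paper builds its network in Keras; as a library-light sketch of the same regression idea, the code below maps invented sensor features (pressure, rotation speed, temperature) to a toy removal value with a small multilayer perceptron.

```python
# Sketch only: a small MLP regressor on invented polishing-head sensor
# data, standing in for the paper's Keras network.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Toy sensor readings: pressure (bar), rotation speed (rpm), temp (deg C).
X = rng.uniform([0.5, 100, 20], [2.0, 400, 40], size=(200, 3))
# Toy removal: mostly pressure- and speed-driven, plus measurement noise.
y = 0.8 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.05, 200)

model = make_pipeline(
    StandardScaler(),                       # sensor ranges differ widely
    MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=3000,
                 random_state=0),
).fit(X, y)
r2 = model.score(X, y)                      # fit quality on training data
```

Scaling the inputs first matters here for the same reason it would on the real sensor data: the raw feature ranges differ by orders of magnitude.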


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Muhammad Muneeb ◽  
Andreas Henschel

Abstract Background Genotype–phenotype prediction is of great importance in genetics. Such predictions can help to find genetic mutations that cause variation in human beings. Approaches for finding these associations fall broadly into two classes: statistical techniques and machine learning. Statistical techniques are well suited to finding the actual SNPs causing variation, whereas machine learning techniques are better suited to classifying people into different categories. In this article, we examined the eye-colour and type-2 diabetes phenotypes. The proposed technique is a hybrid approach combining elements of statistical techniques with machine learning. Results The main dataset for the eye-colour phenotype consists of 806 people, of whom 404 have blue-green eyes and 402 have brown eyes. After preprocessing we generated eight different datasets, containing different numbers of SNPs, using the mutation difference and thresholding at individual SNPs. We distinguished three types of mutation at each SNP: no mutation, partial mutation, and full mutation. The data were then transformed for the machine learning algorithms. We used nine classifiers (random forest, extreme gradient boosting, ANN, LSTM, GRU, BiLSTM, 1D-CNN, ensembles of ANNs, and ensembles of LSTMs), which gave best accuracies of 0.91, 0.9286, 0.945, 0.94, 0.94, 0.92, 0.95, and 0.96, respectively. Stacked ensembles of LSTMs outperformed the other algorithms for 1560 SNPs, with an overall accuracy of 0.96, AUC = 0.98 for brown eyes, and AUC = 0.97 for blue-green eyes. The main dataset for type-2 diabetes consists of 107 people, of whom 30 are classified as cases and 74 as controls. We used different linear thresholds to find the optimal number of SNPs for classification; the final model gave an accuracy of 0.97. Conclusion Genotype–phenotype predictions are very useful, especially in forensics. These predictions can help to identify SNP variants associated with traits and diseases. Given more data, the predictions of machine learning models can be improved further. Moreover, the non-linearity of the machine learning models and the combination of SNP mutations during training increase the prediction performance. We considered binary classification problems, but the proposed approach can be extended to multi-class classification.
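The three-level mutation coding lends itself to a compact numeric representation: each SNP becomes 0 (no mutation), 1 (partial) or 2 (full), and a classifier maps the vector to a phenotype. The sketch below uses synthetic genotypes and a random forest, far simpler than the stacked LSTM ensembles the study trains.

```python
# Hedged sketch of the 0/1/2 SNP encoding idea on synthetic data; the
# causal SNPs and labels below are invented ground truth.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n_people, n_snps = 200, 50
X = rng.integers(0, 3, size=(n_people, n_snps))   # 0/1/2 mutation codes
# Synthetic phenotype driven by two causal SNPs (e.g. brown vs blue-green).
y = ((X[:, 0] + X[:, 4]) >= 3).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
acc = rf.score(X, y)                              # fit on the training set
```

On real cohorts the held-out accuracy, not the training fit, is what the reported 0.96 refers to, so a train/test split or cross-validation would wrap this step.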


Author(s):  
Carlo Schwarz

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.
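The ldagibbs command runs in Stata; for readers outside Stata, the sketch below shows the same two outputs (a per-document topic distribution and per-topic word distribution) with scikit-learn's LDA on a toy corpus.

```python
# Illustration of LDA's outputs in Python (ldagibbs itself is a Stata
# command); the four-document corpus is invented.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "stock market trading shares",
    "market prices and trading volume",
    "football match goal scored",
    "goal keeper saves the match",
]
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

doc_topics = lda.transform(counts)   # one probability row per document
# lda.components_ holds the (unnormalised) topic-word weights.
```

Each row of `doc_topics` sums to one, which is exactly the "document as a probability distribution over topics" representation described above.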


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Lingxiao He ◽  
Lei Luo ◽  
Xiaoling Hou ◽  
Dengbin Liao ◽  
Ran Liu ◽  
...  

Abstract Background Venous thromboembolism (VTE) is a common complication in hospitalized trauma patients and has an adverse impact on patient outcomes. However, there is still a lack of appropriate tools for effectively predicting VTE in trauma patients. We sought to verify the accuracy of the Caprini score for predicting VTE in trauma patients, and to further improve the prediction through machine learning algorithms. Methods We retrospectively reviewed emergency trauma patients admitted to a trauma center in a tertiary hospital from September 2019 to March 2020. The data in each patient’s electronic health record (EHR) and the Caprini score were extracted and combined with multiple feature screening methods and the random forest (RF) algorithm to construct the VTE prediction model. We compared the prediction performance of (1) the Caprini score alone; (2) a machine learning model built on EHR data; and (3) a machine learning model built on EHR data and the Caprini score. True positive rate (TPR), false positive rate (FPR), area under the curve (AUC), accuracy, and precision were reported. Results The Caprini score shows a good VTE prediction effect in the hospitalized trauma population at a cut-off point of 11 (TPR = 0.667, FPR = 0.227, AUC = 0.773). The best prediction model is a LASSO+RF model combining the Caprini score with five further features extracted from the EHR data (TPR = 0.757, FPR = 0.290, AUC = 0.799). Conclusion The Caprini score has good VTE prediction performance in trauma patients, and machine learning methods can further improve the prediction performance.
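A common shape for a LASSO+RF pipeline is an L1-penalised screening step followed by a random forest on the surviving features. The sketch below uses synthetic data, not the study's EHR features or Caprini scores.

```python
# Hedged sketch of a LASSO-screen-then-RF pipeline on synthetic data
# (the real model's EHR features and labels are not reproduced here).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 10))          # 10 toy features (EHR + score)
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 0.5, 300) > 0).astype(int)

model = make_pipeline(
    StandardScaler(),
    # L1-penalised screen: features with zero coefficients are dropped.
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear",
                                       C=0.1)),
    RandomForestClassifier(n_estimators=200, random_state=0),
).fit(X, y)
train_acc = model.score(X, y)           # quick sanity check on training data
```

The screening step mirrors the paper's feature reduction to the Caprini score plus five EHR features; the forest then supplies the non-linear decision boundary.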


2020 ◽  
Vol 32 ◽  
pp. 03032
Author(s):  
Sahil Parab ◽  
Piyush Rathod ◽  
Durgesh Patil ◽  
Vishwanath Chikkareddi

Diabetes detection has been one of the many challenges faced by the medical as well as technological communities. Machine learning algorithms are used to detect the likelihood that a patient is diabetic based on glucose concentration, insulin levels, and other medically relevant test results. The basic diabetes detection model uses a Bayesian classification algorithm, but even though it is able to detect diabetes, its efficiency is not acceptable at all times because of the drawbacks of relying on a single algorithm. A hybrid machine learning model is used to overcome the drawbacks of a single-algorithm model. The hybrid model is constructed by combining multiple applicable machine learning algorithms, such as an SVM model and a Bayesian classification model, so that each compensates for the drawbacks of the other while contributing its own efficiency. Ideally, the new hybrid machine learning model provides more efficiency than the old Bayesian classification model alone.
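A hybrid of this kind can be sketched as a soft-voting ensemble: an SVM and a naive Bayes classifier each produce class probabilities, which are averaged. The toy glucose/insulin data below are invented, not the study's patient records.

```python
# Minimal sketch of the hybrid idea: SVM + naive Bayes soft voting.
# Features, thresholds, and labels here are synthetic.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(5)
glucose = rng.uniform(70, 200, 150)              # toy glucose readings
insulin = rng.uniform(15, 300, 150)              # toy insulin readings
X = np.column_stack([glucose, insulin])
y = (glucose > 140).astype(int)                  # toy diabetes label

hybrid = VotingClassifier(
    estimators=[("svm", SVC(probability=True)), ("nb", GaussianNB())],
    voting="soft",                               # average class probabilities
).fit(X, y)
acc = hybrid.score(X, y)
```

Soft voting lets a confident classifier outvote an uncertain one, which is the "mutually contributed efficiency" the abstract describes.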


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Saira Aziz ◽  
Sajid Ahmed ◽  
Mohamed-Slim Alouini

Abstract Electrocardiogram (ECG) signals represent the electrical activity of the human heart and consist of several waveforms (P, QRS, and T). The duration and shape of each waveform, and the distances between different peaks, are used to diagnose heart diseases. In this work, to better analyze ECG signals, a new algorithm is proposed that exploits two event-related moving averages (TERMA) and the fractional Fourier transform (FrFT). The TERMA algorithm specifies certain areas of interest in which to locate desired peaks, while the FrFT rotates ECG signals in the time-frequency plane to make the locations of the various peaks manifest. The proposed algorithm’s performance outperforms state-of-the-art algorithms. Moreover, to automatically classify heart disease, the estimated peaks, the durations between different peaks, and other ECG signal features were used to train a machine learning model. Most of the available studies use the MIT-BIH database (only 48 patients). In this work, however, the recently reported Shaoxing People’s Hospital (SPH) database, which covers more than 10,000 patients, was used to train the proposed machine learning model, which is more realistic for classification. Cross-database training and testing with promising results is the distinguishing feature of our proposed machine learning model.
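The TERMA idea can be sketched with two moving averages: a short "event" window that follows sharp deflections and a long "cycle" window that tracks the baseline; wherever the short average exceeds the long one, a block of interest opens and the peak is taken as its local maximum. The synthetic signal below is invented and much simpler than real ECG data.

```python
# Hedged sketch of the two-moving-average (TERMA-style) idea on a
# synthetic signal; window lengths and the signal itself are invented.
import numpy as np

def moving_avg(x, w):
    return np.convolve(x, np.ones(w) / w, mode="same")

fs = 250                                   # toy sampling rate, Hz
t = np.arange(0, 2, 1 / fs)
sig = np.zeros_like(t)
peak_locs = [100, 350]                     # two synthetic "R-peak" positions
for p in peak_locs:
    sig[p - 5:p + 5] = np.hanning(10)      # narrow bump at each position

event_ma = moving_avg(np.abs(sig), 15)     # short window follows events
cycle_ma = moving_avg(np.abs(sig), 90)     # long window tracks the baseline
blocks = event_ma > cycle_ma               # candidate areas of interest

# Take the maximum inside each contiguous block as the detected peak.
starts = np.flatnonzero(np.diff(blocks.astype(int)) == 1) + 1
ends = np.flatnonzero(np.diff(blocks.astype(int)) == -1) + 1
detected = [int(s + np.argmax(sig[s:e])) for s, e in zip(starts, ends)]
```

In the full algorithm the FrFT step precedes this, rotating the signal so that the various peaks (P, QRS, T) stand out before the block search runs.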

