Leveraging of Weighted Ensemble Technique for Identifying Medical Concepts from Clinical Texts at Word and Phrase Level

2021 ◽  
Author(s):  
Dipankar Das ◽  
Krishna Sharma

Concept identification from medical texts has become important due to digitization. However, it is not always feasible to identify all such medical concepts manually. Thus, in the present attempt, we have applied five machine learning classifiers (Support Vector Machine, K-Nearest Neighbours, Logistic Regression, Random Forest and Naïve Bayes) and one deep learning classifier (Long Short-Term Memory) to identify medical concepts, training on a total of 27,383 sentences. In addition, we have developed a rule-based phrase identification module to help the classifiers identify multi-word medical concepts. We have employed the word2vec technique for feature extraction, and PCA and t-SNE for an ablation study over the various features to select the important ones. Finally, we have adopted two different ensemble approaches, stacking and weighted sum, to improve upon the individual classifiers, and significant improvements were observed for each of them. The phrase identification module plays an important role in helping individual classifiers identify higher-order n-gram medical concepts, and the ensemble approach further enhances the results over SVM, which already showed improvement after the application of the phrase-based module.
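The weighted-sum ensemble described in this abstract can be sketched as below. This is a minimal illustration, not the paper's implementation: the three-class label set, the probability vectors, and the use of validation accuracies as weights are all assumptions.

```python
# Illustrative weighted-sum ensemble: per-class probabilities from each
# classifier are combined with weights proportional to each model's
# (hypothetical) validation accuracy.

def weighted_sum_ensemble(prob_lists, accuracies):
    """prob_lists: one probability vector per classifier (same class order);
    accuracies: validation accuracy per classifier, used as raw weights."""
    total = sum(accuracies)
    weights = [a / total for a in accuracies]      # normalise to sum to 1
    n_classes = len(prob_lists[0])
    combined = [0.0] * n_classes
    for probs, w in zip(prob_lists, weights):
        for i, p in enumerate(probs):
            combined[i] += w * p
    return combined.index(max(combined)), combined

# Three classifiers voting over three hypothetical concept tags
probs = [
    [0.6, 0.3, 0.1],   # e.g. SVM
    [0.2, 0.5, 0.3],   # e.g. Random Forest
    [0.5, 0.4, 0.1],   # e.g. LSTM
]
label, scores = weighted_sum_ensemble(probs, [0.90, 0.80, 0.85])
```

Because the weights are normalised, the combined scores still sum to one and can be read as an ensemble probability distribution.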

Author(s):  
Ilhan Aydin ◽  
Selahattin B Celebi ◽  
Sami Barmada ◽  
Mauro Tucci

The pantograph-catenary subsystem is a fundamental component of a railway train since it provides the traction electrical power. A bad operating condition or, even worse, a failure can disrupt railway traffic, creating economic damage and, in some cases, serious accidents. Therefore, the correct operation of such subsystems should be ensured in order to have an economically efficient, reliable and safe transportation system. In this study, a new arc detection method is proposed, based on features extracted from the current and voltage signals collected by the pantograph. A tool named mathematical morphology is applied to the voltage and current signals to emphasize the effect of the arc, before the fast Fourier transform is applied to obtain the power spectrum. Afterwards, three support vector machine-based classifiers are trained separately to detect the arcs, and a fuzzy integral technique is used to synthesize the results obtained by the individual classifiers, thereby implementing classifier fusion. The experimental results show that the proposed approach is effective for the detection of arcs, and that the fusion of classifiers has a higher detection accuracy than any individual classifier.
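The fuzzy-integral fusion step can be sketched with a Sugeno integral over the three classifiers' confidence scores. Everything below is a generic textbook sketch, not the paper's code: the fuzzy densities (per-classifier reliabilities) and scores are hypothetical values.

```python
# Sugeno fuzzy integral for fusing per-classifier confidence scores.
# Densities g_i express how much each classifier is trusted.

def solve_lambda(densities, tol=1e-10):
    """Find lambda satisfying prod(1 + lam*g_i) = 1 + lam, lam != 0
    (the standard lambda-fuzzy-measure normalisation), by bisection."""
    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)
    s = sum(densities)
    if abs(s - 1.0) < tol:
        return 0.0                       # measure is already additive
    lo, hi = (-1.0 + tol, -tol) if s > 1.0 else (tol, 1e6)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

def sugeno_integral(scores, densities):
    """scores: each classifier's confidence that an arc is present."""
    lam = solve_lambda(densities)
    pairs = sorted(zip(scores, densities), reverse=True)
    g_prev, best = 0.0, 0.0
    for h, g in pairs:
        g_prev = g + g_prev + lam * g * g_prev   # g(A_k) recursion
        best = max(best, min(h, g_prev))
    return best

fused = sugeno_integral([0.9, 0.6, 0.4], [0.3, 0.3, 0.3])
```

The fused score always lies between the weakest and strongest individual confidence, which is what makes the integral a principled middle ground between min- and max-style fusion.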


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7653 ◽  
Author(s):  
Mahyat Shafapour Tehrany ◽  
Lalit Kumar ◽  
Farzin Shabani

In this study, we propose and test a novel ensemble method for improving the accuracy of flood susceptibility mapping using the evidential belief function (EBF) and support vector machine (SVM). The outcome of the proposed method was compared with the results of each individual method. The proposed method was implemented four times using different SVM kernels; hence, the efficiency of each SVM kernel was also assessed. First, a bivariate statistical analysis using EBF was performed to assess the correlations between the classes of each flood conditioning factor and flooding. Subsequently, the outcome of the first stage was used in a multivariate statistical analysis performed by SVM. The highest prediction accuracy of 92.11% was achieved by the ensemble EBF-SVM method with a radial basis function kernel; the achieved accuracy was 7% and 3% higher than that of the individual EBF and SVM methods, respectively. Among all the applied methods, the individual EBF and SVM methods achieved the lowest accuracies. The reason for the improved accuracy of the ensemble methods is that by integrating the methods, a more detailed assessment of the flooding and conditioning factors can be performed, thereby increasing the accuracy of the final map.
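The bivariate first stage can be sketched as computing a belief-style weight for each class of a conditioning factor from flood/non-flood cell counts. This is an illustrative simplification of the EBF formulation, and the counts below are made up.

```python
# Illustrative EBF-style belief values for the classes of one flood
# conditioning factor (e.g. slope, binned into three classes).

def belief_values(flood_counts, class_counts):
    """flood_counts[i]: flooded cells falling in class i;
    class_counts[i]: total cells in class i.
    Returns normalised belief per class (higher = stronger flood link)."""
    ratios = [f / c for f, c in zip(flood_counts, class_counts)]
    total = sum(ratios)
    return [r / total for r in ratios]

bel = belief_values([30, 15, 5], [100, 100, 100])
```

In the ensemble, per-class values like these replace the raw factor classes as inputs to the SVM stage, which is what links the bivariate and multivariate analyses.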


Author(s):  
Dylan J. Gunn ◽  
Zhipeng Liu ◽  
Rushit Dave ◽  
Xiaohong Yuan ◽  
Kaushik Roy

In this modern world, mobile devices have been paired with the cloud environment to scale the voluminous amount of generated data. The implementation comes at the cost of privacy, as proprietary data can be stolen in transit to the cloud, or victims' phones can be seized along with data synced from the cloud. The attacker can gain access to the phone through shoulder surfing or even spoofing attacks. Our approach is to mitigate this issue by proposing an active cloud authentication framework using touch biometric patterns. To the best of our knowledge, active cloud authentication using touch dynamics for mobile cloud computing has not been explored in the literature. This research creates a proof of concept that will lead to a simulated cloud framework for active authentication. Given the amount of data captured by the mobile device from user activity, it can be a computationally intensive process for the mobile device to handle with such limited resources. To solve this, we simulated a post-transmission process of the data to the cloud so that we could implement the authentication process within the cloud. We evaluated the touch data using traditional machine learning algorithms, such as Random Forest (RF) and Support Vector Machine (SVM), and also using a deep learning classifier, the Long Short-Term Memory Recurrent Neural Network (LSTM-RNN). The novelty of this work is two-fold. First, we develop a distributed TensorFlow framework for cloud authentication using touch biometric patterns. This framework helps alleviate the drawback of the computationally intensive recognition of the substantial amount of raw data from the user. Second, we apply RF, SVM, and the LSTM-RNN deep learning classifier to the touch data to evaluate the performance of the proposed authentication scheme. The proposed approach shows a promising performance, with an accuracy of 99.0361% using RF on the distributed TensorFlow framework.
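Before any of the classifiers can run, raw touch events must be turned into fixed-length feature vectors. The sketch below is a hypothetical example of that preprocessing step; the specific features (duration, path length, speed, mean pressure) are common choices in touch-dynamics work, not necessarily the paper's exact set.

```python
# Hypothetical touch-swipe feature extraction for an RF/SVM/LSTM classifier.
import math

def swipe_features(samples):
    """samples: list of (t_sec, x, y, pressure) tuples for one swipe."""
    t0, x0, y0, _ = samples[0]
    t1, x1, y1, _ = samples[-1]
    duration = t1 - t0
    length = math.hypot(x1 - x0, y1 - y0)          # end-to-end distance
    speed = length / duration if duration > 0 else 0.0
    pressures = [p for _, _, _, p in samples]
    mean_p = sum(pressures) / len(pressures)
    return [duration, length, speed, mean_p]

swipe = [(0.00, 10, 10, 0.4), (0.05, 40, 50, 0.6), (0.10, 70, 90, 0.5)]
feats = swipe_features(swipe)
```

In a cloud setup like the one described, this vectorisation could run server-side after transmission, which is precisely how the framework offloads computation from the phone.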


2021 ◽  
Vol 11 (24) ◽  
pp. 11783
Author(s):  
Varinya Phanichraksaphong ◽  
Wei-Ho Tsai

Music plays an important part in people's lives from an early age. Many parents invest in music education of various types for their children, as the arts and music are of economic importance. This has fed a trend in which the STEAM education system draws more and more attention alongside the STEM system that has been developed over several years: for example, parents let their children listen to music while still in the womb and invest in music study at an early age, especially in playing and learning musical instruments. As far as education is concerned, assessment of music performances should be standardized, not based on an individual teacher's standard. Thus, in this study, automatic assessment methods for piano performances were developed. Two types of piano articulation were taken into account, namely "Legato", with sustained, vibrating notes played using sustain pedals, and "Staccato", with detached notes played without the use of sustain pedals. For each type, piano sounds were analyzed and classified into "Good", "Normal", and "Bad" categories. The study investigated four approaches for this task: Support Vector Machine (SVM), Naive Bayes (NB), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM). The experiments were conducted using 4680 test samples, including isolated scale notes and kids' songs, produced by 13 performers. The results show that the CNN approach is superior to the other approaches, with a classification accuracy of more than 80%.
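To make the legato/staccato distinction concrete, here is a toy heuristic (explicitly not the paper's learned method) that separates the two articulations by the silent gap between consecutive notes: staccato notes are detached, legato notes run together.

```python
# Toy articulation heuristic: staccato if the average silent gap between
# notes is a large fraction of the inter-onset interval. The threshold
# value is an assumption for illustration.

def articulation(notes, gap_threshold=0.25):
    """notes: list of (onset_sec, offset_sec) per note, in order."""
    ratios = []
    for (on_a, off_a), (on_b, _) in zip(notes, notes[1:]):
        ioi = on_b - on_a                     # inter-onset interval
        gap = max(0.0, on_b - off_a)          # silence before next note
        ratios.append(gap / ioi)
    avg = sum(ratios) / len(ratios)
    return "staccato" if avg > gap_threshold else "legato"

detached = [(0.0, 0.2), (0.5, 0.7), (1.0, 1.2)]   # short notes, long gaps
connected = [(0.0, 0.5), (0.5, 1.0), (1.0, 1.5)]  # notes run together
```

A learned model such as the paper's CNN operates on spectral features rather than symbolic onsets, but the underlying acoustic cue is the same.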


Author(s):  
Vahid Nourani ◽  
Ali Kheiri ◽  
Nazanin Behfar

Abstract In this study, Artificial Intelligence (AI) models combined with ensemble techniques were employed for predicting the suspended sediment load (SSL) via single-station and multi-station scenarios. Feed-Forward Neural Networks (FFNNs), the Adaptive Neuro-Fuzzy Inference System (ANFIS), and Support Vector Regression (SVR) were the employed AI models, and simple averaging (SA), weighted averaging (WA), and neural averaging (NA) were the ensemble techniques developed for combining the outputs of the individual AI models to obtain more accurate estimations of the SSL. For this purpose, twenty years of observed streamflow and SSL data from three gauging stations located in the Missouri and Upper Mississippi regions were utilized at both daily and monthly scales. The results of both scenarios indicated the superiority of the ensemble techniques over the single AI models, with the neural ensemble demonstrating the most reliable performance among the ensemble techniques. For instance, the ensemble techniques improved the predictions by up to 20% in the verification phase of the daily and monthly modeling of the first scenario, and by up to 5% and 8% in the verification step of the second scenario.
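The simple- and weighted-averaging ensembles mentioned above reduce to a few lines; the sketch below uses hypothetical forecasts and weights (in practice the weights would reflect each model's validation skill, and neural averaging would instead feed the individual outputs into a small FFNN).

```python
# Minimal simple-averaging (SA) and weighted-averaging (WA) ensembles.

def simple_average(predictions):
    """predictions: one forecast list per model, aligned by time step."""
    n = len(predictions)
    return [sum(vals) / n for vals in zip(*predictions)]

def weighted_average(predictions, weights):
    total = sum(weights)
    return [sum(w * v for w, v in zip(weights, vals)) / total
            for vals in zip(*predictions)]

# Hypothetical two-step SSL forecasts from three models
ffnn, anfis, svr = [10.0, 12.0], [14.0, 16.0], [12.0, 14.0]
sa = simple_average([ffnn, anfis, svr])
wa = weighted_average([ffnn, anfis, svr], [0.5, 0.3, 0.2])
```

SA treats every model equally, while WA shifts the combined forecast toward the models given more weight; the gap between the two outputs shows why weight selection matters.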


2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate the analysis and prediction of time-dependent data, focusing on four different stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three different machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). By treating the close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in the machine learning methods, it can be shown that the prediction accuracy is improved.
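The six predictors listed above are typically assembled into lagged windows before being fed to an LSTM, CNN, or SVR. The sketch below shows that preprocessing step with made-up prices; the lookback length and column order are assumptions.

```python
# Illustrative construction of supervised (window -> next close) samples
# from per-day [open, high, low, close, adj_close, volume] rows.

def make_windows(rows, lookback=2):
    """Returns (X, y): flattened lookback windows and next-day close."""
    X, y = [], []
    for i in range(lookback, len(rows)):
        window = [v for row in rows[i - lookback:i] for v in row]
        X.append(window)
        y.append(rows[i][3])            # index 3 = close price
    return X, y

rows = [
    [1.0, 2.0, 0.5, 1.5, 1.5, 100],
    [1.5, 2.5, 1.0, 2.0, 2.0, 120],
    [2.0, 3.0, 1.5, 2.5, 2.5, 110],
    [2.5, 3.5, 2.0, 3.0, 3.0, 130],
]
X, y = make_windows(rows, lookback=2)
```

The same (X, y) pairs can feed an SVR directly, while the LSTM/CNN variants would reshape each window back into a (lookback, features) matrix instead of flattening it.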


2020 ◽  
Vol 10 (3) ◽  
pp. 62
Author(s):  
Tittaya Mairittha ◽  
Nattaya Mairittha ◽  
Sozo Inoue

The integration of digital voice assistants in nursing residences is becoming increasingly important to facilitate nursing productivity with documentation. A key idea behind this system is training natural language understanding (NLU) modules that enable the machine to classify the purpose of the user utterance (intent) and extract pieces of valuable information present in the utterance (entity). One of the main obstacles when creating robust NLU is the lack of sufficient labeled data, which generally relies on human labeling. This process is cost-intensive and time-consuming, particularly in the high-level nursing care domain, which requires abstract knowledge. In this paper, we propose an automatic dialogue labeling framework for NLU tasks, specifically for nursing record systems. First, we apply data augmentation techniques to create a collection of variant sample utterances. The individual evaluation results show strong stratification with regard to both the fluency and accuracy of the utterances. We also investigate the possibility of applying deep generative models to our augmented dataset: a preliminary character-based model based on long short-term memory (LSTM) obtains an accuracy of 90% and generates various reasonable texts with BLEU scores of 0.76. Second, we introduce an idea for intent and entity labeling using feature embeddings and semantic similarity-based clustering. We also empirically evaluate different embedding methods for learning good representations that are most suitable for our data and clustering tasks. Experimental results show that fastText embeddings produce strong performance for both intent and entity labeling, achieving f1-scores of 0.79 and 0.78 and silhouette scores of 0.67 and 0.61, respectively.
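The semantic similarity-based clustering idea can be sketched as greedily grouping utterance embeddings by cosine similarity. This is a generic illustration, not the paper's algorithm; the 2-D vectors stand in for fastText sentence embeddings, and the similarity threshold is an assumption.

```python
# Greedy cosine-similarity clustering of (toy) utterance embeddings.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def greedy_cluster(vectors, threshold=0.8):
    """Assign each vector to the first cluster whose seed vector is similar
    enough, else open a new cluster. Returns one cluster id per vector."""
    seeds, labels = [], []
    for v in vectors:
        sims = [cosine(v, s) for s in seeds]
        if sims and max(sims) >= threshold:
            labels.append(sims.index(max(sims)))
        else:
            seeds.append(v)
            labels.append(len(seeds) - 1)
    return labels

vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]   # two similar, one distinct
labels = greedy_cluster(vecs)
```

Clusters produced this way can then be labeled once per cluster instead of once per utterance, which is where the labeling-cost saving comes from.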


2021 ◽  
Vol 186 (Supplement_1) ◽  
pp. 445-451
Author(s):  
Yifei Sun ◽  
Navid Rashedi ◽  
Vikrant Vaze ◽  
Parikshit Shah ◽  
Ryan Halter ◽  
...  

ABSTRACT
Introduction: Early prediction of the acute hypotensive episode (AHE) in critically ill patients has the potential to improve outcomes. In this study, we apply different machine learning algorithms to the MIMIC III Physionet dataset, containing more than 60,000 real-world intensive care unit records, to test commonly used machine learning technologies and compare their performances.
Materials and Methods: Five classification methods, K-nearest neighbor, logistic regression, support vector machine, random forest, and a deep learning method called long short-term memory, are applied to predict an AHE 30 minutes in advance. An analysis comparing model performance when including versus excluding invasive features was conducted. To further study the pattern of the underlying mean arterial pressure (MAP), we apply linear regression to predict the continuous MAP values over the next 60 minutes.
Results: Support vector machine yields the best performance in terms of recall (84%). Including the invasive features in the classification improves the performance significantly, with both recall and precision increasing by more than 20 percentage points. We were able to predict the MAP 60 minutes into the future with a root mean square error (a frequently used measure of the differences between predicted and observed values) of 10 mmHg. After converting the continuous MAP predictions into binary AHE predictions, we achieve 91% recall and 68% precision. In addition to predicting AHE, the MAP predictions provide clinically useful information regarding the timing and severity of the AHE occurrence.
Conclusion: We were able to predict AHE 30 minutes in advance with precision and recall above 80% on the large real-world dataset. The predictions of the regression model provide a more fine-grained, interpretable signal to practitioners. Model performance is improved by the inclusion of invasive features in predicting AHE, compared to predicting AHE based only on the available, restricted set of noninvasive technologies. This demonstrates the importance of exploring more noninvasive technologies for AHE prediction.
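The step of converting continuous MAP predictions into a binary AHE label can be sketched as below. The AHE criterion used here (MAP below 60 mmHg for at least 90% of a 30-minute window) is a convention common in the AHE literature, assumed for illustration rather than taken from this paper.

```python
# Convert a predicted MAP trajectory into a binary AHE flag.
# Threshold and fraction are assumed values, not the paper's.

def is_ahe(map_values, threshold=60.0, fraction=0.9):
    """map_values: predicted MAP (mmHg) at regular intervals over 30 min."""
    below = sum(1 for m in map_values if m < threshold)
    return below / len(map_values) >= fraction

stable = [75, 72, 70, 74, 73, 71]          # well above threshold
hypotensive = [58, 55, 52, 50, 49, 57]     # persistently below threshold
```

Because the rule operates on the whole predicted trajectory, it also exposes when the episode starts and how far MAP falls, which is the timing and severity information the abstract highlights.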


Forecasting ◽  
2021 ◽  
Vol 3 (2) ◽  
pp. 377-420
Author(s):  
Julien Chevallier ◽  
Dominique Guégan ◽  
Stéphane Goutte

This paper focuses on forecasting the price of Bitcoin, motivated by its market growth and the recent interest of market participants and academics. We deploy six machine learning algorithms (Artificial Neural Network, Support Vector Machine, Random Forest, k-Nearest Neighbours, AdaBoost, and Ridge regression), without deciding a priori which one is the 'best' model. The main contribution is to use these data analytics techniques with great caution in the parameterization, instead of classical parametric modeling (AR), to disentangle the non-stationary behavior of the data. Since Bitcoin is also used for diversification in portfolios, we need to investigate its interactions with stocks, bonds, foreign exchange, and commodities. We identify that other cryptocurrencies convey enough information to explain the daily variation of Bitcoin's spot and futures prices. The forecasting results point to the segmentation of Bitcoin from alternative assets. Finally, trading strategies are implemented.
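Careful parameterization on non-stationary series usually means walk-forward evaluation rather than a random train/test split. The generator below is a generic sketch of that scheme, with illustrative sizes; it is not the paper's exact protocol.

```python
# Walk-forward (expanding-window) train/test splits: each test block is
# predicted only from data that precedes it, avoiding look-ahead bias.

def walk_forward_splits(n, initial_train, test_size):
    """Yield (train_indices, test_indices) with a growing training window."""
    start = initial_train
    while start + test_size <= n:
        yield list(range(start)), list(range(start, start + test_size))
        start += test_size

splits = list(walk_forward_splits(n=10, initial_train=6, test_size=2))
```

Each of the six candidate models would be refit on every training window and scored on the following block, so model comparison respects the time ordering of the data.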

