Unsupervised Learning Models for Unlabeled Genomic, Transcriptomic & Proteomic Data

2022 ◽  
2020 ◽  
pp. 016555152091003
Author(s):  
Gyeong Taek Lee ◽  
Chang Ouk Kim ◽  
Min Song

Sentiment analysis plays an important role in understanding individual opinions expressed on websites such as social media and product review sites. The common approaches to sentiment analysis use the sentiments carried by words that express opinions and are based on either supervised or unsupervised learning techniques. The unsupervised learning approach builds a word-sentiment dictionary, but building a reliable dictionary takes considerable time and expense. The supervised learning approach uses machine learning models to learn the sentiment scores of words; however, training a classifier model requires large amounts of labelled text data to achieve good performance. In this article, we propose a semisupervised approach that performs well even when only small amounts of labelled data are available for training. The proposed method builds a base sentiment dictionary from a small training dataset using a lasso-based ensemble model with minimal human effort. The scores of words not in the training dataset are estimated using an adaptive instance-based learning model: in a pretrained word2vec model space, the sentiment values of the words in the dictionary are propagated to the words that do not appear in the training dataset. Through two experiments, we demonstrate that the performance of the proposed method is comparable to that of supervised learning models trained on large datasets.
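The propagation step described above can be illustrated with a minimal sketch. It is not the authors' adaptive instance-based model, whose details the abstract does not give; it assumes a plain similarity-weighted k-nearest-neighbour average instead. The toy 2-D vectors stand in for pretrained word2vec embeddings, and the names `propagate_score`, `embeddings`, and `dictionary` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def propagate_score(word, embeddings, dictionary, k=2):
    """Estimate a sentiment score for `word` from its k most similar
    dictionary words, weighting each neighbour by cosine similarity.
    Assumption: simple weighted k-NN, not the paper's adaptive model."""
    sims = [
        (cosine(embeddings[word], embeddings[w]), score)
        for w, score in dictionary.items()
        if w in embeddings and w != word
    ]
    top = sorted(sims, key=lambda pair: pair[0], reverse=True)[:k]
    # Ignore neighbours that point away from the query word.
    top = [(sim, score) for sim, score in top if sim > 0]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return 0.0  # no usable neighbour: fall back to neutral
    return sum(sim * score for sim, score in top) / total

# Hypothetical toy embeddings (real word2vec vectors have hundreds of dimensions).
embeddings = {
    "good": [1.0, 0.1],
    "great": [0.9, 0.2],
    "bad": [-1.0, 0.1],
    "awful": [-0.9, 0.2],
    "superb": [0.95, 0.15],    # not in the dictionary
    "dreadful": [-0.95, 0.15],  # not in the dictionary
}
# Hypothetical base dictionary, standing in for the lasso-derived scores.
dictionary = {"good": 0.8, "great": 0.9, "bad": -0.8, "awful": -0.9}

superb_score = propagate_score("superb", embeddings, dictionary)
dreadful_score = propagate_score("dreadful", embeddings, dictionary)
```

Because the estimate is a weighted average, a propagated score always lies between the scores of the neighbours it was derived from, so it cannot exceed the range of the base dictionary.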


2021 ◽  
Vol 1 (1) ◽  
pp. 61-74
Author(s):  
Sohrab Mokhtari ◽  
Kang K Yen

Anomaly detection strategies in industrial control systems mainly inspect transmitted network traffic, an approach known as network intrusion detection. In contrast, a measurement intrusion detection system inspects the sensor data collected at the supervisory control and data acquisition center to find abnormal behavior. One way to detect anomalies in measurement data is to train supervised learning models to classify normal and abnormal data. However, a labeled dataset containing abnormal behavior, such as attacks or malfunctions, is extremely hard to obtain. An unsupervised learning strategy, which does not require labeled data for training, can therefore help tackle this problem. This study evaluates the performance of unsupervised learning strategies in anomaly detection using measurement data in control systems. The most accurate algorithms are selected to train unsupervised learning models, and the results show an accuracy of 98% in stealthy attack detection.
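The abstract does not name the algorithms it evaluates, so the following is only a generic illustration of detecting anomalies in sensor measurements without labels. It assumes a robust z-score built from the median and the median absolute deviation (MAD); the readings, the threshold of 3.5, and the function names are hypothetical.

```python
import statistics

def fit_robust_stats(readings):
    """Learn the median and MAD of unlabeled sensor readings."""
    med = statistics.median(readings)
    mad = statistics.median([abs(x - med) for x in readings])
    return med, mad

def is_anomalous(x, med, mad, threshold=3.5):
    """Flag a reading whose robust z-score exceeds the threshold.
    The factor 0.6745 rescales the MAD so the score is comparable
    to a standard z-score for normally distributed data."""
    if mad == 0:
        return x != med  # all training readings identical
    return 0.6745 * abs(x - med) / mad > threshold

# Hypothetical unlabeled readings from one sensor (e.g. a tank level).
readings = [50.1, 49.8, 50.0, 50.3, 49.9, 50.2, 50.0, 49.7]
med, mad = fit_robust_stats(readings)

normal_flag = is_anomalous(50.1, med, mad)
attack_flag = is_anomalous(57.0, med, mad)
```

No labeled attack data is needed to fit the detector: it only learns what typical readings look like and flags large departures from them.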


2021 ◽  
Vol 5 (6) ◽  
pp. 840-854
Author(s):  
Jesmeen M. Z. H. ◽  
J. Hossen ◽  
Azlan Bin Abd. Aziz

Recent years have seen significant growth in the adoption of smart home devices. These devices feed a smart home system that supports visualisation and analysis of time-series data. However, system developers face several challenges, such as data quality and data anomaly issues. These anomalies can be due to technical or non-technical faults. Detecting non-technical faults is essential because they can incur economic costs. The main objective of this study is to overcome the challenge of training learning models on an unlabelled dataset. Another important consideration is training the model to discriminate abnormal consumption from seasonal consumption. This paper proposes a system that uses unsupervised learning for time-series data in the smart home environment. First, data are collected from a real-time scenario. Seasonal features are then generated in the time domain and reduced to two dimensions with PCA. The reduced data are passed through four known unsupervised learning models and evaluated using the Excess Mass and Mass-Volume methods. The results indicate that LOF tends to outperform the other models in detecting anomalies in electricity consumption. The proposed model was further evaluated on a benchmark anomaly dataset, which showed that the system can also work in other fields that involve time-series data. The model separates the data into anomalous and normal points. The developed anomaly detector aims to detect anomalies as early as possible, triggering alarms in real time for time-series energy consumption data. It can adapt to changing values automatically. DOI: 10.28991/esj-2021-01314
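As a rough illustration of the final stage of this pipeline, the sketch below computes local outlier factor (LOF) scores for points that are assumed to be already reduced to two dimensions by PCA. It is a simplified, pure-Python version that takes exactly k neighbours per point and assumes no duplicate points; it is not the authors' implementation, and the sample points are hypothetical.

```python
import math

def k_nearest(points, i, k):
    """Return the k nearest neighbours of point i as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return dists[:k]

def lof_scores(points, k=3):
    """Compute a LOF score per point; values well above 1 suggest an outlier.
    Simplification: ties at the k-th distance are not expanded, and
    duplicate points (zero distances) are assumed not to occur."""
    n = len(points)
    neigh = [k_nearest(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]

# Hypothetical 2-D points, standing in for PCA-reduced seasonal features.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (0.05, 0.05), (0.12, 0.03),
    (3.0, 3.0),  # an isolated consumption pattern
]
scores = lof_scores(points, k=3)
most_anomalous = scores.index(max(scores))
```

Points inside the dense cluster score close to 1 because their density matches that of their neighbours, while the isolated point scores far higher. In practice a library implementation such as scikit-learn's `LocalOutlierFactor` would normally be used instead of this hand-written version.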

