scholarly journals A data-oriented approach for outlier detection

Author(s):  
Nripesh Trivedi

In this paper, characteristics of data obtained from the sensors (used in OpenSense project) are identified in order to build a data-oriented approach. This approach consists of application of Class Outliers: Distance Based (CODB) and Hoeffding tree algorithms. Subsequently, machine learning models were built to detect outliers in a sensor data stream. The approach presented in this paper may be used for developing methodologies for data-oriented outlier detection

Sensors ◽  
2019 ◽  
Vol 19 (16) ◽  
pp. 3491 ◽  
Author(s):  
Issam Hammad ◽  
Kamal El-Sankary

Accuracy evaluation in machine learning is based on the split of data into a training set and a test set. This critical step is applied to develop machine learning models including models based on sensor data. For sensor-based problems, comparing the accuracy of machine learning models using the train/test split provides only a baseline comparison in ideal situations. Such comparisons won’t consider practical production problems that can impact the inference accuracy such as the sensors’ thermal noise, performance with lower inference quantization, and tolerance to sensor failure. Therefore, this paper proposes a set of practical tests that can be applied when comparing the accuracy of machine learning models for sensor-based problems. First, the impact of the sensors’ thermal noise on the models’ inference accuracy was simulated. Machine learning algorithms have different levels of error resilience to thermal noise, as will be presented. Second, the models’ accuracy using lower inference quantization was compared. Lowering inference quantization leads to lowering the analog-to-digital converter (ADC) resolution which is cost-effective in embedded designs. Moreover, in custom designs, analog-to-digital converters’ (ADCs) effective number of bits (ENOB) is usually lower than the ideal number of bits due to various design factors. Therefore, it is practical to compare models’ accuracy using lower inference quantization. Third, the models’ accuracy tolerance to sensor failure was evaluated and compared. For this study, University of California Irvine (UCI) ‘Daily and Sports Activities’ dataset was used to present these practical tests and their impact on model selection.


2020 ◽  
Author(s):  
Zengwei Zheng ◽  
Lifei Shi ◽  
Sha Zhao ◽  
Jianmin Hou ◽  
Lin Sun ◽  
...  

Abstract Earthquake Early Warning (EEW) system detects earthquakes and sends an early warning to areas likely to be affected, which plays a significant role in reducing earthquake damage. In recent years, as with the widespread distribution of smartphones, as well as their powerful computing ability and advanced built-in sensors, a new interdisciplinary research method of smartphone-based earthquake early warning has emerged. Smartphones-based earthquake early warning system applies signal processing techniques and machine learning algorithms to the sensor data recorded by smartphones for better monitoring earthquakes. But it is challenging to collect abundant phone-recorded seismic data for training related machine learning models and selecting appropriate features for these models. One alternative way to solve this problem is to transform the data recorded by seismic networks into phone-quality data. In this paper, we propose such a transformation method by learning the differences between the data recorded by seismic networks and smartphones, in two scenarios: phone fixed and free located on tables, respectively. By doing this, we can easily generate abundant phone-quality earthquake data to train machine learning models used in EEW systems. We evaluate our transformation method by conducting various experiments, and our method performs much better than existing methods. Furthermore, we set up a case study where we use the transformed records to train machine learning models for earthquake intensity prediction. The results show that the model trained by using our transformed data produces superior performance, suggesting that our transformation method is useful for smartphone-based earthquake early warning.


Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 740
Author(s):  
Danica Hendry ◽  
Ryan Leadbetter ◽  
Kristoffer McKee ◽  
Luke Hopper ◽  
Catherine Wild ◽  
...  

This study aimed to develop a wearable sensor system, using machine-learning models, capable of accurately estimating peak ground reaction force (GRF) during ballet jumps in the field. Female dancers (n = 30) performed a series of bilateral and unilateral ballet jumps. Dancers wore six ActiGraph Link wearable sensors (100 Hz). Data were collected simultaneously from two AMTI force platforms and synchronised with the ActiGraph data. Due to sensor hardware malfunctions and synchronisation issues, a multistage approach to model development, using a reduced data set, was taken. Using data from the 14 dancers with complete multi-sensor synchronised data, the best single sensor was determined. Subsequently, the best single sensor model was refined and validated using all available data for that sensor (23 dancers). Root mean square error (RMSE) in body weight (BW) and correlation coefficients (r) were used to assess the GRF profile, and Bland–Altman plots were used to assess model peak GRF accuracy. The model based on sacrum data was the most accurate single sensor model (unilateral landings: RMSE = 0.24 BW, r = 0.95; bilateral landings: RMSE = 0.21 BW, r = 0.98) with the refined model still showing good accuracy (unilateral: RMSE = 0.42 BW, r = 0.80; bilateral: RMSE = 0.39 BW, r = 0.92). Machine-learning models applied to wearable sensor data can provide a field-based system for GRF estimation during ballet jumps.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Fabien Plisson ◽  
Obed Ramírez-Sánchez ◽  
Cristina Martínez-Hernández

Abstract Reducing hurdles to clinical trials without compromising the therapeutic promises of peptide candidates becomes an essential step in peptide-based drug design. Machine-learning models are cost-effective and time-saving strategies used to predict biological activities from primary sequences. Their limitations lie in the diversity of peptide sequences and biological information within these models. Additional outlier detection methods are needed to set the boundaries for reliable predictions; the applicability domain. Antimicrobial peptides (AMPs) constitute an extensive library of peptides offering promising avenues against antibiotic-resistant infections. Most AMPs present in clinical trials are administrated topically due to their hemolytic toxicity. Here we developed machine learning models and outlier detection methods that ensure robust predictions for the discovery of AMPs and the design of novel peptides with reduced hemolytic activity. Our best models, gradient boosting classifiers, predicted the hemolytic nature from any peptide sequence with 95–97% accuracy. Nearly 70% of AMPs were predicted as hemolytic peptides. Applying multivariate outlier detection models, we found that 273 AMPs (~ 9%) could not be predicted reliably. Our combined approach led to the discovery of 34 high-confidence non-hemolytic natural AMPs, the de novo design of 507 non-hemolytic peptides, and the guidelines for non-hemolytic peptide design.


2021 ◽  
Vol 18 (4) ◽  
pp. 1-26
Author(s):  
Wonik Seo ◽  
Sanghoon Cha ◽  
Yeonjae Kim ◽  
Jaehyuk Huh ◽  
Jongse Park

With the proliferation of applications with machine learning (ML), the importance of edge platforms has been growing to process streaming sensor, data locally without resorting to remote servers. Such edge platforms are commonly equipped with heterogeneous computing processors such as GPU, DSP, and other accelerators, but their computational and energy budget are severely constrained compared to the data center servers. However, as an edge platform must perform the processing of multiple machine learning models concurrently for multimodal sensor data, its scheduling problem poses a new challenge to map heterogeneous machine learning computation to heterogeneous computing processors. Furthermore, processing of each input must provide a certain level of bounded response latency, making the scheduling decision critical for the edge platform. This article proposes a set of new heterogeneity-aware ML inference scheduling policies for edge platforms. Based on the regularity of computation in common ML tasks, the scheduler uses the pre-profiled behavior of each ML model and routes requests to the most appropriate processors. It also aims to satisfy the service-level objective (SLO) requirement while reducing the energy consumption for each request. For such SLO supports, the challenge of ML computation on GPUs and DSP is its inflexible preemption capability. To avoid the delay caused by a long task, the proposed scheduler decomposes a large ML task to sub-tasks by its layer in the DNN model.


2020 ◽  
Vol 2 (1) ◽  
pp. 3-6
Author(s):  
Eric Holloway

Imagination Sampling is the usage of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling for obtaining multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.


2021 ◽  
Author(s):  
Norberto Sánchez-Cruz ◽  
Jose L. Medina-Franco

<p>Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. This data represents a large amount of structure-activity relationships that has not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the herein reported models have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as freely accessible and easy-to-use web application.</p>


Sign in / Sign up

Export Citation Format

Share Document