Machine Learning for the intelligent analysis of 3D printing conditions using environmental sensor data to support quality assurance

Abstract A shortcoming of using environmental sensors for the surveillance of potentially concussive events is substantial uncertainty regarding whether the event was caused by head acceleration (“head impacts”) or sensor motion (with no head acceleration). The goal of the present study is to develop a machine learning model to classify environmental sensor data obtained in the field and evaluate the performance of the model against the performance of the proprietary classification algorithm used by the environmental sensor. Data were collected from Soldiers attending sparring sessions conducted under a U.S. Army Combatives School course. Data from one sparring session were used to train a decision tree classification algorithm to identify good and bad signals. Data from the remaining sparring sessions were kept as an external validation set. The performance of the proprietary algorithm used by the sensor was also compared to the trained algorithm performance. The trained decision tree was able to correctly classify 95% of events for internal cross-validation and 88% of events for the external validation set. Comparatively, the proprietary algorithm was only able to correctly classify 61% of the events. In general, the trained algorithm was better able to predict when a signal was good or bad compared to the proprietary algorithm. The present study shows it is possible to train a decision tree algorithm using environmental sensor data collected in the field.

Download Full-text

On the potential and challenges of using machine-learning for automated quality control of environmental sensor data

10.5194/egusphere-egu2020-20777 ◽

2020 ◽

Author(s):

Lennart Schmidt ◽

Hannes Mollenhauer ◽

Corinna Rebmann ◽

David Schäfer ◽

Antje Claussnitzer ◽

...

Keyword(s):

Machine Learning ◽

Quality Control ◽

Ground Truth ◽

Sensor Data ◽

Small Scale ◽

Ground Truth Data ◽

Starting Point ◽

Environmental Sensor ◽

Spatio Temporal ◽

Automated Quality Control

With more and more data being gathered from environmental sensor networks, the importance of automated quality-control (QC) routines to provide usable data in near-real time is becoming increasingly apparent. Machine-learning (ML) algorithms exhibit a high potential to this respect as they are able to exploit the spatio-temporal relation of multiple sensors to identify anomalies while allowing for non-linear functional relations in the data. In this study, we evaluate the potential of ML for automated QC on two spatio-temporal datasets at different spatial scales: One is a dataset of atmospheric variables at 53 stations across Northern Germany. The second dataset contains timeseries of soil moisture and temperature at 40 sensors at a small-scale measurement plot.Furthermore, we investigate strategies to tackle three challenges that are commonly present when applying ML for QC: 1) As sensors might drop out, the ML models have to be designed to be robust against missing values in the input data. We address this by comparing different data imputation methods, coupled with a binary representation of whether a value is missing or not. 2) Quality flags that mark erroneous data points to serve as ground truth for model training might not be available. And 3) There is no guarantee that the system under study is stationary, which might render the outputs of a trained model useless in the future. To address 2) and 3), we frame the problem both as a supervised and unsupervised learning problem. Here, the use of unsupervised ML-models can be beneficial as they do not require ground truth data and can thus be retrained more easily should the system be subject to significant changes. In this presentation, we discuss the performance, advantages and drawbacks of the proposed strategies to tackle the aforementioned challenges. Thus, we provide a starting point for researchers in the largely untouched field of ML application for automated quality control of environmental sensor data.

Download Full-text

A Machine Learning Approach to Indoor Occupancy Detection Using Non-Intrusive Environmental Sensor Data

Proceedings of the 3rd International Conference on Big Data and Internet of Things - BDIOT 2019 ◽

10.1145/3361758.3361775 ◽

2019 ◽

Author(s):

Sofiane Zemouri ◽

Yiannis Gkoufas ◽

John Murphy

Keyword(s):

Machine Learning ◽

Sensor Data ◽

Learning Approach ◽

Occupancy Detection ◽

Machine Learning Approach ◽

Environmental Sensor

Download Full-text

Environmental Sensor Networks and Continuous Data Quality Assurance to Manage Salinity within a Highly Regulated River Basin.

Decision Support Systems in Agriculture, Food and the Environment - Advances in Environmental Engineering and Green Technologies ◽

10.4018/978-1-61520-881-4.ch019 ◽

2011 ◽

pp. 420-436 ◽

Cited By ~ 2

Author(s):

Nigel W.T. Quinn ◽

Ricardo Ortega ◽

Lisa M. Holm

Keyword(s):

Quality Assurance ◽

Decision Support ◽

Data Quality ◽

Sensor Data ◽

Regulated River ◽

Web Based ◽

Murray Darling Basin ◽

Environmental Sensor ◽

San Joaquin Basin ◽

Environmental Decision

Environmental sensor networks enjoy widespread deployment as monitoring systems have become easier to design and implement in the field and installation costs have fallen. Unfortunately software systems for data quality assurance have not kept pace with the development of these sensor network technologies and risk compromising the potential of these innovative systems by making it difficult to assess the accuracy and consistency of the data. Lingering uncertainty can constrain the willingness of stakeholders to make operational decisions on the basis of the real-time sensor data: a few negative experiences can do irreparable damage to a project which is attempting to change stakeholder behavior. Management of river salt loads in complex and highly regulated river basins such as the Murray Darling Basin in south-east Australia and the San Joaquin Basin in California, USA present significant challenges to Information Technology infrastructure within resource agencies that often have a poor history of coordination and data sharing. In the San Joaquin Basin, web-based environmental data dissemination initiatives to address salinity issues need to overcome a fear of loss of autonomy as well as data quality assurance and data reliability issues. These environmental decision support issues are contrasted with those facing resource managers in the Murray Darling Basin. This paper describes a new approach to environmental decision support for salinity management in the San Joaquin Basin of California that focuses on web-based data sharing using YSI Econet technology and continuous data quality management using a novel software tool, Aquarius. Commercial turn-key monitoring systems such as YSI EcoNet provide real-time web-access to sensor data as well as providing the owner full control over the way the data is visualized. The same web-sites use GIS to superimpose the monitoring site locations on maps and local hydrography and allow point and click access to the data collected at each environmental monitoring site. This Information Technology suite of software and hardware work together to provide timely, reliable and high quality data in a manner that can used by stakeholder decision makers to better manage salt export to the San Joaquin River and ensure compliance with State water quality objectives. The technologies developed for this application can be extended to improve compliance with TMDL water quality objectives over entire river basins and should have applicability in any watershed where environmental decision support systems are being developed to assist for stakeholders as part of a coordinated strategy for non-point pollutant load reduction.

Download Full-text

Supervised and unsupervised machine-learning for automated quality control of environmental sensor data

10.5194/egusphere-egu21-14485 ◽

2021 ◽

Author(s):

Julius Polz ◽

Lennart Schmidt ◽

Luca Glawion ◽

Maximilian Graf ◽

Christian Werner ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Quality Control ◽

Real Time ◽

Experimental System ◽

Training Data ◽

Sensor Data ◽

Small Scale ◽

Environmental Sensor ◽

Erroneous Data

We can observe a global decrease of well maintained weather stations by meteorological services and governmental institutes. At the same time, environmental sensor data is increasing through the use of opportunistic or remote sensing approaches. Overall, the trend for environmental sensor networks is strongly going towards automated routines, especially for quality-control (QC) to provide usable data in near real-time. A common QC scenario is that data is being flagged manually using expert knowledge and visual inspection by humans. To reduce this tedious process and to enable near-real time data provision, machine-learning (ML) algorithms exhibit a high potential as they can be designed to imitate the experts actions.&#160;Here we address these three common challenges when applying ML for QC: 1) Robustness to missing values in the input data. 2) Availability of training data, i.e. manual quality flags that mark erroneous data points. And 3) Generalization of the model regarding non-stationary behavior of one&#160; experimental system or changes in the experimental setup when applied to a different study area. We approach the QC problem and the related issues both as a supervised and an unsupervised learning problem using deep neural networks on the one hand and dimensionality reduction combined with clustering algorithms on the other.We compare the different ML algorithms on two time-series datasets to test their applicability across scales and domains. One dataset consists of signal levels of 4000 commercial microwave links distributed all over Germany that can be used to monitor precipitation. The second dataset contains time-series of soil moisture and temperature from 120 sensors deployed at a small-scale measurement plot at the TERENO site &#8220;Hohes Holz&#8221;.First results show that supervised ML provides an optimized performance for QC for an experimental system not subject to change and at the cost of a laborious preparation of the training data. The unsupervised approach is also able to separate valid from erroneous data at reasonable accuracy. However, it provides the additional benefit that it does not require manual flags and can thus be retrained more easily in case the system is subject to significant changes.&#160;In this presentation, we discuss the performance, advantages and drawbacks of the proposed ML routines to tackle the aforementioned challenges. Thus, we aim to provide a starting point for researchers in the promising field of ML application for automated QC of environmental sensor data.

Download Full-text

Binary Spectrum Feature for Improved Classiﬁer Performance

10.36227/techrxiv.12993122 ◽

2020 ◽

Author(s):

Nalika Ulapane ◽

Karthick Thiyagarajan ◽

sarath kodagoda

Keyword(s):

Machine Learning ◽

Classification Performance ◽

Feature Reduction ◽

Sensor Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Svm Classifier ◽

Monitoring Task ◽

Classifier Performance ◽

Spectrum Feature

<div>Classiﬁcation has become a vital task in modern machine learning and Artiﬁcial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classiﬁcation. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classiﬁer performance. In this paper, we consider the case of a given supervised learning classiﬁcation task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classiﬁcation performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classiﬁcation accuracy of a Support Vector Machine (SVM) classiﬁer increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.</div><div> </div>

Download Full-text

Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology

Machine Learning and Knowledge Extraction ◽

10.3390/make3020020 ◽

2021 ◽

Vol 3 (2) ◽

pp. 392-413

Author(s):

Stefan Studer ◽

Thanh Binh Bui ◽

Christian Drescher ◽

Alexander Hanuschkin ◽

Ludwig Winkler ◽

...

Keyword(s):

Machine Learning ◽

Quality Assurance ◽

Process Model ◽

Practical Experience ◽

Special Focus ◽

Close Monitoring ◽

Machine Learning Applications ◽

Project Organizations ◽

Considerable Impact ◽

Learning Development

Machine learning is an established and frequently used technique in industry and academia, but a standard process model to improve success and efficiency of machine learning applications is still missing. Project organizations and machine learning practitioners face manifold challenges and risks when developing machine learning applications and have a need for guidance to meet business expectations. This paper therefore proposes a process model for the development of machine learning applications, covering six phases from defining the scope to maintaining the deployed machine learning application. Business and data understanding are executed simultaneously in the first phase, as both have considerable impact on the feasibility of the project. The next phases are comprised of data preparation, modeling, evaluation, and deployment. Special focus is applied to the last phase, as a model running in changing real-time environments requires close monitoring and maintenance to reduce the risk of performance degradation over time. With each task of the process, this work proposes quality assurance methodology that is suitable to address challenges in machine learning development that are identified in the form of risks. The methodology is drawn from practical experience and scientific literature, and has proven to be general and stable. The process model expands on CRISP-DM, a data mining process model that enjoys strong industry support, but fails to address machine learning specific tasks. The presented work proposes an industry- and application-neutral process model tailored for machine learning applications with a focus on technical tasks for quality assurance.

Download Full-text

Automatic Identification of Upper Extremity Rehabilitation Exercise Type and Dose Using Body-Worn Sensors and Machine Learning: A Pilot Study

Digital Biomarkers ◽

10.1159/000516619 ◽

2021 ◽

pp. 158-166

Author(s):

Noah Balestra ◽

Gaurav Sharma ◽

Linda M. Riek ◽

Ania Busza

Keyword(s):

Machine Learning ◽

Upper Extremity ◽

Sensor Data ◽

Inpatient Setting ◽

Accelerometer Data ◽

Data Set ◽

Machine Learning Classification ◽

Exercise Type ◽

Exercise Dose ◽

Rehabilitation Exercises

Background: Prior studies suggest that participation in rehabilitation exercises improves motor function poststroke; however, studies on optimal exercise dose and timing have been limited by the technical challenge of quantifying exercise activities over multiple days. Objectives: The objectives of this study were to assess the feasibility of using body-worn sensors to track rehabilitation exercises in the inpatient setting and investigate which recording parameters and data analysis strategies are sufficient for accurately identifying and counting exercise repetitions. Methods: MC10 BioStampRC® sensors were used to measure accelerometer and gyroscope data from upper extremities of healthy controls (n = 13) and individuals with upper extremity weakness due to recent stroke (n = 13) while the subjects performed 3 preselected arm exercises. Sensor data were then labeled by exercise type and this labeled data set was used to train a machine learning classification algorithm for identifying exercise type. The machine learning algorithm and a peak-finding algorithm were used to count exercise repetitions in non-labeled data sets. Results: We achieved a repetition counting accuracy of 95.6% overall, and 95.0% in patients with upper extremity weakness due to stroke when using both accelerometer and gyroscope data. Accuracy was decreased when using fewer sensors or using accelerometer data alone. Conclusions: Our exploratory study suggests that body-worn sensor systems are technically feasible, well tolerated in subjects with recent stroke, and may ultimately be useful for developing a system to measure total exercise “dose” in poststroke patients during clinical rehabilitation or clinical trials.

Download Full-text

Machine Learning Assisted PUF Calibration for Trustworthy Proof of Sensor Data in IoT

ACM Transactions on Design Automation of Electronic Systems ◽

10.1145/3393628 ◽

2020 ◽

Vol 25 (4) ◽

pp. 1-21

Author(s):

Urbi Chatterjee ◽

Soumi Chatterjee ◽

Debdeep Mukhopadhyay ◽

Rajat Subhra Chakraborty

Keyword(s):

Machine Learning ◽

Sensor Data

Download Full-text

Using Machine Learning for Dependable Outlier Detection in Environmental Monitoring Systems

ACM Transactions on Cyber-Physical Systems ◽

10.1145/3445812 ◽

2021 ◽

Vol 5 (3) ◽

pp. 1-30

Author(s):

Gonçalo Jesus ◽

António Casimiro ◽

Anabela Oliveira

Keyword(s):

Machine Learning ◽

Environmental Monitoring ◽

Data Quality ◽

Outlier Detection ◽

Prediction Models ◽

Sensor Data ◽

Natural Phenomenon ◽

Monitoring Systems ◽

Data Errors ◽

Redundant Data

Sensor platforms used in environmental monitoring applications are often subject to harsh environmental conditions while monitoring complex phenomena. Therefore, designing dependable monitoring systems is challenging given the external disturbances affecting sensor measurements. Even the apparently simple task of outlier detection in sensor data becomes a hard problem, amplified by the difficulty in distinguishing true data errors due to sensor faults from deviations due to natural phenomenon, which look like data errors. Existing solutions for runtime outlier detection typically assume that the physical processes can be accurately modeled, or that outliers consist in large deviations that are easily detected and filtered by appropriate thresholds. Other solutions assume that it is possible to deploy multiple sensors providing redundant data to support voting-based techniques. In this article, we propose a new methodology for dependable runtime detection of outliers in environmental monitoring systems, aiming to increase data quality by treating them. We propose the use of machine learning techniques to model each sensor behavior, exploiting the existence of correlated data provided by other related sensors. Using these models, along with knowledge of processed past measurements, it is possible to obtain accurate estimations of the observed environment parameters and build failure detectors that use these estimations. When a failure is detected, these estimations also allow one to correct the erroneous measurements and hence improve the overall data quality. Our methodology not only allows one to distinguish truly abnormal measurements from deviations due to complex natural phenomena, but also allows the quantification of each measurement quality, which is relevant from a dependability perspective. We apply the methodology to real datasets from a complex aquatic monitoring system, measuring temperature and salinity parameters, through which we illustrate the process for building the machine learning prediction models using a technique based on Artificial Neural Networks, denoted ANNODE ( ANN Outlier Detection ). From this application, we also observe the effectiveness of our ANNODE approach for accurate outlier detection in harsh environments. Then we validate these positive results by comparing ANNODE with state-of-the-art solutions for outlier detection. The results show that ANNODE improves existing solutions regarding accuracy of outlier detection.

Download Full-text