Evaluation and calibration of a low-cost particle sensor in ambient conditions using machine-learning methods

2020 ◽  
Vol 13 (4) ◽  
pp. 1693-1707 ◽  
Author(s):  
Minxing Si ◽  
Ying Xiong ◽  
Shan Du ◽  
Ke Du

Abstract. Particle sensing technology has shown great potential for monitoring particulate matter (PM) with very few temporal and spatial restrictions because of its low cost, compact size, and easy operation. However, the performance of low-cost sensors for PM monitoring in ambient conditions has not been thoroughly evaluated, and their monitoring results are often questionable. In this study, a low-cost fine particle monitor (Plantower PMS 5003) was colocated with a reference instrument, the Synchronized Hybrid Ambient Real-time Particulate (SHARP) monitor, at the Calgary Varsity air monitoring station from December 2018 to April 2019. The study evaluated the performance of this low-cost PM sensor in ambient conditions and calibrated its readings using simple linear regression (SLR), multiple linear regression (MLR), and two more powerful machine-learning algorithms, XGBoost and a feedforward neural network (NN), with random search techniques used to find the best model architectures. Field evaluation showed that the Pearson correlation (r) between the low-cost sensor and the SHARP instrument was 0.78. The Fligner–Killeen (F–K) test indicated a statistically significant difference between the variances of the PM2.5 values by the low-cost sensor and the SHARP instrument. Large overestimations by the low-cost sensor before calibration were observed in the field and were believed to be caused by variation in ambient relative humidity. The root mean square error (RMSE) was 9.93 when comparing the low-cost sensor with the SHARP instrument. The calibration by the feedforward NN had the smallest RMSE, 3.91, on the test dataset, compared to the calibrations by SLR (4.91), MLR (4.65), and XGBoost (4.19). After calibration, the F–K test using the test dataset showed that the variances of the PM2.5 values by the NN, XGBoost, and the reference method were not statistically significantly different. From this study, we conclude that a feedforward NN is a promising method for addressing the poor performance of low-cost sensors for PM2.5 monitoring. In addition, the random search method for hyperparameters was demonstrated to be an efficient approach for selecting the best model structure.
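As a hedged illustration of the calibration pipeline the abstract describes, the sketch below fits a feedforward NN with a random hyperparameter search using scikit-learn. The file name, column names, and search space are assumptions for illustration, not the authors' actual setup.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical colocation log: raw sensor PM2.5 plus meteorology as features,
# with the SHARP reading as the calibration target (names are invented).
df = pd.read_csv("colocated_readings.csv")
X = df[["pms5003_pm25", "rh", "temp"]].values
y = df["sharp_pm25"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Random search over a small feedforward-NN hyperparameter space,
# scored by cross-validated RMSE.
search = RandomizedSearchCV(
    MLPRegressor(max_iter=2000, random_state=0),
    param_distributions={
        "hidden_layer_sizes": [(16,), (32,), (64, 32), (128, 64)],
        "alpha": np.logspace(-5, -1, 20),
        "learning_rate_init": np.logspace(-4, -2, 10),
    },
    n_iter=25,
    cv=5,
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X_train, y_train)

rmse = np.sqrt(mean_squared_error(y_test, search.predict(X_test)))
print(f"test RMSE after NN calibration: {rmse:.2f}")
```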


Author(s):  
Pratyush Kaware

In this paper, a cost-effective sensor has been implemented to read finger-bend signals by attaching the sensor to a finger, so as to classify them based on the degree of bend as well as the joint about which the finger was being bent. This was done by testing various machine learning algorithms to find the most accurate and consistent classifier. We found that the Support Vector Machine was the algorithm best suited to classifying our data; using it, we were able to predict the live state of a finger, i.e., the degree of bend and the joints involved. The live voltage values from the sensor were transmitted using a NodeMCU microcontroller, converted to digital form, and uploaded to a database for analysis.
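A minimal sketch of the classification step described above, using scikit-learn's SVM. The voltage features and bend-state labels here are random placeholders standing in for the real flex-sensor readings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Placeholder data: rows of per-joint voltages, labels encoding bend state.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))      # stand-in for live ADC voltage readings
y = rng.integers(0, 4, size=200)   # stand-in bend-state classes

# Scale voltages, then fit an RBF-kernel SVM; report cross-validated accuracy.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print("mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```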


2020 ◽  
Vol 1 (4) ◽  
pp. 140-147
Author(s):  
Dastan Maulud ◽  
Adnan M. Abdulazeez

Linear regression is perhaps one of the most common and comprehensive statistical and machine learning algorithms. It is used to find a linear relationship between one or more predictors and a response. Linear regression has two types: simple linear regression and multiple linear regression (MLR). This paper discusses various works by different researchers on linear regression and polynomial regression and compares their performance, seeking the approach that best optimizes prediction and precision. Almost all of the articles analyzed in this review are focused on datasets; in order to determine a model's efficiency, its output must be correlated with the actual values obtained for the explanatory variables.
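To make the comparison concrete, here is a small illustrative sketch, under the assumption of a curved ground truth, showing why a polynomial fit can outperform a simple linear one; the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score

# Synthetic data with a quadratic ground truth plus noise.
rng = np.random.default_rng(1)
x = np.linspace(0, 4, 80).reshape(-1, 1)
y = 1.5 * x.ravel() ** 2 - x.ravel() + rng.normal(scale=1.0, size=80)

linear = LinearRegression().fit(x, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)

# The polynomial model should explain far more variance on curved data.
print("linear R^2:", round(r2_score(y, linear.predict(x)), 3))
print("poly   R^2:", round(r2_score(y, poly.predict(x)), 3))
```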


2021 ◽  
Vol 297 ◽  
pp. 01029
Author(s):  
Mohammed Azza ◽  
Jabran Daaif ◽  
Adnane Aouidate ◽  
El Hadi Chahid ◽  
Said Belaaouad

In this paper, we discuss the prediction of the photocurrent generated by a solar cell using machine learning algorithms. To select a prediction method, we compared and explored several candidates, using precision, MSE, and MAE as evaluation metrics for model selection. This study uses machine learning algorithms as a research method to develop models for predicting the solar cell photocurrent. We built an electric current prediction model from several machine learning algorithms, for example linear regression, Lasso regression, K-Nearest Neighbors, decision tree, and random forest, and compared their prediction accuracy. On this basis, we recommend a solar cell photocurrent prediction model grounded in resistance assessment. These results show that the linear regression algorithm, given its precision, reliably outperforms the alternative models in predicting the solar cell photocurrent Iph.
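A hedged sketch of the model bake-off the abstract describes: the five named regressors scored by MAE and MSE on one dataset. The features and target here are synthetic stand-ins, since the paper's actual solar-cell inputs are not listed above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic, roughly linear target standing in for the measured photocurrent.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))  # stand-in cell parameters
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.01),
    "knn": KNeighborsRegressor(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(random_state=0),
}
for name, m in models.items():
    pred = m.fit(X_tr, y_tr).predict(X_te)
    print(f"{name:7s} MAE={mean_absolute_error(y_te, pred):.3f} "
          f"MSE={mean_squared_error(y_te, pred):.3f}")
```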


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Bum-Joo Cho ◽  
Kyoung Min Kim ◽  
Sanchir-Erdene Bilegsaikhan ◽  
Yong Joon Suh

Abstract Febrile neutropenia (FN) is one of the most concerning complications of chemotherapy, and its prediction remains difficult. This study aimed to reveal the risk factors for FN and to build prediction models of FN using machine learning algorithms. Medical records of hospitalized patients who underwent chemotherapy after surgery for breast cancer between May 2002 and September 2018 were selectively reviewed for model development. Demographic, clinical, pathological, and therapeutic data were analyzed to identify risk factors for FN. Using machine learning algorithms, prediction models were developed and evaluated for performance. Of 933 selected inpatients with a mean age of 51.8 ± 10.7 years, FN developed in 409 (43.8%) patients. There was a significant difference in FN incidence according to age, staging, taxane-based regimen, and blood count 5 days after chemotherapy. The area under the curve (AUC) of a model built on these findings with logistic regression was 0.870; machine learning improved the AUC to 0.908. Machine learning thus improves the prediction of FN in patients undergoing chemotherapy for breast cancer compared to the conventional statistical model. In these high-risk patients, primary prophylaxis with granulocyte colony-stimulating factor could be considered.
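As a hedged sketch of the reported AUC comparison, the snippet below scores a conventional logistic model against a gradient-boosting classifier on the same data. The dataset is a synthetic placeholder (sized and class-balanced roughly like the cohort), and gradient boosting is an assumption, since the abstract does not name the ML algorithm used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic cohort: ~900 patients, ~44% positive class (FN developed).
X, y = make_classification(n_samples=900, n_features=8,
                           weights=[0.56, 0.44], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [("logistic", LogisticRegression(max_iter=1000)),
                  ("boosting", GradientBoostingClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name} AUC = {auc:.3f}")
```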


2020 ◽  
Vol 2 (2) ◽  
pp. 317-321
Author(s):  
Mathew G. Pelletier ◽  
Greg A. Holt ◽  
John D. Wanjura

The removal of plastic contamination in cotton lint is an issue of top priority for the U.S. cotton industry. One of the main sources of plastic contamination appearing in marketable cotton bales is plastic used to wrap cotton modules on cotton harvesters. To help mitigate plastic contamination at the gin, automatic inspection systems are needed to detect and control removal systems. Due to significant cost constraints in the U.S. cotton ginning industry, the use of low-cost color cameras for detection of plastic contamination has been successfully adopted. However, some plastics of similar color to background are difficult to detect when utilizing traditional machine learning algorithms. Hence, current detection/removal system designs are not able to remove all plastics and there is still a need for better detection methods. Recent advances in deep learning convolutional neural networks (CNNs) show promise for enabling the use of low-cost color cameras for detection of objects of interest when placed against a background of similar color. They do this by mimicking the human visual detection system, focusing on differences in texture rather than color as the primary detection paradigm. The key to leveraging the CNNs is the development of extensive image datasets required for training. One of the impediments to this methodology is the need for large image datasets where each image must be annotated with bounding boxes that surround each object of interest. As this requirement is labor-intensive, there is significant value in these image datasets. This report details the included image dataset as well as the system design used to collect the images. For acquisition of the image dataset, a prototype detection system was developed and deployed into a commercial cotton gin where images were collected for the duration of the 2018–2019 ginning season. A discussion of the observational impact that the system had on reduction of plastic contamination at the commercial gin, utilizing traditional color-based machine learning algorithms, is also included.
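For contrast with the CNN approach, here is a minimal sketch of the traditional color-based detection the report describes: threshold a color range in HSV and extract bounding boxes with OpenCV. The frame and HSV range are invented for illustration, not the gin system's tuned values.

```python
import cv2
import numpy as np

# Synthetic frame with a yellow patch standing in for module-wrap plastic.
frame = np.zeros((240, 320, 3), dtype=np.uint8)
cv2.rectangle(frame, (60, 60), (120, 120), (0, 200, 200), -1)

# Threshold a yellow-ish HSV range, then box each connected region.
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (20, 100, 100), (35, 255, 255))
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
boxes = [cv2.boundingRect(c) for c in contours]

# This is the failure mode the report notes: plastic whose color matches the
# cotton background produces an empty mask and goes undetected.
print("detections:", boxes)
```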


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Thérence Nibareke ◽  
Jalal Laassiri

Abstract Introduction Nowadays, large data volumes are generated daily at a high rate. Data from health systems, social networks, finance, government, marketing, and bank transactions, as well as from sensors and smart devices, are increasing, and the tools and models have to be optimized. In this paper we applied and compared machine learning algorithms (Linear Regression, Naïve Bayes, Decision Tree) to predict diabetes. Furthermore, we performed analytics on flight delays. The main contribution of this paper is to give an overview of Big Data tools and machine learning models. We highlight some metrics that allow us to choose a more accurate model. We predict diabetes disease using three machine learning models and then compare their performance. Furthermore, we analyzed flight delays and produced a dashboard which can help managers of flight companies to have a 360° view of their flights and take strategic decisions. Case description We applied three machine learning algorithms for predicting diabetes and compared their performance to see which model gives the best results. We performed analytics on flight datasets to support decision making and predict flight delays. Discussion and evaluation The experiments show that Linear Regression, Naïve Bayes, and Decision Tree give the same accuracy (0.766), but Decision Tree outperforms the two other models with the greatest score (1) and the smallest error (0). For the flight delay analytics, the model could show, for example, the airport that recorded the most flight delays. Conclusions Several tools and machine learning models for big data analytics have been discussed in this paper. We concluded that, for the same dataset, one has to carefully choose the model used for prediction. In future work, we will test different models in other fields (climate, banking, insurance).
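A hedged sketch of the diabetes comparison: three classifiers scored by cross-validated accuracy on one dataset. Logistic regression stands in here for the paper's linear model, since the outcome is binary, and the data is a synthetic placeholder rather than the authors' dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Placeholder binary-outcome data (diabetes yes/no) with 8 clinical features.
X, y = make_classification(n_samples=768, n_features=8, random_state=0)

for name, clf in [("logistic", LogisticRegression(max_iter=1000)),
                  ("naive bayes", GaussianNB()),
                  ("decision tree", DecisionTreeClassifier(random_state=0))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:13s} mean CV accuracy = {acc:.3f}")
```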


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 600
Author(s):  
Gianluca Cornetta ◽  
Abdellah Touhafi

Low-cost, high-performance embedded devices are proliferating, and a plethora of new platforms are available on the market. Some of them either have embedded GPUs or can be connected to external Machine Learning (ML) hardware accelerators. These enhanced hardware features enable new applications in which AI-powered smart objects can effectively and pervasively run real-time distributed ML algorithms, shifting part of the raw data analysis and processing from the cloud or edge to the device itself. In such a context, Artificial Intelligence (AI) can be considered the backbone of the next generation of Internet of Things (IoT) devices, which will no longer merely be data collectors and forwarders, but truly “smart” devices with built-in data wrangling and data analysis features that leverage lightweight machine learning algorithms to make autonomous decisions in the field. This work thoroughly reviews and analyses the most popular ML algorithms, with particular emphasis on those that are more suitable to run on resource-constrained embedded devices. In addition, several machine learning algorithms have been built on top of a custom multi-dimensional array library. The designed framework has been evaluated and its performance stressed on Raspberry Pi 3 and 4 embedded computers.
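As an illustration of the kind of lightweight algorithm the review targets, below is a dependency-free k-nearest-neighbours classifier in plain Python, the sort of footprint that fits a resource-constrained device. It is purely illustrative and unrelated to the authors' custom array library.

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Squared Euclidean distance to every training row, paired with its label.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy sensor readings mapped to device states.
train_X = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
train_y = ["off", "on", "off", "on"]
print(knn_predict(train_X, train_y, [0.85, 0.75]))  # -> "on"
```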


Author(s):  
Inssaf El Guabassi ◽  
Zakaria Bousalem ◽  
Rim Marah ◽  
Aimad Qazdar

In recent years, there has been growing demand to predict the future with certainty; predicting the right information in any area is becoming a necessity. One way to predict the future with certainty is to determine the possible futures. In this sense, machine learning is a way to analyze huge datasets to make strong predictions or decisions. The main objective of this research work is to build a predictive model for evaluating students’ performance. Hence, the contributions are threefold. The first is to apply several supervised machine learning algorithms (i.e., ANCOVA, Logistic Regression, Support Vector Regression, Log-linear Regression, Decision Tree Regression, Random Forest Regression, and Partial Least Squares Regression) to our education dataset. The second is to compare and evaluate the algorithms used to create the predictive model based on various evaluation metrics. The last is to determine the most important factors that influence students’ success or failure. The experimental results showed that Log-linear Regression provides the best prediction, and identified the behavioral factors that influence students’ performance.
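A hedged sketch of the algorithm comparison described above: several of the named regressors evaluated with a common metric on one dataset. The features, target, and R² scoring are placeholders, not the authors' education dataset or metrics.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# Placeholder student data: 6 features, a continuous performance score.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = X[:, 0] * 2 + X[:, 1] + rng.normal(size=200)

for name, model in [("SVR", SVR()),
                    ("tree", DecisionTreeRegressor(random_state=0)),
                    ("forest", RandomForestRegressor(random_state=0)),
                    ("PLS", PLSRegression(n_components=2))]:
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name:6s} mean R^2 = {score:.3f}")
```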

