Comparison of machine learning algorithms for concentration detection and prediction of formaldehyde based on electronic nose

Purpose The electronic nose (E-nose), which combines a sensor array with pattern recognition, is a typical instrument for detecting and recognizing indoor air quality (IAQ) problems. By mimicking the human olfactory system, the E-nose can monitor several airborne pollutants. Predicting formaldehyde concentration is one of its major functions, and three typical machine learning (ML) algorithms are most frequently used for this task: the back propagation (BP) neural network, the radial basis function (RBF) neural network and support vector regression (SVR). Design/methodology/approach This paper comparatively evaluates and analyzes these three ML algorithms in a controlled environment built on a commercially available sensor-array E-nose platform. Experiments were run under varying temperature (T), relative humidity (RH) and pollutant concentration (C) conditions, all of which were measured. Findings Regression models were built with each of the three algorithms, and in-depth analysis shows that the BP neural network model achieves better prediction performance than the others. Originality/value The empirical results demonstrate that ML algorithms combined with low-cost sensors can achieve high-precision detection of indoor contaminant concentrations.

How to Guarantee Food Safety via Grain Storage? An Approach to Improve Management Effectiveness by Machine Learning Algorithms

Journal of Biomedical Research & Environmental Sciences ◽

10.37871/jbres1296 ◽

2021 ◽

Vol 2 (8) ◽

pp. 675-684

Author(s):

Jin Wang ◽

Youjun Jiang ◽

Li Li ◽

Chao Yang ◽

Ke Li ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Prediction Model ◽

Bp Neural Network ◽

Machine Learning Algorithms ◽

Support Vector ◽

Grain Storage ◽

Management Effectiveness

The purpose of grain storage management is to dynamically analyze changes in the quality of reserved grain, adopt scientific and effective management methods to slow quality deterioration, and reduce losses during storage. At present, supervision of reserve grain quality depends mainly on periodic measurements of the grain and its milled products. The data obtained this way are accurate and reliable, but measurements must be made frequently and the workload is heavy. The conclusions are also limited to the area studied and cannot readily be extended to other scenarios. There is therefore an urgent need for a general method that can quickly predict grain quality for different species, regions and storage periods from historical data. In this study, we introduced the Back-Propagation (BP) neural network and support vector machine algorithms into quality prediction for reserved grain. We used quality index, temperature and humidity data to build both an intertemporal prediction model and a synchronous prediction model. The results show that the BP neural network, using the storage characters from the first three periods, can accurately predict key storage characters intertemporally, while the support vector machine provides precise synchronous predictions of key storage characters. The average prediction error is below 15% for each of wheat, rice and corn and about 20% for soybean, all of which meet practical demands. In conclusion, machine learning algorithms can help improve the management effectiveness of grain storage.
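One way to frame the intertemporal setup described above (predicting a storage character from the three preceding periods) as a supervised learning problem is a sliding window of lagged records. The record layout and numbers below are hypothetical; the abstract does not specify how the inputs were arranged for the BP network.

```python
from typing import List, Sequence, Tuple


def make_lag_windows(periods: Sequence[Sequence[float]], target_index: int,
                     lags: int = 3) -> Tuple[List[List[float]], List[float]]:
    """Turn a per-period record series into (features, target) training pairs."""
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(lags, len(periods)):
        window = periods[t - lags:t]
        # Flatten the previous `lags` periods into one feature vector.
        X.append([value for period in window for value in period])
        # The target is the chosen storage character measured in period t.
        y.append(periods[t][target_index])
    return X, y


# Hypothetical records: (quality index, temperature, humidity) per inspection period.
records = [(30.1, 15.0, 62.0), (30.8, 16.5, 60.0), (31.4, 18.0, 65.0),
           (32.0, 19.5, 66.0), (32.9, 17.0, 63.0)]
X, y = make_lag_windows(records, target_index=0)
```

Each row of `X` could then be fed to any regressor; with five periods and three lags, only two training pairs result, which illustrates why such models need long historical series.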

Productivity estimation of cutter suction dredger operation through data mining and learning from real-time big data

Engineering Construction & Architectural Management ◽

10.1108/ecam-05-2020-0357 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Jiake Fu ◽

Huijing Tian ◽

Lingguang Song ◽

Mingchao Li ◽

Shuo Bai ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Data Mining ◽

Big Data ◽

Bp Neural Network ◽

Coefficient Of Determination ◽

Support Vector ◽

Content Type ◽

Optimal Coefficient ◽

Productivity Estimation

Purpose This paper presents a new approach to estimating the productivity of cutter suction dredger operation through data mining and learning from real-time big data. Design/methodology/approach The paper used big data, data mining and machine learning techniques to extract features of cutter suction dredgers (CSD) for predicting their productivity. The ElasticNet-SVR (Elastic Net-Support Vector Machine) method is used to filter the original monitoring data, and 15 features were selected in light of the actual working conditions of the CSD. A box plot was then used to clean the corresponding data by filtering out outliers. Finally, four algorithms, namely SVR (Support Vector Regression), XGBoost (Extreme Gradient Boosting), LSTM (Long Short-Term Memory network) and BP (Back Propagation) neural network, were used for modeling and testing. Findings The paper provides a comprehensive forecasting framework for productivity estimation, including feature selection, data processing and model evaluation. The optimal coefficients of determination (R²) of all four algorithms were above 80.0%, indicating that the selected features were representative. The BP neural network model coupled with the SVR model was selected as the final model. Originality/value A machine learning algorithm incorporating domain expert judgment was used to select predictive features. The final optimal coefficient of determination (R²) of the coupled BP neural network and SVR model is 87.6%, indicating that the proposed method is effective for CSD productivity estimation.
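Box-plot cleaning conventionally discards points beyond the whiskers, i.e. outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR]. The abstract does not state the multiplier or how the quartiles were computed, so both are assumptions in this sketch, as are the sample readings.

```python
import statistics
from typing import List, Sequence


def iqr_filter(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Keep only values inside the box-plot whiskers (Tukey fences)."""
    # "inclusive" treats the data as the whole population; the choice is an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Preserve the original order so the series can still be aligned with timestamps.
    return [v for v in values if low <= v <= high]


# Hypothetical productivity readings with one sensor spike.
readings = [410.0, 395.0, 402.0, 388.0, 2950.0, 407.0, 399.0]
cleaned = iqr_filter(readings)
```

In a multi-feature setting the same fence would typically be applied column by column, dropping a whole record if any of its features falls outside.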

Patient visit forecasting in an emergency department using a deep neural network approach

Kybernetes ◽

10.1108/k-10-2018-0520 ◽

2019 ◽

Vol 49 (9) ◽

pp. 2335-2348 ◽

Cited By ~ 4

Author(s):

Milad Yousefi ◽

Moslem Yousefi ◽

Masood Fathi ◽

Flavio S. Fogliatto

Keyword(s):

Neural Network ◽

Emergency Department ◽

Linear Regression ◽

Deep Neural Network ◽

Short Term Memory ◽

Demand Forecasting ◽

Machine Learning Algorithms ◽

Support Vector ◽

Neural Network Approach ◽

Content Type

Purpose This study aims to investigate the factors affecting daily demand in an emergency department (ED) and to provide a forecasting tool for a public hospital for horizons of up to seven days. Design/methodology/approach First, the factors influencing demand in EDs were extracted from the literature, and those relevant to this study were selected. A deep neural network was then applied to construct a reliable predictor. Findings Although many statistical approaches have been proposed for this problem, better forecasts are achievable by exploiting the capabilities of machine learning algorithms. Results indicate that the proposed approach outperforms statistical alternatives available in the literature, such as multiple linear regression, autoregressive integrated moving average (ARIMA), support vector regression, generalized linear models, generalized estimating equations, seasonal ARIMA and combined ARIMA and linear regression. Research limitations/implications The authors applied the method in a single ED to forecast patient visits; applying it in other EDs would give a better understanding of the model's performance. The same approach can be applied to other demand forecasting problems with minor modifications. Originality/value To the best of the authors' knowledge, this is the first study to propose using long short-term memory (LSTM) to construct a predictor of the number of patient visits in EDs.
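The abstract credits long short-term memory (LSTM) for the ED predictor without giving its architecture. To show what a single LSTM cell computes at each time step (forget, input and output gates plus a candidate cell state), here is a scalar, single-unit sketch. The weight layout, the gate names and the idea of feeding scaled daily visit counts are illustrative assumptions, and training by backpropagation through time is not shown.

```python
import math
from typing import Dict, Sequence, Tuple

# Each gate has (input weight, recurrent weight, bias); values are hypothetical.
Weights = Dict[str, Tuple[float, float, float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_step(x: float, h_prev: float, c_prev: float,
                   w: Weights) -> Tuple[float, float]:
    """One time step of a single-unit LSTM cell; returns (hidden, cell) state."""
    def pre(gate: str) -> float:
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))  # fraction of the old cell state to keep
    i = sigmoid(pre("input"))   # fraction of the candidate to write
    o = sigmoid(pre("output"))  # fraction of the cell state to expose
    g = math.tanh(pre("cell"))  # candidate cell content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def run_sequence(xs: Sequence[float], w: Weights) -> float:
    """Fold the cell over a sequence (e.g. scaled daily visits); return last hidden state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_cell_step(x, h, c, w)
    return h
```

In a forecasting model the final hidden state would feed a dense output layer predicting the next day's (or next seven days') visits; real networks use vector-valued states rather than one unit.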

Classification of Children’s Sitting Postures Using Machine Learning Algorithms

Applied Sciences ◽

10.3390/app8081280 ◽

2018 ◽

Vol 8 (8) ◽

pp. 1280 ◽

Cited By ~ 14

Author(s):

Yong Kim ◽

Youngdoo Son ◽

Wonjoon Kim ◽

Byungki Jin ◽

Myung Yun

Keyword(s):

Neural Network ◽

Machine Learning ◽

Monitoring System ◽

Multinomial Logistic Regression ◽

Learning Algorithms ◽

Feedback System ◽

Machine Learning Algorithms ◽

Sensor Data ◽

Future Research ◽

Support Vector

Sitting on a chair in an awkward posture or for a long period of time is a risk factor for musculoskeletal disorders. Once formed, a postural habit is not easily changed, and it is important to form proper postural habits in childhood because lumbar disease caused by improper posture during childhood is likely to recur. A monitoring system that classifies children’s sitting postures is therefore needed. The purpose of this paper is to develop a system for classifying children’s sitting postures using machine learning algorithms. A convolutional neural network (CNN) was used in addition to conventional algorithms: the Naïve Bayes classifier (NB), decision tree (DT), neural network (NN), multinomial logistic regression (MLR) and support vector machine (SVM). To collect data for classifying sitting postures, a sensing cushion was developed by mounting an 8 × 8 pressure sensor mat inside the seat cushion of a children’s chair. Ten children participated, and sensor data were collected while each child held each of the five prescribed static postures. CNN achieved the highest accuracy of all the algorithms. Future research on improving the classification algorithm and providing an effective feedback system is expected to lead to a comprehensive posture monitoring system.
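Because the sensing cushion yields an 8 × 8 pressure frame, each reading can be treated as a small single-channel image, which is what makes a CNN applicable. The sketch shows the core CNN operations on such a frame (valid 3 × 3 convolution, ReLU, 2 × 2 max pooling) with a hand-picked kernel. The paper's actual network architecture is not given in the abstract, and in a trained CNN the kernel values are learned rather than chosen.

```python
from typing import List

Grid = List[List[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """'Valid' 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(grid: Grid) -> Grid:
    return [[max(0.0, v) for v in row] for row in grid]


def max_pool2d(grid: Grid, size: int = 2) -> Grid:
    """Non-overlapping max pooling; leftover rows/columns are dropped."""
    return [[max(grid[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(grid[0]) - size + 1, size)]
            for r in range(0, len(grid) - size + 1, size)]


# Hypothetical 8 x 8 pressure frame (arbitrary units) and a vertical-edge kernel.
frame = [[float((r + c) % 4) for c in range(8)] for r in range(8)]
edge_kernel = [[1.0, 0.0, -1.0] for _ in range(3)]
features = max_pool2d(relu(conv2d_valid(frame, edge_kernel)))  # 8x8 -> 6x6 -> 3x3
```

The resulting 3 × 3 map would be flattened and passed to a dense layer with five outputs, one per prescribed posture.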

Fast and noninvasive electronic nose for sniffing out COVID-19 based on exhaled breath-print recognition

10.21203/rs.3.rs-750988/v1 ◽

2021 ◽

Author(s):

Dian Kesumapramudya Nurputra ◽

Ahmad Kusumaatmadja ◽

Mohamad Saifudin Hakim ◽

Shidiq Nur Hidayat ◽

Trisna Julian ◽

...

Keyword(s):

Machine Learning ◽

Electronic Nose ◽

Quantitative Polymerase Chain Reaction ◽

Machine Learning Algorithms ◽

Oxide Semiconductor ◽

Support Vector ◽

Detection Accuracy ◽

Exhaled Breath ◽

Laboratory Equipment ◽

Linear Discriminant

Abstract Despite its high accuracy in detecting severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the reverse transcription-quantitative polymerase chain reaction (RT-qPCR) approach has several limitations (e.g., a lengthy invasive procedure, reagent availability, and the need for specialized laboratories, equipment and trained staff). We developed and employed a low-cost, noninvasive method to rapidly sniff out coronavirus disease 2019 (COVID-19) based on a portable electronic nose (GeNose C19) that integrates a metal oxide semiconductor gas sensor array, optimized feature extraction and machine learning models. The approach was evaluated in profiling tests involving a total of 615 breath samples (i.e., 333 positive and 282 negative for COVID-19, as confirmed by RT-qPCR) obtained from 83 patients in two hospitals in the Special Region of Yogyakarta, Indonesia. Four machine learning algorithms (i.e., linear discriminant analysis (LDA), support vector machine (SVM), stacked multilayer perceptron (MLP) and deep neural network (DNN)) were used to identify the top-performing pattern recognition methods, yielding high detection accuracy (88–95%), sensitivity (86–94%) and specificity (88–95%) on the testing datasets. Our results suggest that GeNose C19 is a promising breathalyzer for fast COVID-19 screening.
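Accuracy, sensitivity and specificity, as reported for GeNose C19, are all ratios of confusion-matrix counts on the test set, with RT-qPCR as the reference label. The sketch below spells out those definitions; the counts in the usage line are invented, not taken from the study.

```python
from typing import Dict


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Binary screening metrics against a reference test (e.g. RT-qPCR)."""
    return {
        # Share of all samples classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Share of reference-positive samples the e-nose flags (true positive rate).
        "sensitivity": tp / (tp + fn),
        # Share of reference-negative samples the e-nose clears (true negative rate).
        "specificity": tn / (tn + fp),
    }


# Invented counts for illustration only.
example = screening_metrics(tp=9, fp=2, tn=8, fn=1)
```

For a screening tool, sensitivity is usually the figure to watch, since a false negative lets an infected person through.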

Crime Data Forecasting Using Machine Learning and Big Data Analytics

Webology ◽

10.14704/web/v18si04/web18284 ◽

2021 ◽

Vol 18 (Special Issue 04) ◽

pp. 591-606

Author(s):

R. Brindha ◽

Dr. M. Thillaikarasi

Keyword(s):

Neural Network ◽

Machine Learning ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Machine Learning Algorithms ◽

Geographical Information ◽

Recursive Feature Elimination ◽

Support Vector ◽

Crime Data

Big data analytics (BDA) is a system-based method that aims to recognize and examine different designs, patterns and trends in large datasets. In this paper, BDA is used to visualize trends and make predictions, with exploratory data analysis used to examine the crime data. Successive facts and patterns were identified for cities in California, Washington and Florida using statistical analysis and visualization. Predictive performance is reported for Prophet, LSTM and Keras neural network models, which are existing methods for analyzing crime data with BDA techniques. However, crime increases day by day, making criminal activity ever harder to counter. Some studies have ignored the influence of important contributing factors, and many studies addressing these big data problems have considered only one or two features. This paper introduces a big data approach to analyze the factors influencing crime incidents and examines it on New York City. The proposed framework combines machine learning algorithms with a geographical information system (GIS) to account for the spatially contiguous causes of crime. Recursive feature elimination (RFE) is used to select the optimal features. Gradient boost decision tree (GBDT), logistic regression (LR), support vector machine (SVM) and artificial neural network (ANN) models are compared to develop the optimal model. Features with significant impact were then examined using GBDT and GIS. The experimental results show that combining GBDT with GIS identifies crime rankings with higher performance and accuracy than existing methods.
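Recursive feature elimination (RFE), used above to select the crime features, repeatedly refits a model, ranks the remaining features by importance and discards the weakest until the desired number is left. The abstract does not say which estimator drove the ranking, so this standard-library sketch assumes ordinary least squares with absolute coefficients as the importance score, which is only meaningful when features are on comparable scales. The toy data are invented.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_coefficients(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients via the normal equations (no intercept)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def rfe(X: Sequence[Sequence[float]], y: Sequence[float], n_keep: int) -> List[int]:
    """Refit on the surviving features, drop the weakest, repeat; return kept indices."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        subset = [[row[j] for j in active] for row in X]
        coefs = ols_coefficients(subset, y)
        weakest = min(range(len(active)), key=lambda i: abs(coefs[i]))
        active.pop(weakest)
    return active


# Invented data where y = 3*x0 + 0*x1 + 0.5*x2 exactly.
X_toy = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 1, 0]]
y_toy = [3, 0, 0.5, 3.5, 6]
kept = rfe(X_toy, y_toy, n_keep=2)
```

Swapping `ols_coefficients` for a tree ensemble's feature importances would give the more common GBDT-driven variant; the elimination loop itself stays the same.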

PROUD-MAL: static analysis-based progressive framework for deep unsupervised malware classification of windows portable executable

Complex & Intelligent Systems ◽

10.1007/s40747-021-00560-1 ◽

2021 ◽

Author(s):

Syed Khurram Jah Rizvi ◽

Warda Aslam ◽

Muhammad Shahzad ◽

Shahzad Saleem ◽

Muhammad Moazam Fraz

Keyword(s):

Neural Network ◽

Machine Learning ◽

Static Analysis ◽

Deep Neural Network ◽

Early Stage ◽

Malware Detection ◽

Machine Learning Algorithms ◽

Organizational Network ◽

Depth Analysis ◽

Feature Attention

Abstract Enterprises strive to remain protected against malware-based cyber-attacks on their infrastructure, facilities, networks and systems. Static analysis is an effective approach to detecting malware, i.e., malicious Portable Executable (PE) files. It performs an in-depth analysis of PE files without executing them, which greatly reduces the risk of a malicious PE contaminating the system. Yet instant detection using static analysis has become very difficult due to the exponential rise in the volume and variety of malware. The compelling need for early-stage detection of malware-based attacks strongly motivates research into automated malware detection. Recent machine learning-aided malware detection approaches using static analysis are mostly supervised. Supervised malware detection using static analysis requires manual labelling and human feedback, so it is less effective in a rapidly evolving and dynamic threat space. To this end, we propose a progressive deep unsupervised framework with a feature attention block for static analysis-based malware detection (PROUD-MAL). The framework is based on cascading blocks of unsupervised clustering and a feature-attention-based deep neural network. The proposed deep neural network, with its embedded feature attention block, is trained on the pseudo labels. To evaluate the proposed unsupervised framework, we collected a real-time malware dataset by deploying low- and high-interaction honeypots on an enterprise organizational network. An endpoint security solution was also deployed on the enterprise organizational network to collect malware samples. After post-processing and cleaning, the novel dataset consists of 15,457 PE samples comprising 8775 malicious and 6681 benign ones. The proposed PROUD-MAL framework achieved an accuracy of more than 98.09%, with better quantitative performance on standard evaluation parameters on the collected dataset, and outperformed other conventional machine learning algorithms. The implementation and dataset are available at https://bit.ly/35Sne3a.
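PROUD-MAL trains its attention network on pseudo labels produced by an unsupervised clustering stage. The abstract does not name the clustering algorithm, so the sketch below uses plain k-means with a deterministic farthest-point initialisation to illustrate how cluster indices become pseudo labels. The two-dimensional feature vectors are hypothetical stand-ins for static PE features.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pseudo_labels(points: List[Point], k: int, iters: int = 50) -> List[int]:
    """Cluster unlabeled samples; the cluster index of each sample is its pseudo label."""
    # Deterministic farthest-point initialisation so results are reproducible.
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each sample gets the index of its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels


# Hypothetical scaled static features, e.g. (section entropy, import count).
samples = [(0.20, 0.10), (0.25, 0.15), (0.90, 0.85), (0.95, 0.80)]
pseudo = kmeans_pseudo_labels(samples, k=2)
```

Cluster indices are arbitrary, so mapping a cluster to "malicious" or "benign" still requires some external signal; the pseudo labels then serve as training targets for a supervised network.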

Research on dairy products detection based on machine learning algorithm

MATEC Web of Conferences ◽

10.1051/matecconf/202235503008 ◽

2022 ◽

Vol 355 ◽

pp. 03008

Author(s):

Yang Zhang ◽

Lei Zhang ◽

Yabin Ma ◽

Jinsen Guan ◽

Zhaoxia Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Electronic Nose ◽

Milk Fat ◽

Dairy Products ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting

In this study, an electronic nose composed of seven metal oxide semiconductor sensors was developed to distinguish the milk source (the dairy farm the milk comes from) and to estimate the milk fat and protein content, in order to identify the authenticity and evaluate the quality of milk. The developed electronic nose is low-cost, non-destructive testing equipment. (1) To identify milk sources, this paper combines the electronic nose odor characteristics of milk with its component characteristics, uses Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) for dimensionality reduction, and finally uses three machine learning algorithms, Logistic Regression (LR), Support Vector Machine (SVM) and Random Forest (RF), to build milk source (dairy farm) identification models, whose classification performance is evaluated and compared. The experimental results show that the SVM-LDA model based on electronic nose odor characteristics classifies better than the other single-feature models, with a test set accuracy of 91.5%. The RF-LDA and SVM-LDA models based on the fusion of both feature types perform best, with test set accuracy as high as 96%. (2) Three algorithms, Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost) and Random Forest (RF), are used to build models that estimate milk fat rate and protein rate from electronic nose odor data. The results show that the RF model has the best estimation performance (R² = 0.9399 for milk fat; R² = 0.9301 for milk protein). This shows that the proposed method can improve the estimation accuracy of milk fat and protein, providing a technical basis for predicting the quality of dairy products.
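The milk fat and protein results above are reported as coefficients of determination. R² compares the model's squared residuals with the variance of the measured values, R² = 1 − SS_res / SS_tot, so 1.0 is a perfect fit and 0.0 is no better than always predicting the mean. A minimal sketch follows; the numbers in the usage line are invented.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total variation
    return 1.0 - ss_res / ss_tot


# Invented milk-fat percentages (measured vs. estimated) for illustration.
score = r_squared([3.6, 3.9, 4.2, 3.8], [3.7, 3.8, 4.1, 3.9])
```

R² is undefined when every measured value is identical (SS_tot = 0), and it can be negative on test data when a model does worse than the mean.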

Demand forecasting at retail stage for selected vegetables: a performance analysis

Journal of Modelling in Management ◽

10.1108/jm2-11-2018-0192 ◽

2019 ◽

Vol 14 (4) ◽

pp. 1042-1063 ◽

Cited By ~ 1

Author(s):

Rahul Priyadarshi ◽

Akash Panigrahi ◽

Srikanta Routroy ◽

Girish Kant Garg

Keyword(s):

Machine Learning ◽

Performance Analysis ◽

Forecast Error ◽

Demand Forecasting ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Forecasting Model ◽

Content Type ◽

Forecasting Models

Purpose The purpose of this study is to select an appropriate forecasting model at the retail stage for selected vegetables on the basis of performance analysis. Design/methodology/approach Various forecasting models, such as the Box–Jenkins-based auto-regressive integrated moving average model and machine learning-based algorithms such as long short-term memory (LSTM) networks, support vector regression (SVR), random forest regression, gradient boosting regression (GBR) and extreme GBR (XGBoost/XGBR), were proposed and applied (i.e. modeling, training, testing and predicting) at the retail stage for selected vegetables to forecast demand. Performance analysis (i.e. forecasting error analysis) was carried out to select the appropriate forecasting model at the retail stage for the selected vegetables. Findings From the results obtained for a case environment, the machine learning algorithms LSTM and SVR produced better results than the other demand forecasting models. Research limitations/implications The results obtained from the case environment cannot be generalized. However, the approach may be used to forecast demand for other agricultural produce at the retail stage by capturing its demand environment. Practical implications Implementing LSTM and SVR in the case situation at the retail stage will reduce forecast error, daily retail inventory and fresh produce wastage and will increase daily revenue. Originality/value Selecting a demand forecasting model for agricultural produce at the retail stage on the basis of performance analysis is a unique study in which both traditional and non-traditional models were analyzed and compared.
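The model selection above rests on forecasting error analysis, but the abstract does not list which error measures were used. The sketch assumes three common ones, mean absolute error (MAE), root mean squared error (RMSE) and mean absolute percentage error (MAPE), and uses made-up daily demand figures.

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float], forecast: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE and MAPE (in %) for a demand forecast."""
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # penalises large misses more
    # MAPE is undefined if any actual demand is zero.
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


# Made-up daily demand (kg) for one vegetable versus two candidate models.
actual = [120.0, 135.0, 128.0, 140.0]
model_a = forecast_errors(actual, [118.0, 130.0, 131.0, 138.0])
model_b = forecast_errors(actual, [110.0, 142.0, 120.0, 150.0])
best = min(("A", model_a), ("B", model_b), key=lambda item: item[1]["RMSE"])[0]
```

Ranking by RMSE favours models that avoid large stock-outs or gluts, while MAPE makes errors comparable across vegetables with different sales volumes.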

Wheat Lodging Detection from UAS Imagery Using Machine Learning Algorithms

Remote Sensing ◽

10.3390/rs12111838 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1838 ◽

Cited By ~ 8

Author(s):

Zhao Zhang ◽

Paulo Flores ◽

C. Igathinathane ◽

Dayakar L. Naik ◽

Ravi Kiran ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Standard Deviation ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Superior Performance ◽

Support Vector ◽

Unmanned Aerial Systems

The current mainstream approach of using manual measurements and visual inspections for crop lodging detection is inefficient, time-consuming and subjective. An innovative method for wheat lodging detection that can overcome or alleviate these shortcomings would be welcomed. This study proposed a systematic approach for wheat lodging detection in research plots (372 experimental plots), which consisted of using unmanned aerial systems (UAS) for aerial imagery acquisition, manual field evaluation, and machine learning algorithms to detect whether lodging had occurred. UAS imagery was collected on three different dates (23 and 30 July 2019, and 8 August 2019) after lodging occurred. Traditional machine learning and deep learning were evaluated and compared in terms of classification accuracy and standard deviation. For traditional machine learning, five types of features (i.e. gray level co-occurrence matrix, local binary pattern, Gabor, intensity and Hu-moment) were extracted and fed into three traditional machine learning algorithms (i.e., random forest (RF), neural network and support vector machine) to detect lodged plots. For the dataset from each imagery collection date, the accuracies of the three algorithms were not significantly different from each other. For each of the three algorithms, accuracy was lowest on the first-date dataset and highest on the last-date dataset. When standard deviation was incorporated as a measure of performance robustness, RF was judged the most satisfactory. For deep learning, three convolutional neural networks (a simple convolutional neural network, VGG-16 and GoogLeNet) were tested. On every single-date dataset, GoogLeNet consistently outperformed the other two networks. Further comparisons between RF and GoogLeNet showed that their detection accuracies were not significantly different (p > 0.05), so the choice between the two would not affect the final detection accuracy. However, because the average accuracy of GoogLeNet (93%) was higher than that of RF (91%), GoogLeNet was recommended for wheat lodging detection. This research demonstrated that UAS RGB imagery, coupled with the GoogLeNet machine learning algorithm, can be a novel, reliable, objective, simple, low-cost and effective (accuracy > 90%) tool for wheat lodging detection.
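Of the five hand-crafted feature types listed above, the gray level co-occurrence matrix (GLCM) is the least self-explanatory. The sketch below builds a normalised GLCM for a single pixel offset and derives its contrast statistic. The offset, the number of gray levels and the tiny image are assumptions; the study may have used other offsets, symmetrisation or statistics.

```python
from typing import List

Matrix = List[List[float]]


def glcm(image: List[List[int]], levels: int, dr: int = 0, dc: int = 1) -> Matrix:
    """Normalised, non-symmetric co-occurrence matrix for one offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count how often gray level i is followed by gray level j at this offset.
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_contrast(p: Matrix) -> float:
    """Contrast: weights each co-occurrence by the squared gray-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Hypothetical 4-level patch: lodged wheat tends to look smoother than standing wheat.
patch = [[0, 1, 2, 3], [1, 2, 3, 0], [2, 3, 0, 1], [3, 0, 1, 2]]
contrast = glcm_contrast(glcm(patch, levels=4))
```

Statistics such as contrast, computed per plot, become entries in the feature vector passed to the random forest, neural network or SVM classifiers.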