Source allocation of per- and polyfluoroalkyl substances (PFAS) with supervised machine learning: Classification performance and the role of feature selection in an expanded dataset

Chemosphere ◽  
2021 ◽  
Vol 275 ◽  
pp. 130124
Author(s):  
Tohren C.G. Kibbey ◽  
Rafal Jabrzemski ◽  
Denis M. O’Carroll
2021 ◽  
Vol 10 (6) ◽  
pp. 3369-3376
Author(s):  
Saima Afrin ◽  
F. M. Javed Mehedi Shamrat ◽  
Tafsirul Islam Nibir ◽  
Mst. Fahmida Muntasim ◽  
Md. Shakil Moharram ◽  
...  

Machine learning techniques are increasingly used in medical science to detect diseases such as liver disease (LD), which causes a large number of deaths worldwide. Diagnosing the disease at an early stage allows early treatment that can help cure the patient. This paper proposes a method to diagnose LD using supervised machine learning classification algorithms, namely logistic regression, decision tree, random forest, AdaBoost, KNN, linear discriminant analysis, gradient boosting and support vector machine (SVM). We also apply the least absolute shrinkage and selection operator (LASSO) feature selection technique to the dataset to identify the attributes most highly correlated with LD. The predictions made by the algorithms under 10-fold cross-validation (CV) are evaluated in terms of accuracy, sensitivity, precision and F1-score. The decision tree algorithm performs best, with accuracy, precision, sensitivity and F1-score values of 94.295%, 92%, 99% and 96% respectively when LASSO is included. Furthermore, a comparison with recent studies is presented to demonstrate the significance of the proposed system.
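
The four scores named in the abstract above can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function name `classification_metrics` and the toy labels are assumptions made for illustration, and label 1 is assumed to mean "disease present".

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, sensitivity (recall) and F1 for binary labels.

    Hypothetical helper for illustration; `positive` marks the disease class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Toy labels (made up, not from the study's dataset).
toy_true = [1, 1, 1, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 1, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

In a 10-fold cross-validation setting, these scores would typically be computed on each held-out fold and then averaged.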


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1578
Author(s):  
Daniel Szostak ◽  
Adam Włodarczyk ◽  
Krzysztof Walkowiak

The rapid growth of network traffic drives the need for new network technologies. Artificial intelligence provides suitable tools to improve currently used network optimization methods. In this paper, we propose a procedure for network traffic prediction. Based on the characteristics of optical networks (and other network technologies), we focus on the prediction of fixed bitrate levels called traffic levels. We develop and evaluate two approaches based on different supervised machine learning (ML) methods: classification and regression. We examine four different ML models with various selected features. The tested datasets are based on real traffic patterns provided by the Seattle Internet Exchange Point (SIX). The results are analyzed using a new quality metric, which allows researchers to find the best forecasting algorithm in terms of network resource usage and operational costs. Our research shows that regression provides better results than classification for all analyzed datasets. Additionally, the final choice of the most appropriate ML algorithm and model should depend on the network operator's expectations.
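
To make the idea of "traffic levels" concrete, the sketch below maps a continuous bitrate onto fixed levels and counts how often a forecast falls below the level actually needed. It is an illustration under stated assumptions: the 10 Gbps step, the function name `to_traffic_level`, and the sample numbers are hypothetical, and the under-provisioning count is a simple stand-in, not the quality metric proposed in the paper.

```python
import math


def to_traffic_level(bitrate_gbps, step_gbps=10.0):
    """Return the smallest multiple of `step_gbps` that covers the bitrate.

    The step size is an assumed value chosen only for this example.
    """
    if bitrate_gbps <= 0:
        return 0.0
    return math.ceil(bitrate_gbps / step_gbps) * step_gbps


# Made-up observed bitrates and a made-up regression forecast (Gbps).
actual = [23.4, 30.0, 41.2]
regression_forecast = [21.0, 33.5, 38.0]

# A regression output is rounded up to a level after prediction;
# a classifier would instead predict the level directly as a class label.
actual_levels = [to_traffic_level(b) for b in actual]
forecast_levels = [to_traffic_level(b) for b in regression_forecast]

# Count intervals where the forecast level would not cover the real traffic.
under_provisioned = sum(f < a for f, a in zip(forecast_levels, actual_levels))
```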


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Nasser Assery ◽  
Yuan (Dorothy) Xiaohong ◽  
Qu Xiuli ◽  
Roy Kaushik ◽  
Sultan Almalki

Purpose: This study aims to propose an unsupervised learning model to evaluate the credibility of disaster-related Twitter data and to present a performance comparison with commonly used supervised machine learning models. Design/methodology/approach: First, historical tweets on two recent hurricane events are collected via the Twitter API. Then a credibility scoring system is implemented in which the tweet features are analyzed to assign a credibility score and a credibility label to each tweet. After that, supervised machine learning classification is implemented using various classification algorithms, and their performances are compared. Findings: The proposed unsupervised learning model could enhance emergency response by providing a fast way to determine the credibility of disaster-related tweets. Additionally, the comparison of the supervised classification models reveals that the Random Forest classifier performs significantly better than the SVM and Logistic Regression classifiers in classifying the credibility of disaster-related tweets. Originality/value: In this paper, an unsupervised 10-point scoring model is proposed to evaluate the credibility of tweets based on user-based and content-based features. This technique could be used to evaluate the credibility of disaster-related tweets on future hurricanes and has the potential to enhance emergency response during critical events. The comparative study of different supervised learning methods has revealed effective supervised learning methods for evaluating the credibility of Twitter data.


2021 ◽  
Vol 5 (1) ◽  
pp. 21
Author(s):  
Edgar G. Mendez-Lopez ◽  
Jersson X. Leon-Medina ◽  
Diego A. Tibaduiza

Electronic tongue type sensor arrays are made of different materials, and each sensor captures signals independently. The signals captured during electrochemical tests often have high dimensionality, which increases when the data unfolding process is performed. This unfolding process consists of arranging the data from different experiments, sensors, and sample times so that the obtained information forms a two-dimensional matrix. In this work, a tool for the analysis of electronic tongue signals is described. The tool is developed in Matlab® App Designer to process and classify the data from different substances analyzed by an electronic tongue type sensor array. The data processing is carried out through the following stages: (1) data unfolding, (2) normalization, (3) dimensionality reduction, (4) classification through a supervised machine learning model, and finally (5) a cross-validation procedure to calculate a set of classification performance measures. Important characteristics of this tool are the possibility to tune the parameters of the dimensionality reduction and classifier algorithms, and to plot two- and three-dimensional scatter plots of the features after dimensionality reduction. These plots show the data separability between classes and the compatibility within each class. The interface is successfully tested with two electronic tongue sensor array datasets containing multi-frequency large amplitude pulse voltammetry (MLAPV) signals. The developed graphical user interface allows different methods to be compared at each of the mentioned stages in order to find the best combination of methods and thus obtain the highest values of the classification performance measures.
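
Stages (1) and (2) described above can be sketched in a few lines. The example is in Python rather than Matlab and is not taken from the tool itself: the names `unfold` and `min_max_normalize`, the nested-list layout `data[experiment][sensor][sample]`, and the choice of min-max scaling are all assumptions made for illustration.

```python
def unfold(data):
    """Flatten data[experiment][sensor][sample] into one row per experiment.

    Each row concatenates the sample series of all sensors, giving a matrix
    of shape (experiments, sensors * samples).
    """
    return [[value for sensor in experiment for value in sensor]
            for experiment in data]


def min_max_normalize(matrix):
    """Scale each column to [0, 1]; constant columns are mapped to 0.0.

    Min-max scaling is an assumed choice; the abstract does not name a method.
    """
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in matrix
    ]


# Two experiments, two sensors, two sample times (made-up readings).
raw = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
unfolded = unfold(raw)
normalized = min_max_normalize(unfolded)
```

The normalized matrix would then be passed to the dimensionality reduction and classification stages, which are not shown here.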


2021 ◽  
pp. 177-191
Author(s):  
Natalia V. Revollo ◽  
G. Noelia Revollo Sarmiento ◽  
Claudio Delrieux ◽  
Marcela Herrera ◽  
Rolando González-José

Author(s):  
Michael Vieceli ◽  
Amy Van Dusen ◽  
Karen Drukker ◽  
Hiroyuki Abe ◽  
Maryellen L. Giger ◽  
...  

Atmosphere ◽  
2019 ◽  
Vol 10 (5) ◽  
pp. 251
Author(s):  
Wael Ghada ◽  
Nicole Estrella ◽  
Annette Menzel

Rain microstructure parameters assessed by disdrometers are commonly used to classify rain into convective and stratiform. However, different types of disdrometer result in different values for these parameters. This in turn potentially deteriorates the quality of rain type classifications. Thies disdrometer measurements at two sites in Bavaria in southern Germany were combined with cloud observations to construct a set of clear convective and stratiform intervals. This reference dataset was used to study the performance of classification methods from the literature based on the rain microstructure. We also explored the possibility of improving the performance of these methods by tuning the decision boundary. We further identified highly discriminant rain microstructure parameters and used these parameters in five machine-learning classification models. Our results confirm the potential of achieving high classification performance by applying the concepts of machine learning compared to already available methods. Machine-learning classification methods provide a concrete and flexible procedure that is applicable regardless of the geographical location or the device. The suggested procedure for classifying rain types is recommended prior to studying rain microstructure variability or any attempts at improving radar estimations of rain intensity.
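
Tuning a decision boundary, as mentioned above, can be illustrated with a single-parameter threshold search. This is a sketch under assumptions, not the procedure used in the study: the function name `best_threshold`, the rule "convective if the value is at or above the threshold", and the sample values are hypothetical.

```python
def best_threshold(values, is_convective):
    """Pick the threshold on one parameter that maximizes accuracy.

    An interval is classified as convective when its value >= threshold.
    Candidate thresholds are the observed values themselves; ties keep the
    lowest threshold. Returns (threshold, accuracy).
    """
    best_thr, best_acc = None, -1.0
    for thr in sorted(set(values)):
        correct = sum((v >= thr) == label
                      for v, label in zip(values, is_convective))
        acc = correct / len(values)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr, best_acc


# Made-up values of one rain microstructure parameter per interval,
# with reference labels (True = convective, False = stratiform).
sample_values = [0.5, 1.0, 2.0, 6.0, 8.0, 12.0]
sample_labels = [False, False, False, True, True, True]
threshold, accuracy = best_threshold(sample_values, sample_labels)
```

A threshold tuned this way should be checked on data not used for tuning, since it can otherwise overfit the reference intervals.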


2020 ◽  
Vol 77 (4) ◽  
pp. 1545-1558
Author(s):  
Michael F. Bergeron ◽  
Sara Landset ◽  
Xianbo Zhou ◽  
Tao Ding ◽  
Taghi M. Khoshgoftaar ◽  
...  

Background: The widespread incidence and prevalence of Alzheimer’s disease and mild cognitive impairment (MCI) has prompted an urgent call for research to validate early detection cognitive screening and assessment. Objective: Our primary research aim was to determine if selected MemTrax performance metrics and relevant demographics and health profile characteristics can be effectively utilized in predictive models developed with machine learning to classify cognitive health (normal versus MCI), as would be indicated by the Montreal Cognitive Assessment (MoCA). Methods: We conducted a cross-sectional study on 259 neurology, memory clinic, and internal medicine adult patients recruited from two hospitals in China. Each patient was given the Chinese-language MoCA and self-administered the continuous recognition MemTrax online episodic memory test on the same day. Predictive classification models were built using machine learning with 10-fold cross validation, and model performance was measured using Area Under the Receiver Operating Characteristic Curve (AUC). Models were built using two MemTrax performance metrics (percent correct, response time), along with the eight common demographic and personal history features. Results: Comparing the learners across selected combinations of MoCA scores and thresholds, Naïve Bayes was generally the top-performing learner with an overall classification performance of 0.9093. Further, among the top three learners, MemTrax-based classification performance overall was superior using just the top-ranked four features (0.9119) compared to using all 10 common features (0.8999). Conclusion: MemTrax performance can be effectively utilized in a machine learning classification predictive model screening application for detecting early stage cognitive impairment.
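
The AUC used above to measure model performance equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the function name `roc_auc`, the label coding (1 = MCI, 0 = normal), and the toy scores are assumptions for illustration and do not come from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; tied scores count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy labels and classifier scores (made up).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

With 10-fold cross-validation, this value would be computed on each held-out fold and the fold results combined, for example by averaging.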

