A Machine Learning Approach to Identify Houses with High Lead Tap Water Concentrations

Seyedsaeed Hajiseyedjavadi; Michael Blackhurst; Hassan A Karimi

doi:10.1609/aaai.v34i08.7040

A Machine Learning Approach to Identify Houses with High Lead Tap Water Concentrations

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i08.7040 ◽

2020 ◽

Vol 34 (08) ◽

pp. 13300-13305 ◽

Cited By ~ 1

Author(s):

Seyedsaeed Hajiseyedjavadi ◽

Michael Blackhurst ◽

Hassan A Karimi

Keyword(s):

Machine Learning ◽

Drinking Water ◽

Tap Water ◽

Training Data ◽

Spatial And Temporal Patterns ◽

Regulatory Requirements ◽

Protection Measures ◽

Scarce Resources ◽

Machine Learning Approach ◽

Property Tax Assessment

Over a century separates initial lead service lateral installations from the federal regulation of lead in drinking water. As such, municipalities often do not have adequate information describing installations of lead plumbing. Municipalities thus face challenges such as reducing exposure to lead in drinking water, allocating scarce resources for gathering information, adopting short-term protection measures (e.g., providing filters), and developing longer-term prevention strategies (e.g., replacing lead laterals). Given the spatial and temporal patterns in properties, machine learning is seen as a useful tool to reduce uncertainty in decision making by authorities when addressing lead in water. The Pittsburgh Water and Sewer Authority (PWSA) is currently addressing these challenges in Pittsburgh, and this paper describes the development and application of a model predicting high tap water lead concentrations (> 15 ppb) for PWSA customers. The model was developed using spatial cross validation to support PWSA’s interest in applying predictions in areas without training data. The model achieves an AUROC of 71.6% and relies primarily on publicly available property tax assessment data and indicators of lateral material collected by PWSA as they meet regulatory requirements.
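
As a rough illustration of the evaluation described in this abstract, the sketch below builds leave-one-region-out (spatial) cross-validation splits and computes a rank-based AUROC in plain Python. The region labels, function names, and toy data are hypothetical; the abstract does not say how the folds were constructed or which model was used.

```python
from collections import defaultdict


def spatial_folds(regions):
    """Yield (held_out_region, train_idx, test_idx), holding out one region at a time."""
    by_region = defaultdict(list)
    for i, r in enumerate(regions):
        by_region[r].append(i)
    for held_out, test_idx in by_region.items():
        # Train on every house that is NOT in the held-out region
        train_idx = [i for r, idx in by_region.items() if r != held_out for i in idx]
        yield held_out, train_idx, test_idx


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one neighborhood label per house
regions = ["north", "north", "south", "east", "south"]
folds = list(spatial_folds(regions))
```

Holding out whole regions, rather than random houses, mimics applying predictions in areas that contributed no training data.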


Brain Activity-Based Metrics for Assessing Learning States in VR under Stress among Firefighters: An Explorative Machine Learning Approach in Neuroergonomics

Brain Sciences ◽

10.3390/brainsci11070885 ◽

2021 ◽

Vol 11 (7) ◽

pp. 885

Author(s):

Maher Abujelala ◽

Rohith Karthikeyan ◽

Oshin Tyagi ◽

Jing Du ◽

Ranjana K. Mehta

Keyword(s):

Machine Learning ◽

Environmental Conditions ◽

Brain Activity ◽

Memory Task ◽

Classification Problem ◽

Brain Regions ◽

Training Data ◽

Information Encoding ◽

Machine Learning Approach ◽

Encoding And Retrieval

The nature of firefighters’ duties requires them to work for long periods under unfavorable conditions. To perform their jobs effectively, they are required to endure long hours of extensive, stressful training. Creating such training environments is very expensive, and it is difficult to guarantee trainees’ safety. In this study, firefighters are trained in a virtual environment that includes virtual perturbations such as fires, alarms, and smoke. The objective of this paper is to use machine learning methods to discern encoding and retrieval states in firefighters during a visuospatial episodic memory task and to explore which regions of the brain provide suitable signals to solve this classification problem. Our results show that the Random Forest algorithm could be used to distinguish between information encoding and retrieval using features extracted from fNIRS data. Our algorithm achieved an F-1 score of 0.844 and an accuracy of 79.10% when the training and testing data were obtained under similar environmental conditions. However, the algorithm’s performance dropped to an F-1 score of 0.723 and an accuracy of 60.61% when evaluated on data collected under different environmental conditions than the training data. We also found that if the training and evaluation data were recorded under the same environmental conditions, the RPM, LDLPFC, and RDLPFC were the most relevant brain regions under non-stressful, stressful, and a mix of stressful and non-stressful conditions, respectively.


Reviewing Sentiment Analysis at the Shallow End

Transactions on Machine Learning and Artificial Intelligence ◽

10.14738/tmlai.84.8274 ◽

2020 ◽

Vol 8 (4) ◽

pp. 47-62

Author(s):

Francisca Oladipo ◽

Ogunsanya, F. B ◽

Musa, A. E. ◽

Ogbuju, E. E ◽

Ariwa, E.

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Information Exchange ◽

Training Data ◽

Data Set ◽

The Social ◽

Machine Learning Approach ◽

Media Space ◽

Social Media Platforms

The social media space has evolved into a large labyrinth of information exchange platforms, and due to the growth in the adoption of different social media platforms, there has been an increasing wave of interest in sentiment analysis as a paradigm for the mining and analysis of users’ opinions and sentiments based on their posts. In this paper, we present a review of contextual sentiment analysis on social media entries with a specific focus on Twitter. Sentiment analysis consists of two broad approaches: the machine learning approach, which uses classification techniques to classify text and is further categorized into supervised and unsupervised learning; and the lexicon-based approach, which uses a dictionary and, unlike the machine learning approach, requires no test or training data set.
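
To make the lexicon-based approach mentioned in this abstract concrete, here is a minimal sketch in Python. The word list, scores, and function name are invented for illustration; the lexicons used in practice are much larger and are not described in the abstract.

```python
import re

# Hypothetical toy lexicon: word -> polarity score
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}


def lexicon_sentiment(text, lexicon=LEXICON):
    """Sum the polarity of known words; no training data is involved."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because the score comes straight from dictionary lookups, nothing has to be fitted, which is the contrast with the machine learning approach drawn in the abstract.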


A Comprehensive Analysis of Deep Neural-Based Cerebral Microbleeds Detection System

Electronics ◽

10.3390/electronics10182208 ◽

2021 ◽

Vol 10 (18) ◽

pp. 2208

Author(s):

Maria Anna Ferlin ◽

Michał Grochowski ◽

Arkadiusz Kwasigroch ◽

Agnieszka Mikołajczyk ◽

Edyta Szurowska ◽

...

Keyword(s):

Machine Learning ◽

Detection System ◽

Three Dimensional ◽

Magnetic Resonance Images ◽

Cerebral Microbleeds ◽

Training Data ◽

Learning Approach ◽

Dimensional Problem ◽

Reliable System ◽

Machine Learning Approach

Machine learning-based systems are gaining interest in the field of medicine, mostly in medical imaging and diagnosis. In this paper, we address the problem of automatic cerebral microbleed (CMB) detection in magnetic resonance images. The task is challenging because a true CMB is difficult to distinguish from its mimics; if successfully solved, however, it would streamline the radiologists’ work. To deal with this complex three-dimensional problem, we propose a machine learning approach based on a 2D Faster RCNN network. We aimed to achieve a reliable system, i.e., one with balanced sensitivity and precision. Therefore, we researched and analysed, among other things, the impact of the way the training data are provided to the system, their pre-processing, the choice of model and its structure, and the methods of regularisation. Furthermore, we carefully analysed the network predictions and proposed an algorithm for their post-processing. The proposed approach achieved high precision (89.74%), sensitivity (92.62%), and F1 score (90.84%). The paper presents the main challenges connected with automatic cerebral microbleed detection, an in-depth analysis of them, and the developed system. The conducted research may significantly contribute to automatic medical diagnosis.
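
For readers unfamiliar with the metrics quoted in this abstract, the sketch below computes precision, sensitivity (recall), and F1 from detection counts. The function name and the counts are hypothetical and are not taken from the paper; the sketch does not attempt to reproduce the reported figures, which may have been aggregated differently.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, sensitivity, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and sensitivity,
    # which simplifies to 2*TP / (2*TP + FP + FN)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, sensitivity, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, no missed lesions
precision, sensitivity, f1 = detection_metrics(tp=8, fp=2, fn=0)
```

A system is "balanced" in the abstract's sense when precision and sensitivity are close to each other, so that neither false alarms nor missed lesions dominate.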


A machine learning approach to define antimalarial drug action from heterogeneous cell-based screens

Science Advances ◽

10.1126/sciadv.aba9338 ◽

2020 ◽

Vol 6 (39) ◽

pp. eaba9338 ◽

Cited By ~ 1

Author(s):

George W. Ashdown ◽

Michelle Dimon ◽

Minjie Fan ◽

Fernando Sánchez-Román Terán ◽

Kathrin Witmer ◽

...

Keyword(s):

Machine Learning ◽

Mechanism Of Action ◽

Training Data ◽

Supervised Machine Learning ◽

Cross Resistance ◽

Learning Approach ◽

Imaging Data ◽

Drug Induced ◽

Effective Prevention ◽

Machine Learning Approach

Drug resistance threatens the effective prevention and treatment of an ever-increasing range of human infections. This highlights an urgent need for new and improved drugs with novel mechanisms of action to avoid cross-resistance. Current cell-based drug screens are, however, restricted to binary live/dead readouts with no provision for mechanism of action prediction. Machine learning methods are increasingly being used to improve information extraction from imaging data. These methods, however, work poorly with heterogeneous cellular phenotypes and generally require time-consuming human-led training. We have developed a semi-supervised machine learning approach, combining human- and machine-labeled training data from mixed human malaria parasite cultures. Designed for high-throughput and high-resolution screening, our semi-supervised approach is robust to natural parasite morphological heterogeneity and correctly orders parasite developmental stages. Our approach also reproducibly detects and clusters drug-induced morphological outliers by mechanism of action, demonstrating the potential power of machine learning for accelerating cell-based drug discovery.


Data linearity using Kernel PCA with Performance Evaluation of Random Forest for training data: A machine learning approach

2016 International Conference on Computer Communication and Informatics (ICCCI) ◽

10.1109/iccci.2016.7479924 ◽

2016 ◽

Author(s):

Vinai George Biju ◽

Prashant C M

Keyword(s):

Machine Learning ◽

Performance Evaluation ◽

Random Forest ◽

Training Data ◽

Learning Approach ◽

Kernel PCA ◽

Machine Learning Approach


Doppler Spread Estimation Based on Machine Learning for an OFDM System

Wireless Communications and Mobile Computing ◽

10.1155/2021/5586029 ◽

2021 ◽

Vol 2021 ◽

pp. 1-15

Author(s):

Eunchul Yoon ◽

Soonbum Kwon ◽

Unil Yun ◽

Sun-Yong Kim

Keyword(s):

Machine Learning ◽

Network Architecture ◽

Training Data ◽

Doppler Spread ◽

Learning Approach ◽

OFDM System ◽

Angle Of Arrival ◽

Estimation Errors ◽

Machine Learning Approach ◽

Spread Estimation

In this paper, we propose a Doppler spread estimation approach based on machine learning for an OFDM system. We present a carefully designed neural network architecture to achieve good performance in a mixed-channel scenario in which channel characteristic variables such as Rician K factor, azimuth angle of arrival (AOA) width, mean direction of azimuth AOA, and channel estimation errors are randomly generated. When preprocessing the channel state information (CSI) collected under the mixed-channel scenario, we propose averaged power spectral density (PSD) sequence as high-quality training data in machine learning for Doppler spread estimation. We detail intermediate mathematical derivatives of the machine learning process, making it easy to graft the derived results into other wireless communication technologies. Through simulation, we show that the machine learning approach using the averaged PSD sequence as training data outperforms the other machine learning approach using the channel frequency response (CFR) sequence as training data and two other existing Doppler estimation approaches.
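
As a hedged illustration of what an averaged PSD sequence could look like, the sketch below averages textbook periodograms over several equal-length sequences. The abstract does not give the paper's exact PSD definition or preprocessing, so the formula |DFT(x)|^2 / N, the function names, and the toy inputs are all assumptions.

```python
import cmath


def periodogram(x):
    """Power spectral density estimate |DFT(x)|^2 / N of one sequence."""
    n = len(x)
    psd = []
    for k in range(n):
        xk = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        psd.append(abs(xk) ** 2 / n)
    return psd


def averaged_psd(sequences):
    """Average the periodograms of equal-length sequences bin by bin."""
    psds = [periodogram(s) for s in sequences]
    return [sum(col) / len(psds) for col in zip(*psds)]


# Toy input: a constant sequence and an alternating sequence
avg = averaged_psd([[1, 1, 1, 1], [1, -1, 1, -1]])
```

Averaging across sequences reduces the variance of the individual spectral estimates, which is one plausible reason such a sequence would serve well as training input.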


HAMLET

Terminology ◽

10.1075/term.20017.rig ◽

2021 ◽

Author(s):

Ayla Rigouts Terryn ◽

Véronique Hoste ◽

Els Lefever

Keyword(s):

Machine Learning ◽

Language Processing ◽

Hybrid Approach ◽

Substantial Effect ◽

Training Data ◽

Supervised Machine Learning ◽

Learning Approach ◽

Term Extraction ◽

Machine Learning Approach ◽

Different Types

Automatic term extraction (ATE) is an important task within natural language processing, both separately and as a preprocessing step for other tasks. In recent years, research has moved far beyond the traditional hybrid approach where candidate terms are extracted based on part-of-speech patterns and filtered and sorted with statistical termhood and unithood measures. While there has been an explosion of different types of features and algorithms, including machine learning methodologies, some of the fundamental problems remain unsolved, such as the ambiguous nature of the concept “term”. This has been a hurdle in the creation of data for ATE, meaning that datasets for both training and testing are scarce, and system evaluations are often limited and rarely cover multiple languages and domains. The ACTER Annotated Corpora for Term Extraction Research contain manual term annotations in four domains and three languages and have been used to investigate a supervised machine learning approach for ATE, using a binary random forest classifier with multiple types of features. The resulting system (HAMLET: Hybrid Adaptable Machine Learning approach to Extract Terminology) provides detailed insights into its strengths and weaknesses. It highlights a certain unpredictability as an important drawback of machine learning methodologies, but also shows how the system appears to have learnt a robust definition of terms, producing results that are state-of-the-art and contain few errors that are not (part of) terms in any way. Both the amount and the relevance of the training data have a substantial effect on results, and by varying the training data, it appears to be possible to adapt the system to various desired outputs, e.g., different types of terms. While certain issues remain difficult, such as the extraction of rare terms and multiword terms, this study shows how supervised machine learning is a promising methodology for ATE.


Similarity-Based Methods and Machine Learning Approaches for Target Prediction in Early Drug Discovery: Performance and Scope

International Journal of Molecular Sciences ◽

10.3390/ijms21103585 ◽

2020 ◽

Vol 21 (10) ◽

pp. 3585 ◽

Cited By ~ 3

Author(s):

Neann Mathai ◽

Johannes Kirchmair

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Target Prediction ◽

Training Data ◽

Target Space ◽

Learning Approach ◽

Learning Approaches ◽

Individual Test ◽

Machine Learning Approach ◽

The Individual

Computational methods for predicting the macromolecular targets of drugs and drug-like compounds have evolved as a key technology in drug discovery. However, the established validation protocols leave several key questions regarding the performance and scope of methods unaddressed. For example, prediction success rates are commonly reported as averages over all compounds of a test set and do not consider the structural relationship between the individual test compounds and the training instances. In order to obtain a better understanding of the value of ligand-based methods for target prediction, we benchmarked a similarity-based method and a random forest based machine learning approach (both employing 2D molecular fingerprints) under three testing scenarios: a standard testing scenario with external data, a standard time-split scenario, and a scenario that is designed to most closely resemble real-world conditions. In addition, we deconvoluted the results based on the distances of the individual test molecules from the training data. We found that, surprisingly, the similarity-based approach generally outperformed the machine learning approach in all testing scenarios, even in cases where queries were structurally clearly distinct from the instances in the training (or reference) data, and despite a much higher coverage of the known target space.
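
The similarity-based idea in this abstract can be sketched as a nearest-neighbour lookup over fingerprints. The sketch assumes Tanimoto similarity on sets of fingerprint on-bits, a common choice that the abstract itself does not name; the reference ligands, target names, and function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_targets(query, reference, k=1):
    """Rank targets by the query's highest similarity to any reference ligand of that target."""
    best = {}
    for fingerprint, target in reference:
        sim = tanimoto(query, fingerprint)
        best[target] = max(sim, best.get(target, 0.0))
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


# Hypothetical reference set: (on-bit set, target name)
reference = [
    ({1, 2, 3, 4}, "kinase_A"),
    ({2, 3, 9}, "protease_B"),
    ({7, 8}, "gpcr_C"),
]
top = predict_targets({1, 2, 3}, reference)
```

The similarity score returned alongside each target also gives a direct handle on how far a query lies from the reference data, which is the kind of distance the abstract says the results were broken down by.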


Prediction of galaxy halo masses in SDSS DR7 via a machine learning approach

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/stz2775 ◽

2019 ◽

Vol 490 (2) ◽

pp. 2367-2379 ◽

Cited By ~ 5

Author(s):

Victor F Calderon ◽

Andreas A Berlind

Keyword(s):

Machine Learning ◽

Dark Matter ◽

Sloan Digital Sky Survey ◽

Training Data ◽

Dark Matter Halo ◽

Joint Distributions ◽

Sky Survey ◽

Machine Learning Approach ◽

Improved Performance ◽

Halo Masses

We present a machine learning (ML) approach for the prediction of galaxies’ dark matter halo masses which achieves an improved performance over conventional methods. We train three ML algorithms (XGBoost, random forests, and neural network) to predict halo masses using a set of synthetic galaxy catalogues that are built by populating dark matter haloes in N-body simulations with galaxies and that match both the clustering and the joint distributions of properties of galaxies in the Sloan Digital Sky Survey (SDSS). We explore the correlation of different galaxy- and group-related properties with halo mass, and extract the set of nine features that contribute the most to the prediction of halo mass. We find that mass predictions from the ML algorithms are more accurate than those from halo abundance matching (HAM) or dynamical mass estimates (DYN). Since the danger of this approach is that our training data might not accurately represent the real Universe, we explore the effect of testing the model on synthetic catalogues built with different assumptions than the ones used in the training phase. We test a variety of models with different ways of populating dark matter haloes, such as adding velocity bias for satellite galaxies. We determine that, though training and testing on different data can lead to systematic errors in predicted masses, the ML approach still yields substantially better masses than either HAM or DYN. Finally, we apply the trained model to a galaxy and group catalogue from the SDSS DR7 and present the resulting halo masses.


Automated Angiographic Labeling Pipeline

Proceedings of IMPRS ◽

10.18060/25890 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Jacob Cantrell ◽

Kolten Kersey ◽

Anush Motaganahalli ◽

Amy Li ◽

Hunter Maxwell ◽

...

Keyword(s):

Machine Learning ◽

Active Learning ◽

Learning Algorithm ◽

Axial Flow ◽

Training Data ◽

Convenience Sample ◽

Hybrid Approaches ◽

Machine Learning Approach ◽

Picture Archiving And Communication ◽

Artery Disease

Background & Hypothesis: Treatment decisions for medical management, endovascular therapy, open surgery, and hybrid approaches for peripheral artery disease (PAD) are largely driven by imaging. While catheter-directed angiography remains the gold standard for endoluminal vessel analysis, machine learning is not yet in widespread clinical use for automated segmentation. This project aims to develop an active learning pipeline to automate the labeling of vascular structures in angiographic images. Methods: We queried the picture archiving and communication system (PACS) database for Indiana University Health and Eskenazi Health to identify studies with catheter-directed angiograms of the extremities. From this dataset we randomly selected an initial convenience sample of 50 angiograms to manually label using the 3D Slicer software. We compared three workflow approaches for labeling this training data: (1) human-only single-pass labeling, whereby one person labels each image; (2) human-only multi-pass labeling, whereby three humans label a vessel with increasing precision; and (3) a “human-in-the-middle” approach using NVIDIA’s AI-Assisted Annotation client, whereby the image is auto-segmented and then manually checked for accuracy. Results: We are currently evaluating speed and accuracy for each of these approaches. However, our preliminary data suggest that human-only multi-pass labeling is the most efficient approach. We will be validating the following three-step process. First, a thresholding tool was used to leverage differences in contrast gradations to approximate the location of vascular structures. Second, the eraser tool was used to refine the vessel boundaries. Finally, major blood vessels contributing to axial flow to the foot were manually labeled. These labeled angiograms will be used to develop an active learning algorithm to automate future labeling of the remaining dataset.
Conclusion: A machine learning approach to interpreting lower extremity images can dramatically improve the efficiency of triaging patients with PAD. Further work is underway to develop and implement this program clinically.
