High Performance Machine Learning Models of Large Scale Air Pollution Data in Urban Area

Abstract: Preserving air quality in urban areas is crucial for the health of the population as well as for the environment. The availability of large volumes of measurement data on air pollutant concentrations enables their analysis and modelling to establish trends and dependencies in order to forecast and prevent future pollution. This study proposes a new approach for modelling air pollutant data using the machine learning method Random Forest (RF) and the Auto-Regressive Integrated Moving Average (ARIMA) methodology. Initially, an RF model of the pollutant is built and analysed in relation to the meteorological variables. This model is then corrected through subsequent modelling of its residuals using univariate ARIMA. The approach is demonstrated for hourly data on seven air pollutants (O3, NOx, NO, NO2, CO, SO2, PM10) in the town of Dimitrovgrad, Bulgaria, over 9 years and 3 months. Six meteorological and three time variables are used as predictors. High-performance models are obtained, explaining the data with R² = 90%-98%.
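The two-stage idea in this abstract (fit a base model, then model its residuals with a time-series model and add the correction back) can be sketched as follows. This is only an illustration under loud assumptions: a least-squares line stands in for the Random Forest and a first-order autoregression stands in for ARIMA, so the block runs on the standard library alone. The data and all function names are hypothetical and are not taken from the study.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares line y = a + b*x (stand-in for the RF base model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def fit_ar1(residuals):
    """Least-squares estimate of phi in e[t] = phi * e[t-1] + noise
    (stand-in for the ARIMA residual model)."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals[:-1])
    return num / den if den else 0.0

def hybrid_fit(x, y):
    """Stage 1: base model. Stage 2: correct it with the predicted residual."""
    a, b = fit_line(x, y)
    base = [a + b * xi for xi in x]
    resid = [yi - bi for yi, bi in zip(y, base)]
    phi = fit_ar1(resid)
    # One-step-ahead correction; the first point has no previous residual.
    corrected = [base[0]] + [base[t] + phi * resid[t - 1] for t in range(1, len(y))]
    return base, corrected, phi

def sse(y, pred):
    """Sum of squared errors."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred))

# Hypothetical data: a linear trend plus a slowly alternating disturbance.
x = list(range(12))
disturbance = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1]
y = [2 + 0.5 * xi + d for xi, d in zip(x, disturbance)]

base, corrected, phi = hybrid_fit(x, y)
print(f"phi={phi:.3f}  SSE base={sse(y, base):.3f}  SSE corrected={sse(y, corrected):.3f}")
```

The correction helps only when the base model's residuals are autocorrelated; if they behave like white noise, the estimated coefficient is near zero and the corrected fit is essentially the base fit.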

Statistical and machine learning models for optimizing energy in parallel applications

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019842915 ◽

2019 ◽

Vol 33 (6) ◽

pp. 1079-1097 ◽

Cited By ~ 2

Author(s):

Mark Endrei ◽

Chao Jin ◽

Minh Ngoc Dinh ◽

David Abramson ◽

Heidi Poxon ◽

...

Keyword(s):

Machine Learning ◽

Energy Efficiency ◽

High Performance ◽

Large Scale ◽

Energy Use ◽

Parallel Applications ◽

Learning Models ◽

Trade Off ◽

Time Required ◽

Machine Learning Models

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), Large-scale Atomic Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
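As a rough illustration of what "Pareto-optimal trade-off" means here, the sketch below keeps only those configurations for which no other configuration is at least as good in both runtime and energy and strictly better in one. The configuration names and numbers are hypothetical, and the function is not the authors' model; it only shows the selection step that such a model's predictions would feed.

```python
def pareto_front(points):
    """Return the (name, runtime, energy) tuples that are not dominated.

    Lower is better for both runtime and energy. A point is dominated if
    another point is no worse in both and strictly better in at least one.
    """
    front = []
    for name, t, e in points:
        dominated = any(
            (t2 <= t and e2 <= e) and (t2 < t or e2 < e)
            for _, t2, e2 in points
        )
        if not dominated:
            front.append((name, t, e))
    return sorted(front, key=lambda p: p[1])

# Hypothetical measurements: (configuration, runtime in s, energy in kJ).
configs = [
    ("A", 100, 500),
    ("B", 110, 300),
    ("C", 120, 320),
    ("D", 150, 250),
    ("E", 100, 520),
]

for name, runtime, energy in pareto_front(configs):
    print(name, runtime, energy)
```

In this made-up example, moving from A to B costs 10% in runtime and saves 40% in energy, which is the kind of option a user would choose from the front.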

High-performance Machine Learning in Enabling Large-scale Load Analysis Considering Class Imbalance and Frequency Domain Characteristics

2020 IEEE Sustainable Power and Energy Conference (iSPEC) ◽

10.1109/ispec50848.2020.9350922 ◽

2020 ◽

Author(s):

Xi Wang ◽

Quan Tang ◽

Haiyan Wang ◽

Ruiguang Ma ◽

Zizhuo Tang

Keyword(s):

Machine Learning ◽

Frequency Domain ◽

High Performance ◽

Large Scale ◽

Class Imbalance ◽

Load Analysis

Machine learning of serum metabolic patterns encodes early-stage lung adenocarcinoma

10.21203/rs.3.pex-963/v1 ◽

2021 ◽

Author(s):

Lin Huang ◽

Kun Qian

Keyword(s):

Machine Learning ◽

Lung Adenocarcinoma ◽

Cancer Detection ◽

High Performance ◽

Large Scale ◽

Early Cancer ◽

Early Stage ◽

Early Cancer Detection ◽

Ionization Mass ◽

Efficient Test

Abstract: Early cancer detection greatly increases the chances of successful treatment, but available diagnostics for some tumours, including lung adenocarcinoma (LA), are limited. An ideal early-stage diagnosis of LA for large-scale clinical use must address quick detection, low invasiveness, and high performance. Here, we conduct machine learning of serum metabolic patterns to detect early-stage LA. We extract direct metabolic patterns by optimized ferric particle-assisted laser desorption/ionization mass spectrometry within 1 second using only 50 nL of serum. We define a metabolic range of 100-400 Da with 143 m/z features. We diagnose early-stage LA with a sensitivity of ~70-90% and a specificity of ~90-93% through sparse regression machine learning of the patterns. We identify a biomarker panel of seven metabolites and relevant pathways to distinguish early-stage LA from controls (p < 0.05). Our approach advances the design of metabolic analysis for early cancer detection and holds promise as an efficient test for low-cost rollout to clinics.
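For readers less familiar with the two diagnostic figures quoted above, the sketch below computes sensitivity (the fraction of true cases detected) and specificity (the fraction of controls correctly cleared) from binary labels. The labels are hypothetical and unrelated to the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = case, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels: four cases followed by four controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```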

Large-scale machine learning based on functional networks for biomedical big data with high performance computing platforms

Journal of Computational Science ◽

10.1016/j.jocs.2015.09.008 ◽

2015 ◽

Vol 11 ◽

pp. 69-81 ◽

Cited By ~ 32

Author(s):

Emad Elsebakhi ◽

Frank Lee ◽

Eric Schendel ◽

Anwar Haque ◽

Nagarajan Kathireason ◽

...

Keyword(s):

Machine Learning ◽

Big Data ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Functional Networks ◽

Computing Platforms ◽

Performance Computing

A Survey On Air Quality Prediction Using Traditional Statistics Method

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit2063197 ◽

2020 ◽

pp. 942-946

Author(s):

S. Karthikeyani ◽

S. Rathi

Keyword(s):

Air Quality ◽

Air Pollutants ◽

Meteorological Parameters ◽

Moving Average ◽

Quality Prediction ◽

Atmospheric Air ◽

Air Quality Prediction ◽

Naive Method ◽

Auto Regressive ◽

Weighted Moving Average

Air pollution is the release into the atmosphere of pollutants that are harmful to human health and the planet as a whole. Car emissions, dust, pollen, chemicals from factories, and mold spores may be suspended in the air as particles. This survey analyzes air quality prediction using traditional statistical methods. The predictions use the air pollutants PM2.5, PM10, NO2, NOx, NO, SO2, CO, and O3, together with meteorological parameters such as Absolute Temperature (AT) and Relative Humidity (RH). The prediction algorithms compared are the Naive Method, Auto-Regressive Integrated Moving Average (ARIMA), Exponentially Weighted Moving Average (EWMA), Linear Regression (LR), the LSTM model, and the Prophet model.
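Two of the simplest baselines named in this abstract can be written in a few lines. The sketch below assumes the usual textbook definitions: the naive method forecasts the last observed value, and the EWMA smooths the series with s[t] = alpha * x[t] + (1 - alpha) * s[t-1], seeded with the first observation. The sample readings and the choice of alpha are hypothetical.

```python
def naive_forecast(series):
    """Naive method: the next value is forecast as the last observed value."""
    return series[-1]

def ewma(series, alpha):
    """Exponentially weighted moving average, seeded with the first observation.

    alpha in (0, 1]: larger values weight recent observations more heavily.
    """
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical hourly PM10 readings.
pm10 = [10, 20, 30]
print(naive_forecast(pm10))
print(ewma(pm10, alpha=0.5))
```

The last smoothed value can itself serve as a one-step-ahead forecast, which makes the EWMA a natural point of comparison for the naive method.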

BrainIAK tutorials: User-friendly learning materials for advanced fMRI analysis

10.31219/osf.io/j4sbc ◽

2019 ◽

Cited By ~ 2

Author(s):

Manoj Kumar ◽

Cameron Thomas Ellis ◽

Qihong Lu ◽

Hejia Zhang ◽

Mihai Capota ◽

...

Keyword(s):

Machine Learning ◽

Functional Connectivity ◽

Open Source ◽

Programming Languages ◽

High Performance ◽

Large Scale ◽

Markov Models ◽

Matrix Analysis ◽

Fmri Analysis ◽

User Friendly

Advanced brain imaging analysis methods, including multivariate pattern analysis (MVPA), functional connectivity, and functional alignment, have become powerful tools in cognitive neuroscience over the past decade. These tools are implemented in custom code and separate packages, often requiring different software and language proficiencies. Although usable by expert researchers, novice users face a steep learning curve. These difficulties stem from the use of new programming languages (e.g., Python), learning how to apply machine-learning methods to high-dimensional fMRI data, and minimal documentation and training materials. Furthermore, most standard fMRI analysis packages (e.g., AFNI, FSL, SPM) focus on preprocessing and univariate analyses, leaving a gap in how to integrate with advanced tools. To address these needs, we developed BrainIAK (brainiak.org), an open-source Python software package that seamlessly integrates several cutting-edge, computationally efficient techniques with other Python packages (e.g., Nilearn, Scikit-learn) for file handling, visualization, and machine learning. To disseminate these powerful tools, we developed user-friendly tutorials (in Jupyter format; https://brainiak.org/tutorials/) for learning BrainIAK and advanced fMRI analysis in Python more generally. These materials cover techniques including: MVPA (pattern classification and representational similarity analysis); parallelized searchlight analysis; background connectivity; full correlation matrix analysis; inter-subject correlation; inter-subject functional connectivity; shared response modeling; event segmentation using hidden Markov models; and real-time fMRI. For long-running jobs or large memory needs we provide detailed guidance on high-performance computing clusters. These notebooks were successfully tested at multiple sites, including as problem sets for courses at Yale and Princeton universities and at various workshops and hackathons. These materials are freely shared, with the hope that they become part of a pool of open-source software and educational materials for large-scale, reproducible fMRI analysis and accelerated discovery.

A Machine Learning Approach to Investigate the Surface Ozone Behavior

Atmosphere ◽

10.3390/atmos11111173 ◽

2020 ◽

Vol 11 (11) ◽

pp. 1173

Author(s):

Roberta Valentina Gagliardi ◽

Claudio Andenna

Keyword(s):

Machine Learning ◽

Air Pollutants ◽

Surface Ozone ◽

Ground Level ◽

Driving Factors ◽

Boosted Regression Trees ◽

Linear Functions ◽

Hourly Data ◽

Machine Learning Approach ◽

Mlr Model

The concentration of surface ozone (O3) strongly depends on environmental and meteorological variables through a series of complex and non-linear functions. This study aims to explore the performance of an advanced machine learning (ML) method, the boosted regression trees (BRT) technique, in exploring the relationships between surface O3 and its driving factors, and in predicting the levels of O3 concentrations. To this end, a BRT model was trained on hourly data of air pollutants and meteorological parameters, acquired over the 2016–2018 period in a rural area affected by an anthropic source of air pollutants. The abilities of the BRT model in ranking, visualizing, and predicting the relationship between ground-level O3 concentrations and its driving factors were analyzed and illustrated. A comparison with a multiple linear regression (MLR) model was performed based on several statistical indicators. The results indicated that the BRT model was able to account for 81% of changes in O3 concentrations; it slightly outperformed the MLR model in terms of prediction accuracy and allowed a better identification of the main factors influencing O3 variability on a local scale. This knowledge is expected to be useful in defining effective measures to prevent and/or mitigate the health damages associated with O3 exposure.

Multi-Horizon Air Pollution Forecasting with Deep Neural Networks

Sensors ◽

10.3390/s21041235 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1235

Author(s):

Mirche Arsov ◽

Eftim Zdravevski ◽

Petre Lameski ◽

Roberto Corizzo ◽

Nikola Koteli ◽

...

Keyword(s):

Air Pollution ◽

Urban Areas ◽

Short Term Memory ◽

Moving Average ◽

Arima Model ◽

Measurement Data ◽

Quality Measurement ◽

Capital City ◽

Industrial Plants ◽

Proactive Measures

Air pollution is a global problem, especially in urban areas, where population density is very high and pollutant sources such as vehicles, industrial plants, buildings, and waste are diverse. North Macedonia, as a developing country, has a serious problem with air pollution. The problem is especially pronounced in its capital city, Skopje, which consistently ranks within the top 10 cities in the world for air pollution during the winter months. In this work, we propose using Recurrent Neural Network (RNN) models with long short-term memory units to predict the level of PM10 particles at 6, 12, and 24 h in the future. We employ historical air quality measurement data from sensors placed at multiple locations in Skopje and meteorological conditions such as temperature and humidity. We compare different deep learning models’ performance to an Auto-regressive Integrated Moving Average (ARIMA) model. The obtained results show that the proposed models consistently outperform the baseline model and can be successfully employed for air pollution prediction. Ultimately, we demonstrate that these models can help decision-makers and local authorities better manage the air pollution consequences by taking proactive measures.
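Forecasting at several horizons at once first requires turning the hourly series into training samples, each pairing a window of past values with the values 6, 12, and 24 h after the end of the window. The sketch below shows one plausible way to build such samples; the function name, the lookback length, and the synthetic series are assumptions, and the abstract does not describe the authors' actual preprocessing.

```python
def make_windows(series, lookback, horizons):
    """Build (window, targets) pairs for multi-horizon forecasting.

    window  : the `lookback` most recent observations
    targets : the values h steps after the last observation in the window,
              one per horizon h in `horizons`
    """
    max_h = max(horizons)
    samples = []
    # `end` is the exclusive end index of the input window.
    for end in range(lookback, len(series) - max_h + 1):
        window = series[end - lookback:end]
        targets = [series[end - 1 + h] for h in horizons]
        samples.append((window, targets))
    return samples

# Hypothetical hourly series; values equal their index so offsets are easy to read.
hourly = list(range(40))
samples = make_windows(hourly, lookback=3, horizons=(6, 12, 24))
print(len(samples), samples[0], samples[-1])
```

Each pair can then be fed to any sequence model; samples near the end of the series are dropped because their 24 h target has not yet been observed.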

A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines

ACM Transactions on Mathematical Software ◽

10.1145/3431921 ◽

2021 ◽

Vol 47 (3) ◽

pp. 1-23

Author(s):

Ahmad Abdelfattah ◽

Timothy Costa ◽

Jack Dongarra ◽

Mark Gates ◽

Azzam Haidar ◽

...

Keyword(s):

Machine Learning ◽

Linear Algebra ◽

High Performance ◽

Large Scale ◽

Floating Point ◽

Equal Size ◽

Hardware Accelerators ◽

Double Precision ◽

Basic Linear Algebra Subprograms ◽

Many Core

This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms (Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a Batched BLAS routine. The matrices are grouped together in uniformly sized groups, with just one group if all the matrices are of equal size. The aim is to provide more efficient, but portable, implementations of algorithms on high-performance many-core platforms. These include multicore and many-core CPU processors, GPUs and coprocessors, and other hardware accelerators with floating-point compute facility. As well as the standard types of single and double precision, we also include half and quadruple precision in the standard. In particular, half precision is used in many very large scale applications, such as those associated with machine learning.
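The batching idea can be illustrated with a plain-Python sketch: one call applies the same matrix-multiply operation, C <- alpha * A * B + beta * C, to every matrix triple in a group of equal-sized small matrices. The function names and signatures below are hypothetical and are not the Batched BLAS API proposed in the article; a real implementation would dispatch the whole group to optimized CPU or GPU kernels rather than loop in Python.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * A @ B + beta * C for small dense matrices (lists of rows)."""
    n, k, m = len(a), len(b), len(b[0])
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(m)]
        for i in range(n)
    ]

def batched_gemm(alpha, a_batch, b_batch, beta, c_batch):
    """Apply the same GEMM to every (A, B, C) triple in one uniformly sized group."""
    return [gemm(alpha, a, b, beta, c)
            for a, b, c in zip(a_batch, b_batch, c_batch)]

# One group of two independent 2x2 problems (hypothetical values).
a_batch = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
b_batch = [[[1, 0], [0, 1]], [[5, 6], [7, 8]]]
c_batch = [[[1, 1], [1, 1]], [[0, 0], [0, 0]]]

result = batched_gemm(1, a_batch, b_batch, 2, c_batch)
print(result)
```

Grouping the calls matters because each problem is too small to keep a many-core device busy on its own; a single batched routine lets the library schedule all of them together.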

Flashover Prevention System using IoT and Machine Learning for Transmission and Distribution Lines

International Journal of Interactive Mobile Technologies (iJIM) ◽

10.3991/ijim.v15i11.20753 ◽

2021 ◽

Vol 15 (11) ◽

pp. 34

Author(s):

Kobkiat Saraubon ◽

Nuttapong Wiriyanuruknakon ◽

Natdanai Tangthirasunun

Keyword(s):

Machine Learning ◽

Short Term Memory ◽

Polynomial Regression ◽

Moving Average ◽

Thin Layers ◽

Prevention System ◽

Auto Regressive ◽

Distribution Line ◽

Long Short Term Memory ◽

Transmission And Distribution

Flashover on transmission and distribution line insulators occurs when the insulator’s resistance drops to a critical level and causes frequent power outages. Thin layers of dust, salt, and airborne particles, gradually deposited on the surface of insulators, as well as humidity, form an electrolyte which causes flashover. In this paper, a flashover prevention system using IoT technology and machine learning is proposed in order to reduce loss and increase power reliability. The system includes an IoT module, a service and clients. The IoT module prototype was installed at a distribution line pole located in Pracha-utit, Bangkok, Thailand, and collected data for thirty-four months. The data were pre-processed and split for the training process and evaluation. In this study, we built and compared four models: linear regression, polynomial regression, Auto-regressive Integrated Moving Average (ARIMA), and Long Short-Term Memory (LSTM). The results revealed that the LSTM model outperformed the others (R² = 0.931, RMSE = 530.74).
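The two scores used to rank the models can be computed as in the sketch below, which assumes the standard definitions: RMSE is the square root of the mean squared error, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sample values are hypothetical and unrelated to the study's measurements.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observations and predictions.
observed = [1, 2, 3, 4]
predicted = [1, 2, 3, 5]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

Because RMSE carries the units of the measured quantity while R² is unitless, the two are usually reported together, as in the abstract above.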
