Using Open Source Data for Landing Time Prediction with Machine Learning Methods

Increasing demands on a highly efficient air traffic management system go hand in hand with increasing requirements for predicting an aircraft’s future position. In this context, the airport collaborative decision-making framework provides a standardized approach to improving airport performance by defining operationally important milestones along the aircraft trajectory. In particular, the aircraft landing time is an important milestone that significantly affects the utilization of limited runway capacity. We compare different machine learning methods for predicting the landing time based on broadcast surveillance data of arrival flights at Zurich Airport. To this end, we consider different time horizons (look-ahead times) for arrival flights to predict additional sub-milestones for n-hours-out timestamps. The features are extracted from both surveillance data and weather information. Flights are clustered and analyzed using feedforward neural networks and decision tree methods, such as random forests and gradient boosting machines, which are compared by cross-validation error. The prediction of landing time from entry points with a radius of 45, 100, 150, 200, and 250 nautical miles can attain an MAE and RMSE within 5 min on the test set. The prediction error increases with the radius. Our predicted landing times will contribute to appropriate airport performance management.
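
The two error measures quoted in this abstract, MAE and RMSE, can be illustrated with a minimal sketch. The flight times below are hypothetical values made up for the example, not data from the study, and the function names are illustrative only.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, in the unit of the data."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but large errors weigh more."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical minutes from entry-point crossing to touchdown (assumed values).
actual_min = [20.0, 25.0, 30.0, 35.0]
predicted_min = [22.0, 24.0, 27.0, 35.0]

print("MAE  (min):", mae(actual_min, predicted_min))
print("RMSE (min):", round(rmse(actual_min, predicted_min), 3))
```

RMSE is never smaller than MAE on the same data, so reporting both gives a rough sense of how much a few large misses dominate the error.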

MODIS-FIRMS and ground-truthing based wildfire likelihood mapping of Sikkim Himalaya using machine learning algorithms.

10.21203/rs.3.rs-750123/v1 ◽

2021 ◽

Author(s):

Polash Banerjee

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Tree Cover ◽

Anthropogenic Factors ◽

Gradient Boosting ◽

Support Vector ◽

Learning Methods ◽

Sikkim Himalaya ◽

Environmental Features ◽

Machine Learning Methods

Wildfires of limited extent and intensity can be a boon for the forest ecosystem. However, the 2019 wildfires in Australia and Brazil are sad reminders of their heavy ecological and economic costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating them. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors have been considered for preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study of the efficiency of machine learning methods, namely Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Model (GBM), has been performed to identify the best performing algorithm for wildfire prediction. The study indicates that all of these machine learning methods are good at predicting wildfires; RF performed best, followed by GBM. Environmental features such as average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can be considered a decision support tool for preparedness, efficient resource allocation and sensitization of people towards mitigation of wildfires in Sikkim.

Short- and Medium-range Prediction of Relativistic Electron Flux in the Earth’s Outer Radiation Belt by Machine Learning Methods

Meteorologiya i Gidrologiya ◽

10.52002/0130-2906-2021-3-47-57 ◽

2021 ◽

Vol 3 ◽

pp. 47-57

Author(s):

I. N. Myagkova ◽

V. R. Shirokii ◽

Yu. S. Shugai ◽

O. G. Barinov ◽

...

Keyword(s):

Machine Learning ◽

Radiation Belt ◽

Gradient Boosting ◽

Relativistic Electrons ◽

Learning Methods ◽

Outer Radiation Belt ◽

Machine Learning Methods ◽

The Earth ◽

Skill Scores ◽

Medium Range

Ways to improve the quality of prediction of the time series of hourly mean fluxes and daily total fluxes (fluences) of relativistic electrons in the Earth’s outer radiation belt, 1 to 24 hours ahead and 1 to 4 days ahead, respectively, are studied. The prediction uses an approximation approach based on various machine learning methods, namely, artificial neural networks (ANNs), decision trees (random forest), and gradient boosting. A comparison of the skill scores of short-range forecasts with a lead time of 1 to 24 hours showed that the best results were obtained with ANNs. For medium-range forecasting, the accuracy of prediction of the fluences of relativistic electrons in the Earth’s outer radiation belt three to four days ahead increases significantly when predicted values of the solar wind velocity near the Earth, obtained from UV images of the Sun taken by the AIA (Atmospheric Imaging Assembly) instrument of the SDO (Solar Dynamics Observatory), are added to the list of input parameters.

Machine Learning Methods and Qualimetric Approach to Determine the Conditions for Train Students in the Field of Environmental and Economic Activities

International Journal of Emerging Technologies in Learning (iJET) ◽

10.3991/ijet.v16i03.17715 ◽

2021 ◽

Vol 16 (03) ◽

pp. 72

Author(s):

Artem Salamatov ◽

Elena Gafarova ◽

Vladimir Belevitin ◽

Maxim Gafarov ◽

Darya Gordeeva

Keyword(s):

Machine Learning ◽

Economic Activity ◽

Professional Training ◽

Educational Process ◽

Effective Control ◽

Gradient Boosting ◽

Economic Activities ◽

Learning Methods ◽

Machine Learning Methods ◽

Pedagogical Research

The relevance of environmental and economic activity requires professional training of specialists and, accordingly, new organizational and pedagogical conditions for effective education. It is also necessary to develop control and measuring materials that have all the required qualities (validity, reliability, consistency, significance and objectivity) in order to obtain the most reliable results when justifying the need for and sufficiency of the identified conditions. The intensification of information processes in vocational education leads researchers to look for optimal conditions and tools to achieve pedagogical goals. Among these tools are machine learning methods and the mathematical models built on them for quantitative assessment of the quality of vocational training in the field of environmental and economic activities. The qualimetric approach can be used in pedagogy when an array of observational data is available for one or another criterion related to learning conditions, personal qualities of students, etc. Constructing an algorithmic model allows one to vary conditions in mental experiments and to test hypotheses. Since pedagogical research takes a long time, choosing conditions based on the most favorable forecast built with the model allows one to optimize pedagogical resources to achieve the planned results. Rational selection of effective control and measuring materials (CMMs) allows one to determine the need for and sufficiency of organizational and pedagogical conditions. Mathematical modeling, in turn, allows one to quickly adjust the organizational and pedagogical conditions as a set of opportunities for content, forms, teaching methods, information and communication technologies (ICTs) and CMMs used to achieve the planned educational results in the sphere of environmental and economic activity.
Interpretation of the derived features in the context of the pedagogical research, performed with a cross-validation accuracy of 72%, revealed the dominant significance of intersubjective connections between the disciplines studied by the sample of students in the bachelor's and master's programs 44.03.04 and 44.04.04 "Professional training (by industry)", which are the most significant in terms of the formation of competence in the field of environmental and economic activities. The designed Gradient Boosting Classifier model makes it possible to predict the studied competency types and to test hypotheses about including or excluding certain significant organizational and pedagogical conditions for the effective implementation of the educational process. A necessary and sufficient organizational and pedagogical condition for the effective formation of competence in the field of environmental and economic activity is to ensure continuity between significant disciplines and the actualization of interdisciplinary relationships based on the development of interdisciplinary courses.

Natural language processing systems for data extraction and mapping on the basis of unstructured text blocks

Proceedings of the International conference “InterCarto/InterGIS” ◽

10.35595/2414-9179-2020-3-26-53-61 ◽

2020 ◽

Vol 26 (3) ◽

pp. 53-61

Author(s):

Pavel Kikin ◽

Alexey Kolesnikov ◽

Alexey Portnov ◽

Denis Grischenko

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Mathematical Models ◽

Optimal Algorithm ◽

The State ◽

Gradient Boosting ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods ◽

Spatio Temporal

The state of ecological systems, along with their general characteristics, is almost always described by indicators that vary in space and time, which significantly complicates the construction of mathematical models for predicting the state of such systems. One way to simplify and automate the construction of these models is to use machine learning methods. The article compares traditional and neural-network-based algorithms and machine learning methods for predicting spatio-temporal series representing ecosystem data. The following algorithms and methods were analyzed and compared: logistic regression, random forest, gradient boosting on decision trees, SARIMAX, and long short-term memory (LSTM) and gated recurrent unit (GRU) neural networks. For the study, data sets were selected that have both spatial and temporal components: the number of mosquitoes, the number of dengue infections, the physical condition of tropical grove trees, and the water level in a river. The article discusses the preliminary data processing steps required, depending on the algorithm used. In addition, Kolmogorov complexity was calculated as one of the parameters that can help formalize the choice of the most suitable algorithm when constructing mathematical models of spatio-temporal data for the sets used. Based on the results of the analysis, recommendations are given on the application of certain methods and specific technical solutions, depending on the characteristics of the data set that describes a particular ecosystem.
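
Kolmogorov complexity is not computable in general, so in practice it is usually approximated. The sketch below uses compressed length as a rough proxy; this is an assumption made for illustration and not necessarily the estimator used in the article. The two series are synthetic, and `compression_ratio` is a hypothetical helper name.

```python
import random
import zlib


def compression_ratio(series):
    """Compressed size divided by raw size for a sequence of ints in 0..255.

    A low ratio suggests a regular, easily described series; a ratio near
    (or above) 1 suggests the series has little exploitable structure.
    """
    raw = bytes(series)
    return len(zlib.compress(raw, 9)) / len(raw)


# Synthetic examples (assumed data): a strictly periodic series and a noisy one.
periodic = [i % 4 for i in range(1000)]
rng = random.Random(0)
noisy = [rng.randrange(256) for _ in range(1000)]

print("periodic:", round(compression_ratio(periodic), 3))
print("noisy:   ", round(compression_ratio(noisy), 3))
```

A proxy like this could serve as one input when deciding between a simple seasonal model and a more flexible learner, in the spirit of the selection criterion the abstract mentions.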

Machine Learning to Forecast Medical Attentions of Pneumonia Cases in Colombian Cities: An implementation with Air Quality, Meteorological and Admission Data

10.21203/rs.3.rs-53367/v1 ◽

2020 ◽

Author(s):

Juan David Gutiérrez

Keyword(s):

Public Health ◽

Machine Learning ◽

Air Pollution ◽

Gradient Boosting ◽

Learning Methods ◽

Machine Learning Methods ◽

Health Authorities ◽

Admission Data ◽

Extreme Gradient Boosting ◽

Public Health Authorities

Background: Previous authors have evidenced the relationship between air pollution-aerosols and meteorological variables with the occurrence of pneumonia. Forecasting the number of attentions of pneumonia cases may be useful to optimize the allocation of healthcare resources and to support public health authorities in implementing emergency plans to face an increase in patients. The purpose of this study is to implement four machine-learning methods to forecast the number of attentions of pneumonia cases in the five largest cities of Colombia by using air pollution-aerosols, meteorological and admission data. Methods: The number of attentions of pneumonia cases in the five most populated Colombian cities between January 2009 and December 2019 was provided by public health authorities. Air pollution-aerosols and meteorological data were obtained from remote sensors. Four machine-learning methods were implemented for each city. We selected the machine-learning method with the best performance in each city and implemented two techniques to identify the most relevant variables in the forecasts produced by the best-performing models. Results: According to the R² metric, random forest was the machine-learning method with the best performance for Bogotá, Medellín and Cali, whereas for Barranquilla the best performance was obtained from the Bayesian adaptive regression trees, and for Cartagena extreme gradient boosting had the best performance. The most important variables for the forecasting were related to the admission data. Conclusions: The results obtained from this study suggest that machine learning can be used to efficiently forecast the number of attentions of pneumonia cases, and therefore it can be a useful decision-making tool for public health authorities.

Predicting Fine Particulate Matter (PM2.5) in the Greater London Area: An Ensemble Approach using Machine Learning Methods

Remote Sensing ◽

10.3390/rs12060914 ◽

2020 ◽

Vol 12 (6) ◽

pp. 914 ◽

Cited By ~ 4

Author(s):

Mahdieh Danesh Yazdi ◽

Zheng Kuang ◽

Konstantina Dimakopoulou ◽

Benjamin Barratt ◽

Esra Suel ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Meteorological Data ◽

Fine Particulate Matter ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Learning Methods ◽

Machine Learning Methods ◽

Technological Advances

Estimating air pollution exposure has long been a challenge for environmental health researchers. Technological advances and novel machine learning methods have allowed us to increase the geographic range and accuracy of exposure models, making them a valuable tool in conducting health studies and identifying hotspots of pollution. Here, we have created a prediction model for daily PM2.5 levels in the Greater London area from 1st January 2005 to 31st December 2013 using an ensemble machine learning approach incorporating satellite aerosol optical depth (AOD), land use, and meteorological data. The predictions were made on a 1 km × 1 km scale over 3960 grid cells. The ensemble included predictions from three different machine learners: a random forest (RF), a gradient boosting machine (GBM), and a k-nearest neighbor (KNN) approach. Our ensemble model performed very well, with a ten-fold cross-validated R² of 0.828. Of the three machine learners, the random forest outperformed the GBM and KNN. Our model was particularly adept at predicting day-to-day changes in PM2.5 levels, with an out-of-sample temporal R² of 0.882. However, its ability to predict spatial variability was weaker, with an R² of 0.396. We believe this to be due to the smaller spatial variation in pollutant levels in this area.
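
As a rough illustration of combining several learners and scoring the result with R², the sketch below averages three sets of predictions with equal weights. The equal-weight mean is an assumption chosen for simplicity, not the ensembling scheme of the study, and all numbers are made up.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_ensemble(*prediction_sets):
    """Equal-weight mean of several learners' predictions (an assumed scheme)."""
    return [sum(values) / len(values) for values in zip(*prediction_sets)]


# Hypothetical daily PM2.5 values and per-learner predictions (assumed data).
observed = [10.0, 12.0, 14.0, 16.0]
rf_pred = [11.0, 12.0, 13.0, 16.0]
gbm_pred = [9.0, 13.0, 15.0, 17.0]
knn_pred = [10.0, 11.0, 14.0, 18.0]

ensemble_pred = average_ensemble(rf_pred, gbm_pred, knn_pred)

for name, pred in [("RF", rf_pred), ("GBM", gbm_pred),
                   ("KNN", knn_pred), ("ensemble", ensemble_pred)]:
    print(name, round(r2_score(observed, pred), 3))
```

In this toy case the individual errors partly cancel, so the averaged prediction scores higher than any single learner; that is not guaranteed in general.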

ML-Based Analysis of Particle Distributions in High-Intensity Laser Experiments: Role of Binning Strategy

Entropy ◽

10.3390/e23010021 ◽

2020 ◽

Vol 23 (1) ◽

pp. 21

Author(s):

Yury Rodimkov ◽

Evgeny Efimenko ◽

Valentin Volokitin ◽

Elena Panova ◽

Alexey Polovinkin ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Quantum Electrodynamics ◽

Strong Field ◽

Experimental Studies ◽

Research Area ◽

Gradient Boosting ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

When entering the phase of big data processing and statistical inference in experimental physics, the efficient use of machine learning methods may require optimal data preprocessing methods and, in particular, an optimal balance between detail and noise. In experimental studies of strong-field quantum electrodynamics with intense lasers, this balance concerns data binning for the observed distributions of particles and photons. Here we analyze the aspect of binning with respect to different machine learning methods (Support Vector Machine (SVM), Gradient Boosting Trees (GBT), Fully-Connected Neural Network (FCNN), Convolutional Neural Network (CNN)) using numerical simulations that mimic expected properties of upcoming experiments. We see that binning can crucially affect the performance of SVM and GBT and, to a lesser extent, FCNN and CNN. This can be interpreted as the latter methods being able to effectively learn the optimal binning, discarding unnecessary information. Nevertheless, given limited training sets, the results indicate that the efficiency can be increased by optimizing the binning scale along with other hyperparameters. We present specific measurements of accuracy that can be useful for planning experiments in the specified research area.
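
The trade-off between detail and noise can be seen by histogramming the same sample at several bin counts. The sketch below is only an illustration: the Gaussian "energies" are synthetic, and the range, bin counts and the `histogram` helper are assumptions made for the example.

```python
import random


def histogram(values, n_bins, lo, hi):
    """Count values into n_bins equal-width bins on [lo, hi]; values outside are dropped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # A value exactly equal to hi belongs to the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


# Synthetic particle energies in arbitrary units (assumed data).
rng = random.Random(0)
energies = [rng.gauss(5.0, 1.0) for _ in range(1000)]

for n_bins in (4, 16, 64):
    counts = histogram(energies, n_bins, 0.0, 10.0)
    empty = sum(1 for c in counts if c == 0)
    print(f"{n_bins:3d} bins: max count {max(counts):4d}, empty bins {empty}")
```

Few bins hide the shape of the distribution, while many bins leave each bin with few counts and therefore more statistical noise, which is why the bin count can be treated as a hyperparameter to tune.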

Local Epigenomic Data are more Informative than Local Genome Sequence Data in Predicting Enhancer-Promoter Interactions Using Neural Networks

Genes ◽

10.3390/genes11010041 ◽

2019 ◽

Vol 11 (1) ◽

pp. 41 ◽

Cited By ~ 1

Author(s):

Mengli Xiao ◽

Zhong Zhuang ◽

Wei Pan

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Test Data ◽

Sequence Data ◽

Gradient Boosting ◽

Genome Wide Association Studies ◽

Learning Methods ◽

Machine Learning Methods ◽

Local Sequence

Enhancer-promoter interactions (EPIs) are crucial for transcriptional regulation. Mapping such interactions proves useful for understanding disease regulations and discovering risk genes in genome-wide association studies. Some previous studies showed that machine learning methods, as computational alternatives to costly experimental approaches, performed well in predicting EPIs from local sequence and/or local epigenomic data. In particular, deep learning methods were demonstrated to outperform traditional machine learning methods, and using DNA sequence data alone could perform either better than or almost as well as only utilizing epigenomic data. However, most, if not all, of these previous studies were based on randomly splitting enhancer-promoter pairs as training, tuning, and test data, which has recently been pointed out to be problematic; due to multiple and duplicating/overlapping enhancers (and promoters) in enhancer-promoter pairs in EPI data, such random splitting does not lead to independent training, tuning, and test data, thus resulting in model over-fitting and over-estimating predictive performance. Here, after correcting this design issue, we extensively studied the performance of various deep learning models with local sequence and epigenomic data around enhancer-promoter pairs. Our results confirmed much lower performance using either sequence or epigenomic data alone, or both, than reported previously. We also demonstrated that local epigenomic features were more informative than local sequence data. Our results were based on an extensive exploration of many convolutional neural network (CNN) and feed-forward neural network (FNN) structures, and of gradient boosting as a representative of traditional machine learning.
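
The leakage problem described here arises when pairs that share an enhancer (or promoter) land on both sides of a random split. One common remedy is to split by group, so that every pair with the same enhancer stays on the same side. The sketch below shows that idea only; it is not necessarily the authors' procedure, and the identifiers and the `group_split` helper are hypothetical.

```python
import random


def group_split(pairs, group_of, test_fraction=0.25, seed=0):
    """Split items so that no group appears in both the training and the test set."""
    groups = sorted({group_of(p) for p in pairs})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [p for p in pairs if group_of(p) not in test_groups]
    test = [p for p in pairs if group_of(p) in test_groups]
    return train, test


# Hypothetical (enhancer, promoter) pairs; the names are made up.
pairs = [
    ("enh1", "promA"), ("enh1", "promB"), ("enh2", "promA"),
    ("enh3", "promC"), ("enh4", "promD"), ("enh4", "promE"),
]

train, test = group_split(pairs, group_of=lambda pair: pair[0])
print("train:", train)
print("test: ", test)
```

Grouping by enhancer alone still lets a promoter appear on both sides; a stricter split would group by a unit that covers both elements, such as a genomic region.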

Gradient Boosting–Based Machine Learning Methods in Real Estate Market Forecasting

Proceedings of the 8th Scientific Conference on Information Technologies for Intelligent Decision Making Support (ITIDS 2020) ◽

10.2991/aisr.k.201029.039 ◽

2020 ◽

Author(s):

Nikita Fedorov ◽

Yulia Petrichenko

Keyword(s):

Machine Learning ◽

Real Estate ◽

Real Estate Market ◽

Gradient Boosting ◽

Learning Methods ◽

Machine Learning Methods

Application of machine learning methods for automated classification and routing in ITIL

Journal of Physics Conference Series ◽

10.1088/1742-6596/2091/1/012041 ◽

2021 ◽

Vol 2091 (1) ◽

pp. 012041

Author(s):

V V Nikulin ◽

S D Shibaikin ◽

A N Vishnyakov

Keyword(s):

Machine Learning ◽

Human Factor ◽

Gradient Boosting ◽

Automated Classification ◽

It Services ◽

Learning Methods ◽

Machine Learning Methods ◽

Text Information ◽

Comparison Of The Results

The article analyzes the application of machine learning methods for automated classification and routing in the ITIL library. ITSM technology and ITIL are considered, and definitions of an incident and of IT services are given. Keywords are then extracted from information written in natural language and vectorized, using lemmatization and the TF-IDF measure. A comparative analysis of machine learning methods is given, together with a comparison of the results of automatic classification of text information using gradient boosting and a convolutional neural network. Various parameters of these methods are considered, and the most effective machine learning method is determined. Using machine learning methods for automated classification of incidents allows high-precision routing of requests to restore the operability of IT services, reducing response time and errors associated with the human factor.
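
The TF-IDF measure mentioned in the abstract can be sketched in a few lines. The version below uses relative term frequency and the plain `log(N / df)` inverse document frequency, which is one of several common variants and an assumption here; lemmatization is omitted, and the ticket texts are invented for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights


# Hypothetical incident descriptions (assumed data).
tickets = ["printer not working", "email not working", "reset email password"]

for ticket, vector in zip(tickets, tf_idf(tickets)):
    top_term = max(vector, key=vector.get)
    print(f"{ticket!r}: most distinctive term = {top_term!r}")
```

Terms that occur in only one ticket, such as "printer", receive a higher weight than terms shared across tickets, which is what makes the vectors useful as classifier input.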
