Classification of Traffic Events in Mexico City Using Machine Learning and Volunteered Geographic Information

Author(s):  
Magdalena Saldana-Perez ◽  
Miguel Torres-Ruiz ◽  
Marco Moreno-Ibarra

Volunteered geographic information (VGI) and user-generated content represent a source of up-to-date information about what people perceive in their environment. Analyzing it creates opportunities to develop processes for studying and solving social problems that affect people's lives, merging technology with real data. Traffic is one of the main problems in urban areas: every day, people in big cities lose time, money, and quality of life when they get stuck in traffic jams, and air pollution is another urban problem derived from traffic. In the present approach, a traffic event classification methodology is implemented to analyze VGI and internet information related to traffic events, in order to identify the main traffic problems in a city and to visualize the congested roads. The methodology uses different computing tools and algorithms to achieve this goal. To obtain the data, a social media platform and RSS channels are consulted. The extracted texts are classified into seven possible traffic events and geolocated. A machine learning algorithm is applied in the classification.
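The abstract's pipeline (collect short texts, mine them, classify them into traffic-event categories with a supervised learner) can be sketched with a tiny bag-of-words Naive Bayes classifier. The texts and the three class names below are invented stand-ins for the paper's seven categories and its real VGI data, and Naive Bayes is only one plausible choice of supervised algorithm:

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled short texts; the class names are illustrative
# stand-ins for the paper's seven traffic-event categories.
TRAIN = [
    ("accident on the avenue, two lanes blocked", "accident"),
    ("multi car crash near the junction", "accident"),
    ("heavy traffic jam on the ring road", "congestion"),
    ("slow traffic, long queues downtown", "congestion"),
    ("road closed for repairs until friday", "closure"),
    ("street closure due to a public event", "closure"),
]

def tokenize(text):
    # Minimal text-mining step: lowercase and keep word characters only.
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over bag-of-words."""

    def fit(self, pairs):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()
        for text, label in pairs:
            self.class_counts[label] += 1
            for w in tokenize(text):
                self.word_counts[label][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, text):
        words = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for label in self.class_counts:
            lp = math.log(self.class_counts[label] / total_docs)
            n = sum(self.word_counts[label].values())
            for w in words:
                # Laplace-smoothed per-class word likelihood.
                lp += math.log((self.word_counts[label][w] + 1)
                               / (n + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best

model = NaiveBayes().fit(TRAIN)
print(model.predict("crash with two cars on the avenue"))  # accident
```

Geolocation and geo-visualization would follow as separate steps once each text carries a class label.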

Author(s):  
A. M. M. Saldana-Perez ◽  
M. Moreno-Ibarra ◽  
M. Torres-Ruiz

Volunteered geographic information (VGI) can be used to understand urban dynamics. In the <i>classification of traffic-related short texts to analyze road problems in urban areas</i>, a VGI data analysis is performed over social media publications in order to classify traffic events in big cities that modify the movement of vehicles and people through the roads, such as car accidents, congestion, and closures. The classification of traffic events described in short texts is done by applying a supervised machine learning algorithm. In this approach, users are considered sensors that describe their surroundings and provide their geographic position on the social network. The posts are treated by a text mining process and classified into five groups. Finally, the classified events are grouped into a data corpus and geo-visualized over the study area, to detect the places with the most vehicular problems.


Author(s):  
Ana Maria Magdalena Saldana-Perez ◽  
Marco Antonio Moreno-Ibarra ◽  
Miguel Jesus Torres-Ruiz

It is interesting to exploit user-generated content (UGC) and to use it to infer new data; volunteered geographic information (VGI) is a concept derived from UGC, whose main value lies in its continuously updated data. The present approach exploits VGI by collecting data from a social network and an RSS service; the short texts collected from the social network are written in Spanish. Text mining and information retrieval processes are applied to the data in order to remove special characters from the text and to extract relevant information about traffic events in the study area; the data are then geocoded. The texts are classified by a machine learning algorithm into five classes, each representing a specific traffic event or situation.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gabriel A. Colozza-Gama ◽  
Fabiano Callegari ◽  
Nikola Bešič ◽  
Ana C. de J. Paviza ◽  
Janete M. Cerutti

Abstract. Somatic mutations in cancer driver genes can inform diagnosis, prognosis, and treatment decisions. Formalin-fixed paraffin-embedded (FFPE) specimens are the main source of DNA for somatic mutation detection. To overcome the constraints of DNA isolated from FFPE, we compared pyrosequencing and ddPCR analysis for absolute quantification of the BRAF V600E mutation in DNA extracted from FFPE specimens, and compared the results with the qualitative detection obtained by Sanger sequencing. Sanger sequencing was able to detect the BRAF V600E mutation only when it was present in more than 15% of total alleles. Although the sensitivity of ddPCR is higher than that of Sanger sequencing, it was less consistent than pyrosequencing, likely due to droplet classification bias in FFPE-derived DNA. To address the droplet allocation bias in ddPCR analysis, we compared different algorithms for automated droplet classification and then correlated these findings with those obtained from pyrosequencing. By including non-classifiable droplets (rain) in the ddPCR analysis, it was possible to obtain better qualitative and quantitative classification of droplets than when rain droplets were excluded, when judged against the pyrosequencing results. Notably, only the machine learning k-NN algorithm was able to classify the samples automatically, surpassing manual classification based on no-template controls, which shows promise for clinical practice.
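The k-NN droplet classification the abstract highlights amounts to assigning each droplet, including ambiguous "rain" droplets, to the class of its nearest labeled neighbours in fluorescence-amplitude space. The control amplitudes below are invented for illustration; real ddPCR runs would use measured amplitudes from positive and no-template control droplets:

```python
import math
from collections import Counter

def knn_classify(point, labeled_points, k=3):
    """Classify a droplet by majority vote among its k nearest neighbours."""
    neighbours = sorted(labeled_points,
                        key=lambda lp: math.dist(point, lp[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical fluorescence amplitudes (channel 1, channel 2) of control
# droplets; the numbers are illustrative, not from the study.
controls = [
    ((900, 850), "positive"), ((920, 870), "positive"), ((880, 860), "positive"),
    ((150, 140), "negative"), ((170, 160), "negative"), ((140, 150), "negative"),
]

# A "rain" droplet between the two clusters still receives a class label,
# instead of being discarded by a fixed manual threshold.
print(knn_classify((600, 590), controls))  # positive
```

Manual thresholding on no-template controls would leave such intermediate droplets unclassified, which is the bias the automated approach addresses.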


2021 ◽  
Vol 11 (3) ◽  
pp. 92
Author(s):  
Mehdi Berriri ◽  
Sofiane Djema ◽  
Gaëtan Rey ◽  
Christel Dartigues-Pallez

Today, many students move towards higher education courses that do not suit them and end up failing. The purpose of this study is to give counselors better knowledge so that they can offer future students courses corresponding to their profiles. The second objective is to allow teaching staff to propose training courses adapted to students by anticipating their possible difficulties. This is possible thanks to a machine learning algorithm called Random Forest, which classifies students according to their results. We had to process the data, generate models using our algorithm, and cross-reference the results obtained to reach a better final prediction. We tested our method on different use cases, from two classes to five classes. These sets of classes represent intervals over an average grade ranging from 0 to 20. An accuracy of 75% was achieved with the set of five classes, and up to 85% for the sets of two and three classes.
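The class sets the abstract describes are intervals over an average grade in [0, 20]. A minimal sketch of that binning step, assuming equal-width intervals (the paper does not publish its exact cut-offs):

```python
def grade_class(average, n_classes):
    """Map an average grade in [0, 20] to one of n equal-width intervals.

    Equal-width boundaries are an assumption for illustration; the study's
    actual cut-offs for its 2-, 3- and 5-class sets are not given in the
    abstract.
    """
    if not 0 <= average <= 20:
        raise ValueError("average must lie in [0, 20]")
    width = 20 / n_classes
    # The top grade 20 falls into the last interval, not a phantom extra one.
    return min(int(average // width), n_classes - 1)

print(grade_class(9.5, 2))  # 0: below a 10/20 midpoint split
print(grade_class(9.5, 5))  # 2: middle interval [8, 12)
```

With the classes defined this way, each student's feature vector and class label can be fed to any off-the-shelf Random Forest implementation.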


1993 ◽  
Vol 18 (2-4) ◽  
pp. 209-220
Author(s):  
Michael Hadjimichael ◽  
Anita Wasilewska

We present here an application of the Rough Set formalism to machine learning. The resulting inductive learning algorithm is described, and its application to a set of real data is examined. The data consist of a survey of voter preferences taken during the 1988 presidential election in the U.S.A. Results include an analysis of the predictive accuracy of the generated rules and an analysis of their semantic content.
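The core Rough Set notion behind such rule induction is the lower approximation: the objects whose indiscernibility class (same values on the chosen attributes) lies entirely within a target decision class, and which therefore support certain rules. The voter records below are invented for illustration, not drawn from the 1988 survey:

```python
from collections import defaultdict

def lower_approximation(objects, attributes, decision, target):
    """Rough-set lower approximation of a decision class.

    Groups objects into indiscernibility classes over `attributes` and keeps
    only the blocks whose members all have decision value `target`.
    """
    blocks = defaultdict(list)
    for obj in objects:
        key = tuple(obj[a] for a in attributes)
        blocks[key].append(obj)
    lower = []
    for block in blocks.values():
        if all(obj[decision] == target for obj in block):
            lower.extend(block)
    return lower

# Hypothetical survey records: two indiscernible voters disagree, so
# neither certainly belongs to class "A"; the third does.
voters = [
    {"age": "young", "income": "low",  "vote": "A"},
    {"age": "young", "income": "low",  "vote": "B"},
    {"age": "old",   "income": "high", "vote": "A"},
]
print(len(lower_approximation(voters, ["age", "income"], "vote", "A")))  # 1
```

Rules generated from the lower approximation are certain; the gap between lower and upper approximations is what bounds the predictive accuracy analyzed in the paper.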


2021 ◽  
pp. 190-200
Author(s):  
Lesia Mochurad ◽  
Yaroslav Hladun

The paper considers a method for analyzing the psychophysical state of a person from psychomotor indicators, namely the finger tapping test. A mobile phone app that generalizes the classic tapping test was developed for the experiments. The tool allows collecting samples and analyzing them both as individual experiments and as a dataset as a whole. The data are investigated for anomalies using statistical methods and hyperparameter optimization, and an algorithm for reducing their number is developed. A machine learning model is used to predict different features of the dataset. These experiments reveal the structure of the data obtained with the finger tapping test; as a result, we learned how to conduct experiments for better generalization of the model in the future. The method developed for removing anomalies can be used in further research to increase the accuracy of the model. The model is a multilayer recurrent neural network, which works well for time series classification. The learning error of the model is 1.5% on a synthetic dataset and 5% on real data from a similar distribution.
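One simple statistical filter in the spirit of the anomaly-reduction step is to drop inter-tap intervals with an extreme z-score. The abstract does not specify the actual algorithm or threshold, so both the z-score rule and the 2.0 cut-off below are illustrative assumptions:

```python
import statistics

def remove_anomalies(intervals, z_max=2.0):
    """Drop inter-tap intervals whose z-score magnitude exceeds z_max.

    An illustrative statistical filter; the paper's actual anomaly-removal
    algorithm and threshold are not given in the abstract.
    """
    mean = statistics.mean(intervals)
    stdev = statistics.stdev(intervals)
    if stdev == 0:
        return list(intervals)
    return [x for x in intervals if abs(x - mean) / stdev <= z_max]

# Hypothetical inter-tap intervals in milliseconds; 900 is a pause rather
# than a regular tap, so it should be filtered out.
taps = [180, 175, 182, 178, 900, 181, 177]
print(remove_anomalies(taps))
```

Cleaning the interval sequence this way before feeding it to the recurrent network keeps isolated pauses from dominating the time-series classification.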


Author(s):  
X.-F. Xing ◽  
M. A. Mostafavi ◽  
G. Edwards ◽  
N. Sabo

<p><strong>Abstract.</strong> Automatic semantic segmentation of point clouds observed in complex 3D urban scenes is a challenging issue. Semantic segmentation of urban scenes based on machine learning algorithms requires appropriate features to distinguish objects in mobile terrestrial and airborne LiDAR point clouds at the point level. In this paper, we propose a pointwise semantic segmentation method based on features derived from the Difference of Normals, on "directional height above" features that compare the height difference between a given point and its neighbors in eight directions, and on features based on normal estimation. A random forest classifier is chosen to classify points in mobile terrestrial and airborne LiDAR point clouds. The results of our experiments show that the proposed features are effective for semantic segmentation of mobile terrestrial and airborne LiDAR point clouds, especially for the vegetation, building, and ground classes in airborne LiDAR point clouds of urban areas.</p>
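The "directional height above" feature can be sketched as follows: partition a point's horizontal neighbourhood into eight 45° sectors and record the largest height difference to a neighbour in each sector. The sector assignment and search radius below are assumptions for illustration, not the authors' exact definition:

```python
import math

def directional_height_above(point, neighbours, radius=2.0):
    """For a 3D point, return the height difference to the highest
    neighbour found in each of eight horizontal 45-degree sectors.

    A sketch of the "directional height above" idea; sector layout and
    the search radius are illustrative assumptions.
    """
    diffs = [0.0] * 8
    px, py, pz = point
    for nx, ny, nz in neighbours:
        dx, dy = nx - px, ny - py
        if (dx == 0 and dy == 0) or math.hypot(dx, dy) > radius:
            continue  # skip the point itself and out-of-radius neighbours
        sector = int((math.atan2(dy, dx) % (2 * math.pi)) / (math.pi / 4)) % 8
        diffs[sector] = max(diffs[sector], nz - pz)
    return diffs

# A ground point with a wall point 3 m higher to its east (sector 0)
# and a slightly raised point to its north (sector 2).
feature = directional_height_above((0, 0, 0), [(1, 0, 3), (0, 1.5, 0.1)])
print(feature)
```

An eight-element vector like this, concatenated with the normal-based features, is the kind of pointwise descriptor a random forest can then classify.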


Author(s):  
M. Esfandiari ◽  
S. Jabari ◽  
H. McGrath ◽  
D. Coleman

Abstract. Flooding is one of the most damaging natural hazards in urban areas in many places around the world, including the city of Fredericton, New Brunswick, Canada. Fredericton was flooded in two consecutive years, 2018 and 2019. Due to the complicated behaviour of water when a river overflows its banks, estimating the flood extent is challenging. The issue becomes even more challenging when several different factors affect the water flow, such as land texture or surface flatness, with varying degrees of intensity. Recently, machine learning algorithms and statistical methods have been used in many studies to generate flood susceptibility maps from topographical, hydrological, and geological conditioning factors. One of the major issues researchers face is the complexity and number of features required as input to a machine learning algorithm to produce acceptable results. In this research, we used Random Forest to model the 2018 flood in Fredericton and analyzed the effect of several combinations of 12 different flood conditioning factors. The factors were tested against a Sentinel-2 optical satellite image acquired around the flood peak day. The highest accuracy was obtained using only five factors, namely altitude, slope, aspect, distance from the river, and land use/cover, with 97.57% overall accuracy and a 95.14% kappa coefficient.
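The two metrics the abstract reports, overall accuracy and the kappa coefficient, are both derived from a confusion matrix of predicted versus reference flood labels. A minimal sketch with an invented 2×2 matrix (not the paper's actual results):

```python
def overall_accuracy_and_kappa(confusion):
    """Compute overall accuracy and Cohen's kappa from a square confusion
    matrix (rows: reference class, columns: predicted class)."""
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    # Expected agreement by chance, from the row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(len(confusion))
    ) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Invented flooded / not-flooded pixel counts: 90 of 100 pixels correct.
oa, kappa = overall_accuracy_and_kappa([[45, 5], [5, 45]])
print(round(oa, 2), round(kappa, 2))  # 0.9 0.8
```

Kappa discounts chance agreement, which is why it is lower than overall accuracy whenever the classes are not perfectly separated.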

