Exploring essential variables in the settlement selection for small-scale maps using machine learning

2019 · Vol 1 · pp. 1-2
Author(s): Izabela Karsznia, Karolina Sielicka

Abstract. The decision about removing or maintaining an object while changing the level of detail requires taking into account many features of the object itself and its surroundings. Automatic generalization is the optimal way to obtain maps at various scales, based on a single spatial database storing up-to-date information with a high level of spatial accuracy. Researchers agree on the need for fully automating the generalization process (Stoter et al., 2016). Numerous research centres, cartographic agencies as well as commercial companies have undertaken successful attempts at implementing certain generalization solutions (Stoter et al., 2009, 2014, 2016; Regnauld, 2015; Burghardt et al., 2008; Chaudhry and Mackaness, 2008). Nevertheless, an effective and consistent methodology for generalizing small-scale maps has not gained enough attention so far, as most of the conducted research has focused on the acquisition of large-scale maps (Stoter et al., 2016). The presented research aims to fill this gap by exploring new variables which are of key importance in the automatic settlement selection process at small scales. Addressing this issue is an essential step towards proposing new algorithms for effective and automatic settlement selection that will contribute to enriching the sparsely filled small-scale generalization toolbox.

The main idea behind this research is to use machine learning (ML) to explore new variables that can be important in automatic settlement generalization at small scales. To automate the generalization process, cartographic knowledge has to be collected and formalized. So far, a few approaches based on the use of ML have been proposed. One of the first attempts to determine generalization parameters with the use of ML was performed by Weibel et al. (1995); the learning material was the observation of cartographers' manual work. Mustière (1998) tried to identify the optimal sequence of generalization operators for roads using ML. A different approach was presented by Sester (2000), whose goal was to extract cartographic knowledge from spatial data characteristics, especially from the attributes and geometric properties of objects and the regularities and repetitive patterns that govern object selection, with the use of decision trees. Lagrange et al. (2000) and Balboa and López (2008) also used ML techniques, namely neural networks, to generalize line objects. Recently, Sester et al. (2018) proposed the application of deep learning to the task of building generalization. As noticed by Sester et al. (2018), these ideas, although interesting, remained proofs of concept only. Moreover, they concerned topographic databases and large-scale maps. Promising results of automatic settlement selection at small scales were reported by Karsznia and Weibel (2018), who used data enrichment and ML to improve the settlement selection process. Thanks to classification models based on decision trees, they explored new variables that are decisive in the settlement selection process. However, they also concluded that there is probably still more “deep knowledge” to be discovered, possibly linked to further variables that were not included in their research. Thus, the motivation for this research is to fill this gap and look for additional essential variables governing settlement selection at small scales.
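Selection rules of this kind can be prototyped with off-the-shelf tooling. A minimal sketch of training a decision tree to classify settlements as selected or omitted, assuming a hypothetical table of per-settlement variables (the file name, feature names, and label column are illustrative, not the authors' data):

```python
# Illustrative sketch (not the authors' code): classifying settlements as
# selected / omitted with a decision tree, given hypothetical variables such
# as population, road connections, and distance to a larger settlement.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Hypothetical input: one row per settlement; "selected" is the label
# assigned by cartographers on the reference small-scale map.
df = pd.read_csv("settlements.csv")
features = ["population", "road_connections", "dist_to_larger_settlement"]
X, y = df[features], df["selected"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# A shallow tree keeps the learned selection rules readable for cartographers.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
print(classification_report(y_test, tree.predict(X_test)))

# Feature importances hint at which variables drive the selection.
for name, imp in zip(features, tree.feature_importances_):
    print(f"{name}: {imp:.2f}")
```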

2020 · Vol 9 (4) · pp. 230
Author(s): Izabela Karsznia, Karolina Sielicka

Effective settlement generalization for small-scale maps is a complex and challenging task. Developing a consistent methodology for generalizing small-scale maps has not gained enough attention, as most of the research conducted so far has concerned large scales. In the study reported here, we aim to fill this gap and explore settlement characteristics, termed variables, that can be decisive in settlement selection for small-scale maps. We propose 33 variables, both thematic and topological, which may be of importance in the selection process. To find essential variables and assess their weights and correlations, we use machine learning (ML) models, especially decision trees (DT) and decision trees supported by genetic algorithms (DT-GA). With the use of ML models, we automatically classify settlements as selected or omitted. As a result, in each tested case, we achieve automatic settlement selection that improves on selection based on official national mapping agency (NMA) guidelines and comes closer to the results obtained by experienced cartographers in manual map generalization.
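A DT-GA pairing of this sort can be approximated with a generic genetic algorithm over binary feature masks, where fitness is the cross-validated accuracy of a decision tree. The sketch below illustrates the idea only; the population size, operators, and fitness choice are assumptions, not the authors' exact setup:

```python
# Illustrative DT-GA sketch: a generic genetic algorithm searches for a
# subset of candidate variables that maximizes cross-validated accuracy
# of a decision tree (not the authors' exact configuration).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    if not mask.any():
        return 0.0
    clf = DecisionTreeClassifier(max_depth=4, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05):
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5          # random binary masks
    for _ in range(generations):
        scores = np.array([fitness(m, X, y) for m in pop])
        # Tournament selection: keep the winner of each random pairing.
        idx = rng.integers(0, pop_size, (pop_size, 2))
        parents = pop[np.where(scores[idx[:, 0]] > scores[idx[:, 1]],
                               idx[:, 0], idx[:, 1])]
        # Uniform crossover followed by bit-flip mutation.
        cross = rng.random((pop_size, n)) < 0.5
        children = np.where(cross, parents, parents[::-1])
        children ^= rng.random((pop_size, n)) < p_mut
        pop = children
    scores = np.array([fitness(m, X, y) for m in pop])
    return pop[scores.argmax()]

# Usage with a feature matrix X (n_settlements x 33) and labels y:
# best_mask = ga_select(X, y); selected = np.flatnonzero(best_mask)
```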


2021 · Vol 11 (2) · pp. 472
Author(s): Hyeongmin Cho, Sangkyun Lee

Machine learning has been proven effective in various application areas, such as object and speech recognition on mobile systems. Since a key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer's or manager's point of view, measuring data quality is an important first step in the learning process: we need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially when it comes to large-scale high-dimensional data such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping, with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.
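The general recipe (project randomly to a low dimension, then bootstrap to estimate a score cheaply) can be sketched as follows. The Fisher-style separability ratio below is a stand-in for the paper's measures, and all parameters are illustrative:

```python
# Minimal sketch of the random-projection + bootstrapping recipe; the
# Fisher-style ratio is a stand-in, not the paper's exact measure.
import numpy as np

rng = np.random.default_rng(0)

def fisher_separability(X, y):
    """Between-class vs. within-class scatter, averaged over dimensions."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    between = sum((X[y == c].mean(axis=0) - mu) ** 2 * (y == c).sum()
                  for c in classes)
    within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                 for c in classes)
    return (between / (within + 1e-12)).mean()

def projected_bootstrap_score(X, y, dim=32, n_boot=20, sample=1000):
    """Average separability over random projections of bootstrap samples."""
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), min(sample, len(X)))   # bootstrap draw
        P = rng.normal(size=(X.shape[1], dim)) / np.sqrt(dim)  # random projection
        scores.append(fisher_separability(X[idx] @ P, y[idx]))
    return float(np.mean(scores))
```

Working on a low-dimensional projection of a bootstrap sample keeps the cost roughly independent of the full dataset size, which is the practical point for large-scale high-dimensional data.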


2021
Author(s): Aleksandar Kovačević, Jelena Slivka, Dragan Vidaković, Katarina-Glorija Grujić, Nikola Luburić, ...

Code smells are structures in code that often have a negative impact on its quality. Manually detecting code smells is challenging, so researchers have proposed many automatic code smell detectors. Most of the studies propose detectors based on code metrics and heuristics. However, these studies have several limitations, including evaluating the detectors on small-scale case studies and in inconsistent experimental settings. Furthermore, heuristic-based detectors suffer from limitations that hinder their adoption in practice. Thus, researchers have recently started experimenting with machine learning (ML) based code smell detection.

This paper compares the performance of multiple ML-based code smell detection models against multiple traditionally employed metric-based heuristics for detecting the God Class and Long Method code smells. We evaluate the effectiveness of different source code representations for machine learning: traditionally used code metrics and code embeddings (code2vec, code2seq, and CuBERT).

We perform our experiments on the large-scale, manually labeled MLCQ dataset. We consider the binary classification problem: we classify the code samples as smelly or non-smelly and use the F1-measure of the minority (smell) class as a measure of performance. In our experiments, the ML classifier trained using CuBERT source code embeddings achieved the best performance for both God Class (F-measure of 0.53) and Long Method detection (F-measure of 0.75). With the help of a domain expert, we perform an error analysis to discuss the advantages of the CuBERT approach.

To the best of our knowledge, this study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection. A secondary contribution of our study is the systematic evaluation of the effectiveness of multiple heuristic-based approaches on the same large-scale, manually labeled MLCQ dataset.
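The evaluation setup can be sketched as follows, assuming embedding vectors have already been exported from a pre-trained model such as CuBERT (the file names and the classifier choice are assumptions for illustration):

```python
# Illustrative sketch: train a classifier on precomputed code embeddings and
# score the minority "smelly" class with F1, mirroring the described setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X = np.load("cubert_embeddings.npy")   # shape: (n_samples, embedding_dim)
y = np.load("labels.npy")              # 1 = smelly, 0 = non-smelly

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# class_weight="balanced" counteracts the smell class being the minority.
clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(X_train, y_train)

print("F1 (smell class):", f1_score(y_test, clf.predict(X_test), pos_label=1))
```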


2022 · pp. 251-275
Author(s): Edgar Cossio Franco, Jorge Alberto Delgado Cazarez, Carlos Alberto Ochoa Ortiz Zezzatti

The objective of this chapter is to implement an intelligent machine-learning model for applying macro-ergonomic methods to human resources processes, following the ISO 12207 standard. To achieve this objective, an algorithm is implemented in Java to select the best prospect for a given position. Machine learning is performed with decision trees, specifically the J48 algorithm. Among the findings, the model proves useful in identifying the best profiles for a given position, reducing both selection time and work stress while optimizing human resources.
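J48 is Weka's Java implementation of the C4.5 decision tree. A rough analogue of such a candidate-ranking pipeline, using scikit-learn's entropy-based tree and hypothetical (numeric) candidate features, might look like this:

```python
# Rough analogue of a J48-style pipeline (scikit-learn's entropy criterion
# stands in for C4.5). Feature names and file names are hypothetical.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

train = pd.read_csv("past_hires.csv")       # historical decisions; "hired" is 1/0
candidates = pd.read_csv("candidates.csv")  # prospects for the open position
features = ["years_experience", "education_level", "test_score"]

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(train[features], train["hired"])

# Rank prospects by the predicted probability of being a good hire.
candidates["suitability"] = clf.predict_proba(candidates[features])[:, 1]
print(candidates.sort_values("suitability", ascending=False).head())
```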


2015 · Vol 767
Author(s): Subrahmanyam Duvvuri, Beverley J. McKeon

Abstract. A formal relationship between the skewness and the correlation coefficient of large and small scales, termed the amplitude modulation coefficient, is established for a general statistically stationary signal and is analysed in the context of a turbulent velocity signal. Both quantities are seen to be measures of phase in triadically consistent interactions between scales of turbulence. The naturally existing phase relationships between large and small scales in a turbulent boundary layer are then manipulated by exciting a synthetic large-scale motion in the flow using a spatially impulsive dynamic wall roughness perturbation. The synthetic scale is seen to alter the phase relationships, or the degree of modulation, in a quasi-deterministic manner by exhibiting a phase-organizing influence on the small scales. The results presented provide encouragement for the development of a practical framework for favourable manipulation of energetic small-scale turbulence through large-scale inputs in a wall-bounded turbulent flow.
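A common way to estimate an amplitude modulation coefficient from a single-point velocity signal is to correlate the large-scale component with the low-pass-filtered envelope of the small scales (cf. Mathis et al., 2009). A minimal sketch, where the cutoff frequency and the synthetic test signal are assumptions:

```python
# Sketch of a standard amplitude-modulation estimate: split the signal at a
# cutoff, take the Hilbert envelope of the small scales, low-pass it, and
# correlate it with the large-scale component.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs, f_cut = 10_000.0, 100.0        # sampling rate and scale-separation cutoff [Hz]
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
u = np.sin(2 * np.pi * 5 * t) + 0.3 * rng.standard_normal(t.size)  # stand-in signal

b, a = butter(4, f_cut / (fs / 2), btype="low")
u_large = filtfilt(b, a, u)        # large-scale component
u_small = u - u_large              # small-scale component

# Envelope of the small scales, keeping only its large-scale content.
envelope = np.abs(hilbert(u_small))
env_large = filtfilt(b, a, envelope)

R = np.corrcoef(u_large, env_large)[0, 1]
print(f"amplitude modulation coefficient R = {R:.3f}")
```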


This paper reviews how Kolmogorov postulated for the first time the existence of a steady statistical state for small-scale turbulence, and its defining parameters of dissipation rate and kinematic viscosity. Thence he made quantitative predictions of the statistics by extending previous methods of dimensional scaling to multiscale random processes. We present theoretical arguments and experimental evidence to indicate when the small-scale motions might tend to a universal form (paradoxically not necessarily in uniform flows when the large scales are gaussian and isotropic), and discuss the implications for the kinematics and dynamics of the fact that there must be singularities in the velocity field associated with the $-5/3$ inertial range spectrum. These may be particular forms of eddy or ‘eigenstructure’ such as spiral vortices, which may not be unique to turbulent flows. Also, they tend to lead to the notable spiral contours of scalars in turbulence, whose self-similar structure enables the ‘box-counting’ technique to be used to measure the ‘capacity’ $D_K$ of the contours themselves or of their intersections with lines, $D'_K$. Although the capacity, a term invented by Kolmogorov (and studied thoroughly by Kolmogorov & Tikhomirov), is like the exponent $2p$ of a spectrum in being a measure of the distribution of length scales ($D'_K$ being related to $2p$ in the limit of very high Reynolds numbers), the capacity is also different in that experimentally it can be evaluated at local regions within a flow and at lower values of the Reynolds number. Thus Kolmogorov & Tikhomirov provide the basis for a more widely applicable measure of the self-similar structure of turbulence. Finally, we also review how Kolmogorov's concept of the universal spatial structure of the small scales, together with appropriate additional physical hypotheses, enables other aspects of turbulence to be understood at these scales; in particular the general forms of the temporal statistics, such as the high-frequency (inertial range) spectra in eulerian and lagrangian frames of reference, and the perturbations to the small scales caused by non-isotropic, non-gaussian and inhomogeneous large-scale motions.
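The box-counting estimate of the capacity can be sketched in a few lines: count the boxes of size $r$ that a contour touches and fit $N(r) \sim r^{-D_K}$. The synthetic test contour below is an assumption for illustration:

```python
# Box-counting sketch for estimating the 'capacity' D_K of a contour:
# count occupied boxes N(r) at dyadic box sizes r and fit N(r) ~ r^(-D_K).
import numpy as np

def box_count(mask, size):
    """Number of size x size boxes containing at least one contour pixel."""
    n = mask.shape[0] // size
    trimmed = mask[:n * size, :n * size]
    boxes = trimmed.reshape(n, size, n, size).any(axis=(1, 3))
    return boxes.sum()

def capacity(mask, sizes=(2, 4, 8, 16, 32, 64)):
    counts = [box_count(mask, s) for s in sizes]
    # Slope of log N(r) against log(1/r) estimates D_K.
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

# Example on a synthetic contour (a circle), whose capacity should be ~1.
y, x = np.mgrid[:512, :512]
r = np.hypot(x - 256, y - 256)
mask = np.abs(r - 150) < 1.5
print(f"estimated capacity: {capacity(mask):.2f}")
```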


2002 · Vol 450 · pp. 377-407
Author(s): S. A. Stanley, S. Sarkar, J. P. Mellado

Turbulent plane jets are prototypical free shear flows of practical interest in propulsion, combustion and environmental flows. While considerable experimental research has been performed on planar jets, very few computational studies exist. To the authors' knowledge, this is the first computational study of spatially evolving three-dimensional planar turbulent jets utilizing direct numerical simulation. Jet growth rates as well as the mean velocity, mean scalar and Reynolds stress profiles compare well with experimental data. Coherency spectra, vorticity visualization and autospectra are obtained to identify inferred structures. The development of the initial shear layer instability, as well as the evolution into the jet column mode downstream, is captured well.

The large- and small-scale anisotropies in the jet are discussed in detail. It is shown that, while the large scales in the flow field adjust slowly to variations in the local mean velocity gradients, the small scales adjust rapidly. Near the centreline of the jet, the small scales of turbulence are more isotropic. The mixing process is studied through analysis of the probability density functions of a passive scalar. Immediately after the rollup of vortical structures in the shear layers, the mixing process is dominated by large-scale engulfing of fluid. However, small-scale mixing dominates further downstream in the turbulent core of the self-similar region of the jet and a change from non-marching to marching PDFs is observed. Near the jet edges, the effects of large-scale engulfing of coflow fluid continue to influence the PDFs and non-marching type behaviour is observed.
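The marching versus non-marching diagnostic amounts to estimating the scalar PDF at several downstream stations and checking whether its peak drifts with downstream distance. A minimal sketch, where the data array (a scalar field normalized to [0, 1]) and the station indices are assumptions:

```python
# Sketch of the marching / non-marching PDF diagnostic for a passive scalar:
# histogram the scalar at several downstream stations and track the peak.
import numpy as np

def scalar_pdfs(scalar, stations, bins=50):
    """scalar: array (nx, ny, nz), values in [0, 1]; stations: x-indices."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    pdfs = {ix: np.histogram(scalar[ix].ravel(), bins=edges, density=True)[0]
            for ix in stations}
    return centers, pdfs

# Usage: a "marching" family shows the most probable scalar value drifting
# toward the coflow value with x; a non-marching family keeps a fixed peak.
# centers, pdfs = scalar_pdfs(scalar, stations=[10, 40, 80])
# for ix, pdf in pdfs.items():
#     print(ix, centers[pdf.argmax()])
```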


1994 · Vol 259 · pp. 281-290
Author(s): G. B. Smith, T. Wei

Off-axis collisions of equal-strength vortex rings were experimentally examined. Two equal-strength vortices were generated which moved toward each other along parallel, but offset, trajectories. Two-colour laser-induced fluorescence visualization techniques were used to observe these phenomena and gain insight into their importance in vortex interactions. The most prominent features of this interaction were rapid growth and rotation of the rings and the formation of evenly spaced ringlets around the cores of the original rings. Large-scale motions are described using simple vortex induction arguments. The small scales are caused by nonlinear amplification of instabilities during the asymmetric interaction.


1976 · Vol 77 (2) · pp. 321-354
Author(s): A. Pouquet, U. Frisch, J. Léorat

To understand the turbulent generation of large-scale magnetic fields and to advance beyond purely kinematic approaches to the dynamo effect like that introduced by Steenbeck, Krause & Rädler (1966), a new nonlinear theory is developed for three-dimensional, homogeneous, isotropic, incompressible MHD turbulence with helicity, i.e. not statistically invariant under plane reflexions. For this, techniques introduced for ordinary turbulence in recent years by Kraichnan (1971a), Orszag (1970, 1976) and others are generalized to MHD; in particular we make use of the eddy-damped quasi-normal Markovian approximation. The resulting closed equations for the evolution of the kinetic and magnetic energy and helicity spectra are studied both theoretically and numerically in situations with high Reynolds number and unit magnetic Prandtl number.

Interactions between widely separated scales are much more important than for non-magnetic turbulence. Large-scale magnetic energy brings small-scale kinetic and magnetic excitation (energy or helicity) to equipartition by the ‘Alfvén effect’; the small-scale ‘residual’ helicity, which is the difference between a purely kinetic and a purely magnetic helical term, induces growth of large-scale magnetic energy and helicity by the ‘helicity effect’. In the absence of helicity an inertial range occurs with a cascade of energy to small scales; to lowest order it is a −3/2 power law with equipartition of kinetic and magnetic energy spectra as in Kraichnan (1965), but there are −2 corrections (and possibly higher ones) leading to a slight excess of magnetic energy. When kinetic energy is continuously injected, an initial seed of magnetic field will grow to approximate equipartition, at least in the small scales. If in addition kinetic helicity is injected, an inverse cascade of magnetic helicity is obtained, leading to the appearance of magnetic energy and helicity in ever-increasing scales (in fact, limited by the size of the system). This inverse cascade, predicted by Frisch et al. (1975), results from a competition between the helicity and Alfvén effects and yields an inertial range with approximately −1 and −2 power laws for magnetic energy and helicity. When kinetic helicity is injected at the scale $l_{\rm inj}$ and at the rate $\tilde{\epsilon}^V$ (per unit mass), the time of build-up of magnetic energy at scale $L \gg l_{\rm inj}$ is $t \approx L\left(|\tilde{\epsilon}^V|\, l_{\rm inj}^2\right)^{-1/3}$.
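A quick dimensional check of the reconstructed build-up time, under the assumption that $\tilde{\epsilon}^V$ denotes a mean rate of kinetic helicity injection per unit mass (units $\mathrm{m\,s^{-3}}$, since helicity per unit mass carries units $\mathrm{m\,s^{-2}}$):

```latex
% (|\tilde{\epsilon}^V| l_{inj}^2)^{1/3} is a velocity, so t = L / velocity is a time:
\[
  \left[\,|\tilde{\epsilon}^V|\, l_{\rm inj}^2\,\right]^{1/3}
  = \left(\mathrm{m\,s^{-3}} \cdot \mathrm{m^2}\right)^{1/3}
  = \mathrm{m\,s^{-1}},
  \qquad
  t \approx \frac{L}{\left(|\tilde{\epsilon}^V|\, l_{\rm inj}^2\right)^{1/3}}
  \;\sim\; \mathrm{s}.
\]
```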

