Clustering to Reduce Spatial Data Set Size

Author(s):  
Geoff Boeing

Traditionally, researchers often lacked access to enough spatial data to answer pressing research questions or build compelling visualizations. Today, however, the problem is often that we have too much data. Spatially redundant or approximately redundant points may refer to a single feature (plus noise) rather than many distinct spatial features. We use a machine learning approach with density-based clustering to compress such spatial data into a set of representative features.
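
As an illustration of the idea in this abstract, here is a minimal sketch of density-based clustering used to thin a point set. It is not the author's implementation. The DBSCAN-style procedure, the planar Euclidean distance, the parameters `eps` and `min_samples`, the rule of keeping the point nearest each cluster centroid, and the choice to keep unclustered points unchanged are all assumptions made for this example.

```python
from math import hypot


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels


def representatives(points, labels):
    """Keep one original point per cluster (nearest the centroid) plus all noise points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    reps = list(clusters.pop(-1, []))  # assumption: unclustered points are kept as-is
    for members in clusters.values():
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        reps.append(min(members, key=lambda p: hypot(p[0] - cx, p[1] - cy)))
    return reps


# Hypothetical sample: two tight groups of points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 10.0)]
reps = representatives(points, dbscan(points, eps=0.5, min_samples=2))
print(len(points), "->", len(reps))  # 6 -> 3
```

Real geographic coordinates would normally call for a great-circle distance rather than the planar one used here, and this brute-force neighbor search is quadratic in the number of points, so it is only suitable for small inputs.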

2011 ◽  
Vol 18 (1) ◽  
pp. 61-81 ◽  
Author(s):  
FAZEL KESHTKAR ◽  
DIANA INKPEN

Abstract: In this article, we explore the task of mood classification for blog postings. We propose a novel approach that uses the hierarchy of possible moods to achieve better results than a standard machine learning approach. We also show that using sentiment orientation features improves the performance of classification. We used the Livejournal blog corpus as a data set to train and evaluate our method. We present extensive error analysis and discuss the difficulty of the task.


2021 ◽  
Vol 11 (24) ◽  
pp. 11710
Author(s):  
Matteo Miani ◽  
Matteo Dunnhofer ◽  
Fabio Rondinella ◽  
Evangelos Manthos ◽  
Jan Valentin ◽  
...  

This study introduces a machine learning approach based on Artificial Neural Networks (ANNs) for the prediction of Marshall test results, stiffness modulus and air voids data of different bituminous mixtures for road pavements. A novel approach for an objective and semi-automatic identification of the optimal ANN structure, defined by the so-called hyperparameters, is introduced and discussed. Mechanical and volumetric data were obtained by conducting laboratory tests on 320 Marshall specimens, and the results were used to train the neural network. The k-fold cross-validation method was used to partition the available data set and obtain an unbiased evaluation of the model's predictive error. The ANN's hyperparameters were optimized using Bayesian optimization, which efficiently avoided the more costly trial-and-error procedure and automated hyperparameter tuning. The proposed ANN model is characterized by a Pearson coefficient value of 0.868.
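
To make the evaluation terms in this abstract concrete, the sketch below shows one way to form k-fold partitions and to compute a Pearson coefficient. It does not reproduce the study: the helper names `k_fold_indices` and `pearson_r`, the value `k = 5`, and the use of contiguous folds without shuffling are assumptions for illustration. Only the sample count of 320 comes from the abstract.

```python
from math import sqrt


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n samples."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# 320 specimens (from the abstract) split into an assumed k = 5 folds.
for train, test in k_fold_indices(320, 5):
    # A model would be fitted on `train` and scored on `test` here.
    assert len(train) == 256 and len(test) == 64
```

In practice the sample order is usually shuffled before splitting. Each specimen appears in exactly one test fold, so every prediction used to compute the coefficient comes from a model that did not see that specimen during training.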


Author(s):  
Talasila Bhanuteja ◽  
Kilaru Venkata Narendra Kumar ◽  
Kolli Sai Poornachand ◽  
Chennupati Ashish ◽  
...  

The development and exploitation of several prominent data mining techniques in various real-world application areas (for example, trade, medical management and natural science) has led to the use of such techniques in Machine Learning (ML) settings to extract useful pieces of information from specified data in healthcare communities, biomedical fields and so forth. Accurate analysis of clinical data sets benefits early disease prediction, patient care and community services. ML has been used effectively in assorted technologies, including disease prediction. The goal of building a classifier system with ML models is to help address health-related issues by assisting physicians in predicting and diagnosing diseases at an early stage. Sample data from 4920 patient records diagnosed with 41 diseases was selected for analysis. The dependent variable was composed of the 41 diseases. Of 132 independent variables (symptoms), 95 that are closely related to the diseases were selected and optimized. This work presents a disease prediction system developed using ML algorithms such as Random Forest, Decision Tree Classifier and LightGBM. The paper presents a comparative analysis of the results of these algorithms.


2021 ◽  
Author(s):  
Nobonita Saha ◽  
Aninda Mohanta ◽  
Jannatun Tuba Jyoti ◽  
Tamal Joyti Roy ◽  
Diti Roy

We collected two data sets. The first data set consisted of 45 thousand records and the second one 43. One data set consisted of food information, such as calorie count, sugar per 100 grams, fat per 100 grams and so on. The second data set consisted of obesity rates among people in the USA aged 0 to 80. We wanted to show a relation between sugar intake and obesity rate. Our experiment found significant evidence of a link between obesity and sugar intake. We used a machine learning approach for our experimental analysis.


Sensors ◽  
2019 ◽  
Vol 19 (18) ◽  
pp. 4035 ◽  
Author(s):  
Abdollah Malekjafarian ◽  
Fatemeh Golpayegani ◽  
Callum Moloney ◽  
Siobhán Clarke

This paper proposes a new two-stage machine learning approach for bridge damage detection using the responses measured on a passing vehicle. In the first stage, an artificial neural network (ANN) is trained using the vehicle responses measured from multiple passes (training data set) over a healthy bridge. The vehicle acceleration or the Discrete Fourier Transform (DFT) spectrum of the acceleration is used. The vehicle response is predicted from its speed for multiple passes (monitoring data set) over the bridge. Root-mean-square error is used to calculate the prediction error, which indicates the differences between the predicted and measured responses for each passage. In the second stage of the proposed method, a damage indicator is defined using a Gaussian process that detects the changes in the distribution of the prediction errors. It is suggested that if the bridge condition is healthy, the distribution of the prediction errors will remain low. A recognizable change in the distribution might indicate damage in the bridge. The performance of the proposed approach was evaluated using numerical case studies of vehicle–bridge interaction. It was demonstrated that the approach could successfully detect the damage in the presence of road roughness profile and measurement noise, even for low damage levels.
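
The per-passage prediction error mentioned in this abstract can be sketched as a root-mean-square error between a predicted and a measured response. The function name `rmse` and the sample values are hypothetical. The ANN that produces the predictions and the Gaussian-process damage indicator are not modeled here.

```python
from math import sqrt


def rmse(predicted, measured):
    """Root-mean-square error between two equally long response sequences."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical passages: each pairs a predicted response with a measured one.
passages = [
    ([0.10, 0.20, 0.15], [0.11, 0.19, 0.15]),
    ([0.10, 0.20, 0.15], [0.30, 0.05, 0.40]),
]
errors = [rmse(pred, meas) for pred, meas in passages]
```

One error value per passage gives the sequence whose distribution the second stage would then monitor for change.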


2020 ◽  
Vol 40 (2) ◽  
pp. 231-249
Author(s):  
Mary Priya Sebastian ◽  
G. Santhosh Kumar

This article explores in depth various sandhi (joining) rules in Kerala’s Malayalam language, which play a vital role in framing of the inflected and agglutinated forms of words and their compounds. It discusses significant progress in a scientific method to generate a specific annotated data set of Malayalam words that would be useful in many Natural Language Processing tasks which involve Malayalam preprocessing. The article discusses the results and issues encountered in developing this word-splitting tool for Malayalam, mainly in the context of improving the alignments between parallel texts that form a core resource in the Machine Translation task.


2021 ◽  
Author(s):  
Parker Edwards ◽  
Kristen Skruber ◽  
Nikola Milicevic ◽  
James B. Heidings ◽  
Tracy-Ann Read ◽  
...  

Machine learning has greatly expanded the ability to classify images. However, many machine learning classifiers require thousands of images for training and lack quantitative descriptors of how images were grouped. We overcome these limitations with a machine learning approach based on topological data analysis, where a data set of 20-30 images is sufficient to accurately train the classifier. Our method quantifies differences between groups and identifies subcellular regions with the largest dissimilarities.


2021 ◽  
Author(s):  
Silke van Klaveren ◽  
Ivan Vasconcelos ◽  
Andre Niemeijer

The successful prediction of earthquakes is one of the holy grails in Earth Sciences. Traditional predictions use statistical information on recurrence intervals, but those predictions are not accurate enough. In a recent paper, a machine learning approach was proposed and applied to data from laboratory earthquakes. The machine learning algorithm utilizes continuous measurements of radiated energy through acoustic emissions, and the authors were able to successfully predict the timing of laboratory earthquakes. Here, we reproduced their model, which was applied to a gouge layer of glass beads, and applied it to a data set obtained using a gouge layer of salt. In this salt experiment, different load point velocities were set, leading to variable recurrence times. The machine learning technique we use is called random forest and uses the acoustic emissions during the interseismic period. The random forest model succeeds in making a relatively reliable prediction for both materials, even long before the earthquake. Apparently there is information in the data on the timing of the next earthquake throughout the experiment. For glass beads, energy is gradually and increasingly released, whereas for salt, energy is only released during precursor activity; therefore the important features used in the prediction are different. We interpret the difference in results to be due to the different micromechanics of slip. The research shows that a machine learning approach can reveal the presence of information in the data on the timing of unstable slip events (earthquakes). Further research is needed to identify the responsible micromechanical processes, which might then be used to extrapolate to natural conditions.

