Characterizing glacial processes applying classical beamforming and machine learning

The cryosphere is a highly active and dynamic environment that responds rapidly to changing climatic conditions. The processes behind this response are poorly understood because they remain challenging to observe. Glacial dynamics are strongly intermittent in time and heterogeneous in space, so monitoring with high spatio-temporal resolution is essential. In the course of the RESOLVE project, continuous seismic observations were obtained using a dense seismic network (100 nodes, Ø 700 m) installed on the Argentière Glacier (French Alps) during May 2018. This unique data set offers the chance to study targeted processes and dynamics within the cryosphere in detail on a local scale. We apply classical beamforming within the array (matched field processing) and unsupervised machine learning techniques to identify, cluster and locate seismic sources in 5D (x, y, z, velocity, time). Sources located with high resolution and accuracy are related to processes and activity within the ice body, e.g. the geometry and dynamics of crevasses or the interaction at the glacier/bedrock interface, depending on meteorological conditions such as daily temperature fluctuations or snowfall. Our preliminary results indicate strong potential in poorly resolved sources, which can be observed with statistical consistency and may reveal new insights into structural features and physical properties of the glacier (e.g. analysis of scatterers).
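
As a rough illustration of the matched field processing step mentioned in this abstract, the sketch below runs a single-frequency Bartlett beamformer over a 2D grid of candidate source positions. It is not the authors' implementation: the 3x3 sensor patch, the 5 Hz frequency, the uniform 3600 m/s velocity, the noise-free synthetic source and all function names are hypothetical choices made only for this example.

```python
import cmath
import math

def steering(sensors, src, freq, velocity):
    """Unit-norm replica vector for a homogeneous medium (assumed model)."""
    phases = []
    for sx, sy in sensors:
        dist = math.hypot(sx - src[0], sy - src[1])
        phases.append(cmath.exp(-2j * math.pi * freq * dist / velocity))
    norm = math.sqrt(len(phases))
    return [p / norm for p in phases]

def bartlett_power(data, replica):
    """Bartlett output |w^H d|^2 for one frequency-domain snapshot."""
    inner = sum(r.conjugate() * d for r, d in zip(replica, data))
    return abs(inner) ** 2

def locate(sensors, data, freq, velocity, grid):
    """Return the candidate position with the highest Bartlett power."""
    return max(
        grid,
        key=lambda p: bartlett_power(data, steering(sensors, p, freq, velocity)),
    )

# Hypothetical 3x3 sensor patch with 100 m spacing (not the RESOLVE geometry)
sensors = [(x * 100.0, y * 100.0) for x in range(3) for y in range(3)]
freq, velocity = 5.0, 3600.0          # Hz and m/s, assumed values
true_src = (120.0, 60.0)              # synthetic source used to build the data
data = steering(sensors, true_src, freq, velocity)

# Candidate positions every 20 m across the patch
grid = [(x * 20.0, y * 20.0) for x in range(11) for y in range(11)]
best = locate(sensors, data, freq, velocity, grid)
```

Extending the search to depth and velocity, as the 5D description suggests, would mean adding those dimensions to the candidate grid and to the distance calculation.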

Natural language processing systems for data extraction and mapping on the basis of unstructured text blocks

Proceedings of the International conference “InterCarto/InterGIS” ◽

10.35595/2414-9179-2020-3-26-53-61 ◽

2020 ◽

Vol 26 (3) ◽

pp. 53-61

Author(s):

Pavel Kikin ◽

Alexey Kolesnikov ◽

Alexey Portnov ◽

Denis Grischenko

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Mathematical Models ◽

Optimal Algorithm ◽

The State ◽

Gradient Boosting ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods ◽

Spatio Temporal

The state of ecological systems, along with their general characteristics, is almost always described by indicators that vary in space and time, which considerably complicates the construction of mathematical models for predicting the state of such systems. One way to simplify and automate the construction of these models is to use machine learning methods. The article compares traditional and neural-network-based algorithms and machine learning methods for predicting spatio-temporal series that represent ecosystem data. The analysis and comparison cover the following algorithms and methods: logistic regression, random forest, gradient boosting on decision trees, SARIMAX, and long short-term memory (LSTM) and gated recurrent unit (GRU) neural networks. For the study, data sets with both spatial and temporal components were selected: mosquito counts, the number of dengue infections, the physical condition of tropical grove trees, and the water level in a river. The article discusses the preliminary data processing steps required for each algorithm. In addition, Kolmogorov complexity was calculated as one of the parameters that can help formalize the choice of the most suitable algorithm when constructing mathematical models of spatio-temporal data for the sets used. Based on the results of the analysis, recommendations are given on the application of particular methods and specific technical solutions, depending on the characteristics of the data set that describes a particular ecosystem.
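
Kolmogorov complexity cannot be computed exactly, and the abstract does not say how it was estimated. A common proxy, shown here purely as an assumption, is the size of a series after lossless compression relative to its raw size. The two series, the quantization to one decimal and the function name are hypothetical.

```python
import zlib

def compression_ratio(values, decimals=1):
    """Compressed size / raw size of a quantized series (lower = more regular)."""
    text = ",".join(f"{v:.{decimals}f}" for v in values).encode("ascii")
    return len(zlib.compress(text, 9)) / len(text)

# Hypothetical series: a strictly periodic signal and a pseudo-random one
periodic = [float(i % 7) for i in range(700)]

state, noisy = 12345, []
for _ in range(700):
    state = (1103515245 * state + 12345) % 2**31   # simple LCG, illustration only
    noisy.append((state >> 16) % 100 / 10.0)

ratio_periodic = compression_ratio(periodic)
ratio_noisy = compression_ratio(noisy)
```

A low ratio suggests a regular series that a simple seasonal model may capture, while a ratio close to that of noise suggests little exploitable structure.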

Qualitative Spatial and Temporal Reasoning: Current Status and Future Challenges

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/624 ◽

2021 ◽

Author(s):

Michael Sioutis ◽

Diedrich Wolter

Keyword(s):

Machine Learning ◽

Data Mining ◽

Real World ◽

Temporal Reasoning ◽

Current Status ◽

Highly Active ◽

Future Challenges ◽

Symbolic Ai ◽

Spatio Temporal ◽

Time Critical

Qualitative Spatial & Temporal Reasoning (QSTR) is a major field of study in Symbolic AI that deals with the representation of, and reasoning about, spatio-temporal information in an abstract, human-like manner. We survey the current status of QSTR from the viewpoint of reasoning approaches and identify future challenges that, once overcome, will allow the field to meet the demands of, and adapt to, real-world, dynamic and time-critical applications in highly active areas such as machine learning and data mining.

Multi-feature recognition of English text based on machine learning

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189214 ◽

2020 ◽

pp. 1-12

Author(s):

Ao Qi ◽

Liu Narengerile

Keyword(s):

Machine Learning ◽

Character Recognition ◽

Feature Recognition ◽

Structural Features ◽

Statistical Characteristics ◽

English Text ◽

Matching Problem ◽

Data Set ◽

Short Text ◽

Lack Of Information

At present, recognition methods based on character segmentation are not effective at recognizing English text, and traditional methods rely on the structural features and statistical characteristics of strokes. To improve the recognition of English text from a machine learning perspective, this study introduces multiple features to compensate for the lack of information caused by the small Chinese data set. Moreover, the study decomposes the character recognition problem into a text matching problem between question and answer and a textual entailment problem between answer and standard answer, and continues training on a short-text scoring data set. The final result shows some improvement, which supports the usability of the proposed mechanism. To study its performance, the proposed model is compared with a neural network recognition model in terms of recognition accuracy and recognition speed. The results show that the proposed algorithm is effective to a certain degree.

Structure prediction of multi-principal element alloys using ensemble learning

Engineering Computations ◽

10.1108/ec-04-2019-0151 ◽

2019 ◽

Vol 37 (3) ◽

pp. 1003-1022 ◽

Cited By ~ 3

Author(s):

Amitava Choudhury ◽

Tanmay Konnur ◽

P.P. Chattopadhyay ◽

Snehanshu Pal

Keyword(s):

Crystal Structure ◽

Machine Learning ◽

Solid Solution ◽

Structure Prediction ◽

Single Phase ◽

Structural Features ◽

Valence Electron Concentration ◽

Data Set ◽

Content Type

Purpose: This paper aims to predict the phases and crystal structures of multi-component alloys. Current concepts and strategies for developing multi-principal element alloys (MPEAs) significantly increase the number of candidate alloy systems, which demands proper screening of a large number of systems based on the nature of their phase and structure. Experimentally obtained data linking elemental properties to the resulting phases of MPEAs are abundant; hence, there is considerable scope for classifying MPEAs by the structural features of the resultant phase and for identifying distinctive connections between elemental properties and phases. Design/methodology/approach: Several machine-learning algorithms are used to recognize the underlying patterns in data sets for designing MPEAs and to classify them by the structural features of their resultant phase, such as single-phase solid solution, amorphous and intermetallic compounds. MPEAs with a single-phase solid solution are further classified by crystal structure using an ensemble-based machine-learning algorithm, the random-forest algorithm. Findings: The random-forest model achieves an accuracy of 91 per cent for phase prediction and 93 per cent for crystal structure prediction for the single-phase solid solution class of MPEAs. Five input parameters are used in the prediction model: valence electron concentration, difference in Pauling electronegativity, atomic size difference, mixing enthalpy and mixing entropy. The valence electron concentration is found to be the most important feature for phase prediction. To avoid overfitting, fivefold cross-validation is performed. To compare performance, other algorithms such as K-nearest neighbor, support vector machine, logistic regression, naïve Bayes, decision tree and neural network are also applied to the data set. Originality/value: The authors describe the phase selection and crystal structure prediction mechanism in the MPEA data set and achieve better accuracy using machine learning.
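
The fivefold cross-validation mentioned in this abstract can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the random forest, so this is not the authors' model. The two features, the class labels and all data values are hypothetical.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid_fit(X, y):
    """Per-class mean feature vector (simple stand-in for a random forest)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def nearest_centroid_predict(model, row):
    """Label of the centroid closest to `row` in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, model[c])))

def cross_val_accuracy(X, y, k=5):
    """Mean held-out accuracy over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / k

# Hypothetical two-feature samples with made-up structure labels
X = [[4.0 + 0.1 * i, 2.0] for i in range(10)] + [[8.0 + 0.1 * i, 6.0] for i in range(10)]
y = ["BCC"] * 10 + ["FCC"] * 10
mean_accuracy = cross_val_accuracy(X, y)
```

Each sample is used for testing exactly once, so the mean score reflects performance on data the model did not see during fitting.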

Augmenting the sensor network around Helgoland using unsupervised machine learning methods

10.5194/egusphere-egu2020-15212 ◽

2020 ◽

Author(s):

Viktoria Wichert ◽

Holger Brix

Keyword(s):

Machine Learning ◽

Sensor Network ◽

Data Set ◽

Unsupervised Machine Learning ◽

Open Sea ◽

Marine System ◽

Spatio Temporal ◽

Temporal Events ◽

And Behavior

A sensor network surrounds the island of Helgoland, supplying marine data centers with autonomous measurements of variables such as temperature, salinity, chlorophyll and oxygen saturation. The output is a data collection containing information about the complicated conditions around Helgoland, which lies at the boundary between coastal area and open sea. Spatio-temporal phenomena, such as passing river plumes and pollutant influx through flood events, can be found in this data set. Through the data provided by the existing measurement network, these events can be detected and investigated. Because of its important role in understanding the transition between coastal and sea conditions, plans are being made to augment the sensor network around Helgoland with another underwater sensor station, an Underwater Node (UWN). The new node is supposed to complement the existing sensor network optimally. Therefore, it makes sense to place it in an area that is not yet well represented by other sensors. The exact spatial and temporal extent of the area of representativity around a sensor is hard to determine, but it is assumed to have statistical conditions similar to those the sensor measures. This is difficult to specify in the complex system around Helgoland and might change with both space and time. Using an unsupervised machine learning approach, I determine areas of representativity around Helgoland with the goal of finding an ideal placement for a new sensor node. The areas of representativity are identified by clustering a dataset containing time series from the existing sensor network and complementary model data for a period of several years. The computed areas of representativity are compared to the existing sensor placements to decide where to deploy the additional UWN to achieve good coverage for further investigations of spatio-temporal phenomena. A challenge that arises during the clustering analysis is determining whether the spatial areas of representativity remain stable enough over time to base a long-term sensor placement decision on the results. I compare results across different periods of time and investigate how fast areas of representativity change spatially with time and whether there are areas that remain stable over the course of several years. This also provides insights into the occurrence and behavior of spatio-temporal events around Helgoland in the long term. Whether spatial areas of representativity remain stable enough over time to be taken into account when augmenting sensor networks influences future network design decisions. In this way, the extended sensor network can capture a greater variety of the spatio-temporal phenomena around Helgoland and provide an overview of the long-term behavior of the marine system.
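
The abstract does not name the clustering algorithm used. As one possible reading, the sketch below groups grid cells with plain k-means (Lloyd's algorithm), so that each cluster could be read as a candidate area of representativity. The two summary features per cell, their values and the initial centres are hypothetical.

```python
def kmeans(points, centers, iterations=20):
    """Plain Lloyd's algorithm; `centers` holds the initial guesses."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the centre of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical grid cells: (mean temperature in deg C, mean salinity in PSU)
cells = [[8.1, 30.2], [8.3, 30.0], [8.0, 30.4],   # coastal-like conditions
         [9.6, 33.9], [9.8, 34.1], [9.7, 34.0]]   # open-sea-like conditions
labels, centers = kmeans(cells, centers=[cells[0], cells[3]])
```

Repeating the clustering on different time windows and comparing the label maps is one simple way to check how stable the resulting areas are over time.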

Exchange Spin Coupling from Gaussian Process Regression

10.26434/chemrxiv.12589541.v3 ◽

2020 ◽

Author(s):

Marc Philipp Bahlke ◽

Natnael Mogos ◽

Jonny Proppe ◽

Carmen Herrmann

Keyword(s):

Machine Learning ◽

Gaussian Process ◽

Gaussian Process Regression ◽

Molecular Magnets ◽

Molecular Structures ◽

Spin Coupling ◽

Structure Property ◽

Data Set ◽

Uncertainty Estimates

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets for potential application in information technology. We explore the machine-learnability of exchange spin coupling, which has not been studied yet. We employ Gaussian process regression since it can potentially deal with small training sets (as likely associated with the rather complex molecular structures required for exploring spin coupling) and since it provides uncertainty estimates (“error bars”) along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger, experimentally relevant complexes. Exchange spin coupling is about as easy to learn as the polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that a simple linear ridge regression performs just as well as the kernel-based machine-learning model for our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor and highlighting the interesting question of the role of chemical intuition vs. systematic or automated selection of features for machine learning in chemistry and materials science.
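
To show how Gaussian process regression yields a prediction together with an error bar, here is a minimal one-dimensional sketch with a squared-exponential kernel. The kernel choice, length scale, noise level, the scalar descriptor and the training values are all hypothetical and are not taken from the paper.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel (assumed; the paper compares several kernels)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(x_train, y_train, x_new, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    k_star = [rbf(x_new, a) for a in x_train]
    alpha = solve(K, y_train)                 # K^-1 y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)                      # K^-1 k*
    variance = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, variance

# Hypothetical scalar descriptor values and target values
x_train = [0.0, 1.0, 2.0]
y_train = [0.5, -0.2, 0.1]
mean_near, var_near = gp_predict(x_train, y_train, 1.0)    # at a training point
mean_far, var_far = gp_predict(x_train, y_train, 10.0)     # far from all data
```

The variance stays small near training points and returns to the prior variance far from them, which is the behaviour that makes the error bars informative when extrapolating.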

Random Forest Refinement of Pairwise Potentials for Protein-ligand Decoy Detection

10.26434/chemrxiv.8047820.v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jun Pei ◽

Zheng Zheng ◽

Hyunji Kim ◽

Lin Song ◽

Sarah Walworth ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Probability Function ◽

Pair Potential ◽

Scoring Function ◽

Stable Structure ◽

Scoring Functions ◽

Atom Pair ◽

Data Set ◽

Atom Pairs

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013. In a comparison of the performance of our RF models against 29 scoring functions, we found that our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to assess the importance of the GARF potential in the RF models: (1) a scrambled probability function set, obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. The accuracy comparison of RF models based on the scrambled, uniform and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because it can find non-intuitive regularities in high-dimensional data sets, machine learning can be used to build effective predictive models. A total of 6554 descriptors were collected for each compound, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model was the best among these methods, with R² = 0.84 and MSE = 0.55 for the training set and R² = 0.83 and MSE = 0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
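
For reference, the two figures of merit quoted in this abstract can be computed as in the sketch below. The observed and predicted values are hypothetical and are unrelated to the paper's data.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted activity values
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]

error = mse(observed, predicted)
fit = r_squared(observed, predicted)
```

Similar values on the training and test sets, as reported here, are usually taken as a sign that a model is not badly overfitted.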

Comparative Analysis of Machine Learning Techniques Using Predictive Modeling

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200904164539 ◽

2020 ◽

Vol 13 ◽

Author(s):

Ritu Khandelwal ◽

Hemlata Goyal ◽

Rajveer Singh Shekhawat

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Data Science ◽

Training Data ◽

Machine Learning Techniques ◽

Future Trends ◽

Data Set ◽

Learning Stage ◽

Learning Techniques ◽

Different Types

Introduction: Machine learning is a technology that acts as a bridge between businesses and data science. With data science involved, the business goal is to extract valuable insights from the available data. Bollywood, a multi-million-dollar industry, makes up a large part of Indian cinema. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average or Flop, using machine learning techniques (classification and prediction). The first step in building a classifier or prediction model is the learning stage, in which a training data set is used to train the model with some technique or algorithm; the rules generated in this stage form the model, which can then predict future trends in different types of organizations. Methods: Classification and prediction techniques such as Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost and KNN are applied to find efficient and effective results. All of these can be applied through GUI-based workflows organized into categories such as Data, Visualize, Model and Evaluate. Result: The first step in building a classifier or prediction model is the learning stage, in which a training data set is used to train the model; the rules generated there form the model used to predict future trends in different types of organizations. Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Using the predicted success rate, production houses can plan advertising and choose the best time to release a movie to gain higher benefits. Discussion: Data mining is the process of discovering patterns in large data sets; the relationships it uncovers help solve business problems and predict forthcoming trends. Such predictions can help production houses plan advertising campaigns and costs, and thereby make a movie more profitable.
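
The accuracy and confusion-matrix comparison described in this abstract can be sketched for the five verdict classes as follows. The actual and predicted labels are hypothetical, and the helper names are illustrative only.

```python
from collections import Counter

CLASSES = ["Blockbuster", "Superhit", "Hit", "Average", "Flop"]

def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(a, p)] for p in classes] for a in classes]

def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical verdicts for six films
actual = ["Hit", "Flop", "Flop", "Superhit", "Average", "Blockbuster"]
predicted = ["Hit", "Flop", "Average", "Hit", "Average", "Blockbuster"]

matrix = confusion_matrix(actual, predicted)
score = accuracy(matrix)
```

Off-diagonal entries show which verdicts a model confuses, which a single accuracy figure hides.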

An efficient machine learning model for prediction of acute myocardial infarction

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666200325104317 ◽

2020 ◽

Vol 13 ◽

Author(s):

Dhilsath Fathima.M ◽

S. Justin Samuel ◽

R. Hari Haran

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Logistic Regression ◽

Decision Tree ◽

Learning Model ◽

Training Dataset ◽

Data Set ◽

Machine Learning Model ◽

Proposed Model

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine learning based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model is intended to support medical professionals in predicting myocardial infarction. Methods: The proposed model uses mean imputation to fill in missing values in the data set, then applies principal component analysis (PCA) to extract the optimal features and enhance classifier performance. After PCA, the reduced features are partitioned into a training dataset and a testing dataset: 70% of the data are used to train four widely used classifiers (support vector machine, k-nearest neighbor, logistic regression and decision tree), and the remaining 30% are used to evaluate the model with performance metrics such as the confusion matrix, classifier accuracy, precision, sensitivity, F1-score and AUC-ROC curve. Results: The classifier outputs are evaluated using these performance measures. Logistic regression provides higher accuracy than the k-NN, SVM and decision tree classifiers, and PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers (Figures 4 and 5) show that logistic regression achieves a good AUC-ROC score of around 70%, compared with the k-NN and decision tree algorithms. Conclusion: From the result analysis, we infer that the proposed machine learning model can act as an effective decision-making system for predicting acute myocardial infarction at an earlier stage than existing machine learning based prediction models. It can predict the presence of acute myocardial infarction in humans from heart disease risk factors, helping to decide when to start lifestyle modification and medical treatment to prevent heart disease.
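
Two of the steps named in this abstract, mean imputation and the evaluation metrics derived from a binary confusion matrix, can be sketched as follows. The column values and labels are hypothetical, and the PCA and classifier training steps are deliberately left out.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall) and F1 for labels 0/1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical risk-factor column with one missing entry
cholesterol = impute_mean([200.0, None, 240.0, 220.0])

# Hypothetical held-out labels (1 = MI) and classifier predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

In a clinical screening setting sensitivity is often watched most closely, since a missed infarction is usually costlier than a false alarm.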
