Machine Learning in Terminology Extraction from Czech and English Texts

Abstract The method of automatic term recognition based on machine learning is focused primarily on the most important quantitative term attributes. It is able to successfully identify terms and non-terms (with success rate of more than 95 %) and find characteristic features of a term as a terminological unit. A single-word term can be characterized as a word with a low frequency that occurs considerably more often in specialized texts than in non-academic texts, occurs in a small number of disciplines, its distribution in the corpus is uneven as is the distance between its two instances. A multi-word term is a collocation consisting of words with low frequency and contains at least one single-word term. The method is based on quantitative features and it makes it possible to utilize the algorithms in multiple disciplines as well as to create cross-lingual applications (verified on Czech and English).

Download Full-text

Acquisition/Processing: Machine learning-based deblending: Dispersed source array data example

The Leading Edge ◽

10.1190/tle40100759.1 ◽

2021 ◽

Vol 40 (10) ◽

pp. 759-767

Author(s):

Rolf H. Baardman ◽

Rob F. Hegge

Keyword(s):

Machine Learning ◽

High Frequency ◽

Low Frequency ◽

Seismic Interpretation ◽

Salt Dome ◽

Training Data ◽

Array Data ◽

Seismic Processing ◽

Processing Machine ◽

Source Array

Machine learning (ML) has proven its value in the seismic industry with successful implementations in areas of seismic interpretation such as fault and salt dome detection and velocity picking. The field of seismic processing research also is shifting toward ML applications in areas such as tomography, demultiple, and interpolation. Here, a supervised ML deblending algorithm is illustrated on a dispersed source array (DSA) data example in which both high- and low-frequency vibrators were deployed simultaneously. Training data pairs of blended and corresponding unblended data were constructed from conventional (unblended) data from another survey. From this training data, the method can automatically learn a deblending operator that is used to deblend for both the low- and the high-frequency vibrators of the DSA data. The results obtained on the DSA data are encouraging and show that the ML deblending method can offer a good performing, less user-intensive alternative to existing deblending methods.

Download Full-text

Arabic English Cross-Lingual Plagiarism Detection Based on Keyphrases Extraction, Monolingual and Machine Learning Approach

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2018/v2i330075 ◽

2019 ◽

pp. 1-12

Author(s):

Mokhtar Al-Suhaiqi ◽

Muneer A. S. Hazaa ◽

Mohammed Albared

Keyword(s):

Machine Learning ◽

Machine Learning Techniques ◽

Detection Methods ◽

Support Vector ◽

Svm Classifier ◽

Learning Approach ◽

Plagiarism Detection ◽

Machine Learning Approach ◽

Cross Lingual ◽

Cross Language

Due to rapid growth of research articles in various languages, cross-lingual plagiarism detection problem has received increasing interest in recent years. Cross-lingual plagiarism detection is more challenging task than monolingual plagiarism detection. This paper addresses the problem of cross-lingual plagiarism detection (CLPD) by proposing a method that combines keyphrases extraction, monolingual detection methods and machine learning approach. The research methodology used in this study has facilitated to accomplish the objectives in terms of designing, developing, and implementing an efficient Arabic – English cross lingual plagiarism detection. This paper empirically evaluates five different monolingual plagiarism detection methods namely i)N-Grams Similarity, ii)Longest Common Subsequence, iii)Dice Coefficient, iv)Fingerprint based Jaccard Similarity and v) Fingerprint based Containment Similarity. In addition, three machine learning approaches namely i) naïve Bayes, ii) Support Vector Machine, and iii) linear logistic regression classifiers are used for Arabic-English Cross-language plagiarism detection. Several experiments are conducted to evaluate the performance of the key phrases extraction methods. In addition, Several experiments to investigate the performance of machine learning techniques to find the best method for Arabic-English Cross-language plagiarism detection. According to the experiments of Arabic-English Cross-language plagiarism detection, the highest result was obtained using SVM classifier with 92% f-measure. In addition, the highest results were obtained by all classifiers are achieved, when most of the monolingual plagiarism detection methods are used.

Download Full-text

Machine learning-based fracture-hit detection algorithm using LFDAS signal

The Leading Edge ◽

10.1190/tle38070520.1 ◽

2019 ◽

Vol 38 (7) ◽

pp. 520-524 ◽

Cited By ~ 5

Author(s):

Ge Jin ◽

Kevin Mendoza ◽

Baishali Roy ◽

Darryl G. Buswell

Keyword(s):

Machine Learning ◽

Low Frequency ◽

Detection Algorithm ◽

Data Sets ◽

Fracture Zones ◽

Learning Techniques ◽

Distributed Acoustic Sensing ◽

Probability Of Fracture ◽

Simple Neural Network

Low-frequency distributed acoustic sensing (LFDAS) signal has been used to detect fracture hits at offset monitor wells during hydraulic fracturing operations. Typically, fracture hits are manually identified, which can be subjective and inefficient. We implemented machine learning-based models using supervised learning techniques in order to identify fracture zones, which demonstrate a high probability of fracture hits automatically. Several features are designed and calculated from LFDAS data to highlight fracture-hit characterizations. A simple neural network model is trained to fit the manually picked fracture hits. The fracture-hit probability, as predicted by the model, agrees well with the manual picks in training, validation, and test data sets. The algorithm was used in a case study of an unconventional reservoir. The results indicate that smaller cluster spacing design creates denser fractures.

Download Full-text

Distribution Transformer Parameters Detection Based on Low-Frequency Noise, Machine Learning Methods, and Evolutionary Algorithm

Sensors ◽

10.3390/s20154332 ◽

2020 ◽

Vol 20 (15) ◽

pp. 4332

Author(s):

Daniel Jancarczyk ◽

Marcin Bernaś ◽

Tomasz Boczar

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Input Data ◽

Learning Algorithms ◽

Low Frequency ◽

Machine Learning Algorithms ◽

Frequency Noise ◽

Low Frequency Noise ◽

Distribution Transformer ◽

Transformer Model

The paper proposes a method of automatic detection of parameters of a distribution transformer (model, type, and power) from a distance, based on its low-frequency noise spectra. The spectra are registered by sensors and processed by a method based on evolutionary algorithms and machine learning. The method, as input data, uses the frequency spectra of sound pressure levels generated during operation by transformers in the real environment. The model also uses the background characteristic to take under consideration the changing working conditions of the transformers. The method searches for frequency intervals and its resolution using both a classic genetic algorithm and particle swarm optimization. The interval selection was verified using five state-of-the-art machine learning algorithms. The research was conducted on 16 different distribution transformers. As a result, a method was proposed that allows the detection of a specific transformer model, its type, and its power with an accuracy greater than 84%, 99%, and 87%, respectively. The proposed optimization process using the genetic algorithm increased the accuracy by up to 5%, at the same time reducing the input data set significantly (from 80% up to 98%). The machine learning algorithms were selected, which were proven efficient for this task.

Download Full-text

A Machine Learning Approach to Multilingual and Cross-Lingual Ontology Matching

The Semantic Web – ISWC 2011 - Lecture Notes in Computer Science ◽

10.1007/978-3-642-25073-6_42 ◽

2011 ◽

pp. 665-680 ◽

Cited By ~ 35

Author(s):

Dennis Spohr ◽

Laura Hollink ◽

Philipp Cimiano

Keyword(s):

Machine Learning ◽

Ontology Matching ◽

Learning Approach ◽

Machine Learning Approach ◽

Cross Lingual

Download Full-text

Machine Learning applied to the Design and Analysis of Low Frequency Electromagnetic Devices

2020 21st International Symposium on Electrical Apparatus & Technologies (SIELA) ◽

10.1109/siela49118.2020.9167158 ◽

2020 ◽

Cited By ~ 1

Author(s):

Arbaaz Khan ◽

David A. Lowther

Keyword(s):

Machine Learning ◽

Low Frequency ◽

Electromagnetic Devices

Download Full-text

Prediction of incident myocardial infarction using machine learning applied to harmonized electronic health record data

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01268-x ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Divneet Mandair ◽

Premanand Tiwari ◽

Steven Simon ◽

Kathryn L. Colborn ◽

Michael A. Rosenberg

Keyword(s):

Neural Network ◽

Machine Learning ◽

Risk Factors ◽

Myocardial Infarction ◽

Logistic Regression ◽

Large Scale ◽

Deep Neural Network ◽

Low Frequency ◽

Common Data Model ◽

Electronic Health Record Data

Abstract Background With cardiovascular disease increasing, substantial research has focused on the development of prediction tools. We compare deep learning and machine learning models to a baseline logistic regression using only ‘known’ risk factors in predicting incident myocardial infarction (MI) from harmonized EHR data. Methods Large-scale case-control study with outcome of 6-month incident MI, conducted using the top 800, from an initial 52 k procedures, diagnoses, and medications within the UCHealth system, harmonized to the Observational Medical Outcomes Partnership common data model, performed on 2.27 million patients. We compared several over- and under- sampling techniques to address the imbalance in the dataset. We compared regularized logistics regression, random forest, boosted gradient machines, and shallow and deep neural networks. A baseline model for comparison was a logistic regression using a limited set of ‘known’ risk factors for MI. Hyper-parameters were identified using 10-fold cross-validation. Results Twenty thousand Five hundred and ninety-one patients were diagnosed with MI compared with 2.25 million who did not. A deep neural network with random undersampling provided superior classification compared with other methods. However, the benefit of the deep neural network was only moderate, showing an F1 Score of 0.092 and AUC of 0.835, compared to a logistic regression model using only ‘known’ risk factors. Calibration for all models was poor despite adequate discrimination, due to overfitting from low frequency of the event of interest. Conclusions Our study suggests that DNN may not offer substantial benefit when trained on harmonized data, compared to traditional methods using established risk factors for MI.

Download Full-text

Assessment of Earthquake Destructive Power to Structures Based on Machine Learning Methods

Applied Sciences ◽

10.3390/app10186210 ◽

2020 ◽

Vol 10 (18) ◽

pp. 6210

Author(s):

Ruihao Zheng ◽

Chen Xiong ◽

Xiangbin Deng ◽

Qiangsheng Li ◽

Yi Li

Keyword(s):

Neural Network ◽

Machine Learning ◽

Ground Motions ◽

Low Frequency ◽

Time History ◽

Time History Analysis ◽

Machine Learning Algorithms ◽

Learning Networks ◽

History Analysis ◽

Deep Layers

This study presents a machine learning-based method for the destructive power assessment of earthquake to structures. First, the analysis procedure of the method is presented, and the backpropagation neural network (BPNN) and convolutional neural network (CNN) are used as the machine learning algorithms. Second, the optimized BPNN architecture is obtained by discussing the influence of a different number of hidden layers and nodes. Third, the CNN architecture is proposed based on several classical deep learning networks. To build the machine learning models, 50,570 time-history analysis results of a structural system subjected to different ground motions are used as training, validation, and test samples. The results of the BPNN indicate that the features extraction method based on the short-time Fourier transform (STFT) can well reflect the frequency-/time-domain characteristics of ground motions. The results of the CNN indicate that the CNN exhibits better accuracy (R2 = 0.8737) compared with that of the BPNN (R2 = 0.6784). Furthermore, the CNN model exhibits remarkable computational efficiency, the prediction of 1000 structures based on the CNN model takes 0.762 s, while 507.81 s are required for the conventional time-history analysis (THA)-based simulation. Feature visualization of different layers of the CNN reveals that the shallow to deep layers of the CNN can extract the high to low-frequency features of ground motions. The proposed method can assist in the fast prediction of engineering demand parameters of large-number structures, which facilitates the damage or loss assessments of regional structures for timely emergency response and disaster relief after earthquake.

Download Full-text

Ambient seismic noise suppression in COST action G2Net

10.5194/egusphere-egu2020-22165 ◽

2020 ◽

Author(s):

Velimir Ilić ◽

Alessandro Bertolini ◽

Fabio Bonsignorio ◽

Dario Jozinović ◽

Tomasz Bulik ◽

...

Keyword(s):

Machine Learning ◽

Gravitational Waves ◽

Seismic Noise ◽

Noise Suppression ◽

Real Life ◽

Low Frequency ◽

Ambient Seismic Noise ◽

Cost Action ◽

The Cost ◽

Life Challenge

<p>The analysis of low-frequency gravitational waves (GW) data is a crucial mission of GW science and the performance of Earth-based GW detectors is largely influenced by ability of combating the low-frequency ambient seismic noise and other seismic influences. This tasks require multidisciplinary research in the fields of seismic sensing, signal processing, robotics, machine learning and mathematical modeling.<br><br>In practice, this kind of research is conducted by large teams of researchers with different expertise, so that project management emerges as an important real life challenge in the projects for acquisition, processing and interpretation of seismic data from GW detector site. A prominent example that successfully deals with this aspect could be observed in the COST Action G2Net (CA17137 - A network for Gravitational Waves, Geophysics and Machine Learning) and its seismic research group, which counts more than 30 members.&#160;</p><div>In this talk we will review the structure of the group, present the goals and recent activities of the group, and present new methods for combating the seismic influences at GW detector site that will be developed and applied within this collaboration.</div><div> <p>&#160;</p> <p>This publication is based upon work from CA17137 - A network for Gravitational Waves, Geophysics and Machine Learning, supported by COST (European Cooperation in Science and Technology).</p> </div>

Download Full-text

Quantifying and Classifying Volcano Video and Infrasound Datasets via Computer Vision and Machine Learning Algorithms

10.18122/td/1662/boisestate ◽

2020 ◽

Author(s):

Alex J. C. Witsil

Keyword(s):

Machine Learning ◽

Computer Vision ◽

Learning Algorithm ◽

Learning Algorithms ◽

Video Data ◽

Machine Learning Algorithms ◽

Activity Data ◽

Time Histories ◽

Characteristic Features ◽

Vision Algorithm

Volcanoes are dangerous and complex with processes coupled to both the subsurface and atmosphere. Effective monitoring of volcanic behavior during and in between periods of crisis requires a diverse suite of instruments and processing routines. Acoustic microphones and video cameras are typical in long-term deployments and provide important constraints on surficial and observational activity yet are underutilized relative to their seismic counterpart. This dissertation increases the utility of infrasound and video datasets through novel applications of computer vision and machine learning algorithms, which help constrain source dynamics and track shifts in activity. Data analyzed come from infrasound and camera installations at Stromboli Volcano, Italy and Villarrica Volcano, Chile and are diverse in terms of the recorded activity. At Villarrica, a computer vision algorithm quantifies video data into a set of characteristic features that are used in a multiparametric analysis with seismic and infrasound data to constrain activity during a period of crisis in 2015. Video features are also input into a machine learning algorithm that classifies data into five modes of activity, which helps track behavior over weekly and monthly time scales. At Stromboli, infrasound signals radiating from the multiple active vents are synthesized into characteristic features and then clustered via an unsupervised learning algorithm. Time histories of cluster activity at each vent reveal concurrent shifts in behavior that suggest a linked plumbing system between the vents. The algorithms presented are general and modular and can be implemented at monitoring agencies that already collect acoustic and video data.

Download Full-text