Definition Extraction from Generic and Mathematical Domains with Deep Ensemble Learning

Mathematics ◽  
2021 ◽  
Vol 9 (19) ◽  
pp. 2502
Author(s):  
Natalia Vanetik ◽  
Marina Litvak

Definitions are extremely important for efficient learning of new material. In particular, mathematical definitions are necessary for understanding mathematics-related areas. Automated extraction of definitions could be very useful for automated indexing of educational materials, building taxonomies of relevant concepts, and more. For definitions that are contained within a single sentence, this problem can be viewed as a binary classification of sentences into definitions and non-definitions. In this paper, we focus on automatic detection of one-sentence definitions in mathematical and general texts. We experiment with different classification models arranged in an ensemble and applied to a sentence representation containing syntactic and semantic information, to classify sentences. Our ensemble model is applied to the data adjusted with oversampling. Our experiments demonstrate the superiority of our approach over state-of-the-art methods in both general and mathematical domains.
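The two ingredients named in this abstract, oversampling and an ensemble of classifiers, can be illustrated with a minimal sketch. The abstract does not say which oversampling scheme is used or how the ensemble combines its members, so random duplication of minority-class sentences and simple majority voting are assumptions made here for illustration. The function names `random_oversample` and `majority_vote` are hypothetical.

```python
import random
from collections import Counter


def random_oversample(sentences, labels, seed=0):
    """Duplicate randomly chosen minority-class examples until every
    class is as frequent as the largest one (assumed scheme)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(sentences), list(labels)
    for cls, n in counts.items():
        # All original examples belonging to this class.
        pool = [x for x, y in zip(sentences, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def majority_vote(predictions):
    """Combine per-model label lists into one label per sentence.
    `predictions` holds one list of labels per model (assumed rule)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical toy data: 1 = definition, 0 = non-definition.
x, y = random_oversample(["a", "b", "c", "d"], [1, 0, 0, 0])
combined = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```

In practice, oversampling would normally be applied to the training split only, so that duplicated sentences do not leak into evaluation. An odd number of models avoids ties in a binary vote.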

2021 ◽  
Vol 13 (9) ◽  
pp. 1623
Author(s):  
João E. Batista ◽  
Ana I. R. Cabral ◽  
Maria J. P. Vasconcelos ◽  
Leonardo Vanneschi ◽  
Sara Silva

Genetic programming (GP) is a powerful machine learning (ML) algorithm that can produce readable white-box models. Although successfully used for solving an array of problems in different scientific areas, GP is still not well known in the field of remote sensing. The M3GP algorithm, a variant of the standard GP algorithm, performs feature construction by evolving hyperfeatures from the original ones. In this work, we use the M3GP algorithm on several sets of satellite images over different countries to create hyperfeatures from satellite bands to improve the classification of land cover types. We add the evolved hyperfeatures to the reference datasets and observe a significant improvement of the performance of three state-of-the-art ML algorithms (decision trees, random forests, and XGBoost) on multiclass classifications and no significant effect on the binary classifications. We show that adding the M3GP hyperfeatures to the reference datasets brings better results than adding the well-known spectral indices NDVI, NDWI, and NBR. We also compare the performance of the M3GP hyperfeatures in the binary classification problems with those created by other feature construction methods such as FFX and EFS.
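The spectral indices used as a baseline here are all normalized differences of two bands, which a short sketch makes concrete. The abstract does not state which NDWI variant or which short-wave infrared band the authors use. The green/NIR form of NDWI and a generic SWIR band for NBR are assumptions, as are the function names and the zero-denominator convention.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b); 0.0 when the denominator is zero
    (a convention assumed here, not taken from the source)."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized Difference Water Index, green/NIR variant (assumed)."""
    return normalized_difference(green, nir)


def nbr(nir, swir):
    """Normalized Burn Ratio: (NIR - SWIR) / (NIR + SWIR)."""
    return normalized_difference(nir, swir)


# Hypothetical reflectance values for a single pixel.
pixel = {"green": 0.2, "red": 0.1, "nir": 0.6, "swir": 0.2}
features = [
    ndvi(pixel["nir"], pixel["red"]),
    ndwi(pixel["green"], pixel["nir"]),
    nbr(pixel["nir"], pixel["swir"]),
]
```

Each index is a fixed, hand-designed combination of two bands, whereas the hyperfeatures described above are evolved expressions over the same inputs.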


2020 ◽  
Vol 36 (12) ◽  
pp. 3693-3702 ◽  
Author(s):  
Dandan Zheng ◽  
Guansong Pang ◽  
Bo Liu ◽  
Lihong Chen ◽  
Jian Yang

Motivation: Identification of virulence factors (VFs) is critical to the elucidation of bacterial pathogenesis and prevention of related infectious diseases. Current computational methods for VF prediction focus on binary classification or cover only a few VF classes with sufficient samples. However, thousands of VF classes are present in real-world scenarios, and many of them have only a very limited number of samples available.

Results: We first construct a large VF dataset, covering 3446 VF classes with 160 495 sequences, and then propose deep convolutional neural network models for VF classification. We show that (i) for common VF classes with sufficient samples, our models can achieve state-of-the-art performance with an overall accuracy of 0.9831 and an F1-score of 0.9803; (ii) for uncommon VF classes with limited samples, our models can learn transferable features from auxiliary data and achieve good performance with accuracy ranging from 0.9277 to 0.9512 and F1-score ranging from 0.9168 to 0.9446 when combined with different predefined features, outperforming traditional classifiers by 1–13% in accuracy and by 1–16% in F1-score.

Availability and implementation: All of our datasets are made publicly available at http://www.mgc.ac.cn/VFNet/, and the source code of our models is publicly available at https://github.com/zhengdd0422/VFNet.

Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
João Batista ◽  
Ana Cabral ◽  
Maria Vasconcelos ◽  
Leonardo Vanneschi ◽  
Sara Silva

Genetic Programming (GP) is a powerful Machine Learning (ML) algorithm that can produce readable white-box models. Although successfully used for solving an array of problems in different scientific areas, GP is still not well known in Remote Sensing. The M3GP algorithm, a variant of the standard GP algorithm, performs Feature Construction by evolving hyper-features from the original ones. In this work, we use the M3GP algorithm on several sets of satellite images over different countries to create hyper-features from satellite bands to improve the classification of land cover types. We add the evolved hyper-features to the reference datasets and observe a significant improvement of the performance of three state-of-the-art ML algorithms (Decision Trees, Random Forests and XGBoost) on multiclass classifications and no significant effect on the binary classifications. We show that adding the M3GP hyper-features to the reference datasets brings better results than adding the well-known spectral indices NDVI, NDWI and NBR. We also compare the performance of the M3GP hyper-features in the binary classification problems with those created by other Feature Construction methods like FFX and EFS.


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Enbiao Jing ◽  
Haiyang Zhang ◽  
ZhiGang Li ◽  
Yazhi Liu ◽  
Zhanlin Ji ◽  
...  

Based on a convolutional neural network (CNN) approach, this article proposes an improved ResNet-18 model for heartbeat classification of electrocardiogram (ECG) signals through appropriate model training and parameter adjustment. Due to the unique residual structure of the model, the utilized CNN layered structure can be deepened in order to achieve better classification performance. The results of applying the proposed model to the MIT-BIH arrhythmia database demonstrate that the model achieves higher accuracy (96.50%) compared to other state-of-the-art classification models, while specifically for the ventricular ectopic heartbeat class, its sensitivity is 93.83% and the precision is 97.44%.
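The per-class sensitivity and precision quoted above follow directly from confusion-matrix counts for that class. The sketch below uses hypothetical counts rather than figures from the paper, and the function names are illustrative.

```python
def sensitivity(tp, fn):
    """Fraction of actual positives that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and sensitivity: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical counts for one heartbeat class treated as "positive".
tp, fp, fn = 90, 5, 10
sens = sensitivity(tp, fn)
prec = precision(tp, fp)
f1 = f1_score(tp, fp, fn)
```

For a multiclass problem such as heartbeat classification, these quantities are computed one class at a time, counting every other class as negative.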


Author(s):  
S. A. Chitnis ◽  
Z. Huang ◽  
K. Khoshelham

Abstract. Mobile lidar point clouds are commonly used for 3D mapping of road environments as they provide a rich, highly detailed geometric representation of objects on and around the road. However, raw lidar point clouds lack semantic information about the type of objects, which is necessary for various applications. Existing methods for the classification of objects in mobile lidar data, including state-of-the-art deep learning methods, achieve relatively low accuracies, and a primary reason for this under-performance is the lack of enough 3D training samples to train deep networks sufficiently. In this paper, we propose a generative model for creating synthetic 3D point segments that can aid in improving the classification performance of mobile lidar point clouds. We train a 3D Adversarial Autoencoder (3dAAE) to generate synthetic point segments that closely resemble real point segments and share similar geometric features with them. We evaluate the performance of a PointNet-like classifier trained with and without the synthetic point segments. The evaluation results support our hypothesis that training a classifier with training data augmented with synthetic samples leads to significant improvement in the classification performance. Specifically, our model achieves an F1 score of 0.94 for vehicles and pedestrians and 1.00 for traffic signs.


2021 ◽  
Author(s):  
Kira Wegner-Clemens ◽  
George Law Malcolm ◽  
Sarah Shomstein

Semantic information about objects, events, and scenes influences how humans perceive, interact with, and navigate the world. Most evidence in support of semantic influence on cognition has been garnered from research conducted with an isolated modality (e.g., vision, audition). However, the influence of semantic information has not yet been extensively studied in multisensory environments, potentially because of the difficulty of quantifying semantic relatedness. Past studies have primarily relied on either a simplified binary classification of semantic relatedness based on category or on algorithmic values based on text corpora rather than human perceptual experience and judgement. With the aim of accelerating research into multisensory semantics, we created a constrained audiovisual stimulus set and derived similarity ratings between items within three categories (animals, instruments, household items). A set of 140 participants provided similarity judgements between sounds and images. Participants either heard a sound (e.g., a meow) and judged which of two pictures of objects (e.g., a picture of a dog and a duck) it was more similar to, or saw a picture (e.g., a picture of a duck) and selected which of two sounds it was more similar to (e.g., a bark or a meow). Judgements were then used to calculate similarity values of any given cross-modal pair. The derived and reported similarity judgements reflect a range of semantic similarities across three categories and items, and highlight similarities and differences among similarity judgements between modalities. We make the derived similarity values available in a database format to the research community to be used as a measure of semantic relatedness in cognitive psychology experiments, enabling more robust studies of semantics in audiovisual environments.


2020 ◽  
Vol 6 (4) ◽  
pp. 477-487
Author(s):  
Ding-Nan Zou ◽  
Song-Hai Zhang ◽  
Tai-Jiang Mu ◽  
Min Zhang

Abstract. In this paper, we introduce an image dataset for fine-grained classification of dog breeds: the Tsinghua Dogs Dataset. It is currently the largest dataset for fine-grained classification of dogs, including 130 dog breeds and 70,428 real-world images. It has only one dog in each image and provides annotated bounding boxes for the whole body and head. In comparison to previous similar datasets, it contains more breeds and more carefully chosen images for each breed. The diversity within each breed is greater, with between 200 and 7000+ images for each breed. Annotation of the whole body and head makes the dataset suitable not only for improving fine-grained image classification models based on overall features, but also for those that locate local informative parts. We show that the dataset provides a tough challenge by benchmarking several state-of-the-art deep neural models. The dataset is available for academic purposes at https://cg.cs.tsinghua.edu.cn/ThuDogs/.


2020 ◽  
Vol 6 (3) ◽  
pp. 322-325
Author(s):  
Seyed Amir Hossein Tabatabaei ◽  
Gabriela Augustinov ◽  
Volker Gross ◽  
Keywan Sohrabi ◽  
Patrick Fischer ◽  
...  

Abstract. In this paper, a deep learning approach for classification of cough sound segments is presented. The architecture of the network is based on a pre-trained network, and spectrogram images from three recording channels were extracted to train the network. The classification accuracy based on three recording channels is 92% for a binary classification model, and the network converges quickly. Two classification models based on binary and multi-class problems are proposed. Relevant classification parameters including the Receiver Operating Characteristic (ROC) curve are reported.
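For readers unfamiliar with the ROC curve mentioned above, the following minimal sketch shows how it is traced for a binary classifier by sweeping a decision threshold over the predicted scores, and how the area under it is obtained with the trapezoidal rule. The scores and labels are hypothetical, the function names are illustrative, and both classes are assumed to be present in the labels.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs obtained
    by lowering the threshold through the scores in descending order."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier scores; label 1 = cough, 0 = non-cough.
curve = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
area = auc(curve)
```

An area of 1.0 corresponds to a classifier that ranks every positive above every negative, while 0.5 corresponds to random ranking.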

