Automatic Classification of Text Complexity

This work introduces an automatic classification system for measuring the complexity level of a given Italian text from a linguistic point of view. The task of measuring the complexity of a text is cast as a supervised classification problem by exploiting a dataset of texts purposely produced by linguistic experts for second language teaching and assessment. The widely adopted Common European Framework of Reference for Languages (CEFR) levels were used as target classes, texts were processed by considering a large set of numeric linguistic features, and an experimental comparison among ten widely used machine learning models was conducted. The results show that the proposed approach obtains good prediction accuracy; a further analysis was conducted to identify the categories of features that influenced the predictions.
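To make "numeric linguistic features" concrete, the following is a minimal sketch of turning a text into a small feature vector. The three features (average sentence length, average word length, type-token ratio) and the function name `surface_features` are illustrative assumptions; the abstract does not list the features actually used.

```python
import re

def surface_features(text: str) -> dict[str, float]:
    """Compute a few illustrative surface features of a text (assumed, not the paper's set)."""
    # Split on sentence-final punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Word tokens are runs of letters; the pattern is Unicode-aware,
    # so accented Italian letters are kept inside words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }

# Toy input, not taken from the dataset described above.
features = surface_features("Il gatto dorme. Il cane corre.")
```

A vector like this, computed per text and paired with its CEFR level, is the kind of input a supervised classifier would be trained on.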

Analysis and Automatic Classification of Some Discourse Particles on a Large Set of French Spoken Corpora

Statistical Language and Speech Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-68456-7_3 ◽

2017 ◽

pp. 32-43 ◽

Cited By ~ 1

Author(s):

Denis Jouvet ◽

Katarina Bartkova ◽

Mathilde Dargnat ◽

Lou Lee

Keyword(s):

Automatic Classification ◽

Large Set ◽

Discourse Particles

Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach

PeerJ ◽

10.7717/peerj.1279 ◽

2015 ◽

Vol 3 ◽

pp. e1279 ◽

Cited By ~ 10

Author(s):

Marcos Antonio Mouriño García ◽

Roberto Pérez Rodríguez ◽

Luis E. Anido Rifón

Keyword(s):

Classification Problem ◽

Automatic Classification ◽

Important Application ◽

Biomedical Literature ◽

Daily Activities ◽

Bag Of Words ◽

Text Documents ◽

Semantic Relevance ◽

Automatic Document Classification

Automatic classification of text documents into a set of categories has many applications. Among them, the automatic classification of biomedical literature stands out as an important application of automatic document classification strategies. Biomedical staff and researchers have to deal with a lot of literature in their daily activities, so a system that gives access to documents of interest in a simple and effective way would be useful; this requires that the documents be sorted according to some criteria, that is to say, classified. Documents to classify are usually represented following the bag-of-words (BoW) paradigm. Features are words in the text (and thus suffer from synonymy and polysemy), and their weights are based only on their frequency of occurrence. This paper presents an empirical study of the efficiency of a classifier that leverages encyclopedic background knowledge, specifically Wikipedia, to create bag-of-concepts (BoC) representations of documents, understanding a concept as a “unit of meaning”, and thus tackling synonymy and polysemy. In addition, the weighting of concepts is based on their semantic relevance in the text. For the evaluation of the proposal, empirical experiments have been conducted with one of the commonly used corpora for evaluating classification and retrieval of biomedical information, OHSUMED, and also with a purpose-built corpus of MEDLINE biomedical abstracts, UVigoMED. The results show that the Wikipedia-based bag-of-concepts representation outperforms the classical bag-of-words representation by up to 157% in the single-label classification problem and up to 100% in the multi-label problem for the OHSUMED corpus, and by up to 122% in the single-label classification problem and up to 155% in the multi-label problem for the UVigoMED corpus.
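The bag-of-words representation and its synonymy problem can be illustrated with a minimal sketch. The tokenizer, the function name `bag_of_words` and the two example sentences are assumptions made for illustration; they are not taken from the paper or its corpora.

```python
from collections import Counter
import re

def bag_of_words(text: str) -> Counter:
    """Map each lowercase word to its frequency of occurrence."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

# Hypothetical example documents.
doc_a = bag_of_words("The tumor was removed. The tumor was benign.")
doc_b = bag_of_words("The neoplasm was removed.")

# Synonymy: "tumor" and "neoplasm" are distinct features, so the shared
# vocabulary does not reflect that both documents discuss the same concept.
shared = set(doc_a) & set(doc_b)
```

A bag-of-concepts representation would instead map both words to one concept identifier before counting, which is the gap the Wikipedia-based approach is meant to close.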

Automatic classification of seismo-volcanic signals at La Soufrière of Guadeloupe

10.5194/egusphere-egu2020-10234 ◽

2020 ◽

Author(s):

Alexis Falcin ◽

Jean-Philippe Metaxian ◽

Jérôme Mars ◽

Eléonore Stutzmann ◽

Roberto Moretti ◽

...

Keyword(s):

Physical Process ◽

Average Rate ◽

Spectral Characteristics ◽

Automatic Classification ◽

Full Description ◽

Large Set ◽

Classification Rate ◽

Time Frequency ◽

Accuracy And Precision

Seismic activity at La Soufrière volcano of Guadeloupe is composed of various transient signals, which are classified manually by the Observatoire Volcanologique et Sismologique de Guadeloupe (OVSG-IPGP) considering waveforms recorded at several stations. Although five main types of signals are recognized in the data analysis by the observatory (Moretti et al., 2020), only three main classes readily distinguishable on seismic traces during the daily analytical protocol have been catalogued: Volcano-Tectonic events, Long-Period events and Nested events, each related to a distinct physical process. Automatic classification of seismo-volcanic signals of La Soufrière was performed by using an architecture based on supervised learning, available at github.com/malfante/AAA. Seismic waveforms are transformed into a large set of features (34 features for each representation domain) computed from three representation domains of the signal (time, frequency, quefrency). The resulting feature vectors are then used for the modeling. We use the Random Forest Classifier algorithm from the scikit-learn library. At first, we trained the model with the dataset given by the OVSG, consisting of 845 available labeled events (542 VT, 217 Nested and 86 LP) recorded in the period 2013-2018. We obtained an average classification rate of 72%. We determined that the VT class includes a variety of signals covering the LP, Nested and VT classes. Reviewing in detail the waveforms and the spectral characteristics of the signals belonging to the 3 classes, we then introduced Hybrid events and also defined a monochromatic class (so-called Tornillo) of LP signals, thus matching the full description of signals provided in Moretti et al. (2020). Then, using the new information, a new model was trained with 5 classes and tested. We obtained a much better average classification rate of 84%.
The classification is excellent for Nested events (93% accuracy and precision) and Tornillo events (93% accuracy and precision). The classification of VT events (90% accuracy, 89% precision) and LP events (86% accuracy, 82% precision) was also very good. The most difficult class to recognize is the Hybrid class (64% accuracy, 69% precision). Hybrid events are often confused with VT and LP events. This may be explained by the nature of this class and the physical process, which includes both a fracturing and a resonating component with different modal frequencies. Machine learning is a powerful tool to handle large datasets. Starting from a manually built dataset, the processing we applied allowed us to obtain a reliable automatic classification by refining class definitions. This has important implications for observatory data processing during unrest and eruptive activity.
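Per-class figures like those quoted above are typically derived by comparing predicted labels with reference labels. The sketch below computes per-class precision and recall in plain Python. The function name and the toy labels are hypothetical, and treating the abstract's per-class "accuracy" as recall is an assumption, since the abstract does not define the term.

```python
from collections import Counter

def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} computed from paired label lists."""
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # how often each label was predicted
    actual = Counter(y_true)     # how often each label truly occurs
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / actual[label] if actual[label] else 0.0
        scores[label] = (precision, recall)
    return scores

# Hypothetical toy labels, not data from the study.
y_true = ["VT", "VT", "LP", "Hybrid", "Hybrid", "Nested"]
y_pred = ["VT", "VT", "LP", "VT", "Hybrid", "Nested"]
scores = per_class_precision_recall(y_true, y_pred)
```

In this toy case one Hybrid event is predicted as VT, which lowers Hybrid recall and VT precision at the same time, the same kind of confusion the abstract describes.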

Analysis of IPL Match Results using Machine Learning Algorithms

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35360 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 1746-1751

Author(s):

N. Lokeswari

Keyword(s):

Machine Learning ◽

Classification Problem ◽

Machine Learning Algorithms ◽

Support Vector ◽

Large Set ◽

Regression Problem ◽

The Past ◽

Premier League ◽

Past Data

Indian Premier League (IPL) is a famous Twenty-20 league conducted by the Board of Control for Cricket in India (BCCI). It started in 2008 and had completed thirteen seasons by 2020. IPL is a popular sporting event with a large audience throughout the country. Every cricket fan would be eager to know and predict the IPL match results. A solution using Machine Learning is provided for the analysis of IPL match results. This paper attempts to predict the match winner and the innings score from past match-by-match and ball-by-ball data. Match winner prediction is treated as a classification problem and innings score prediction as a regression problem. Algorithms such as Support Vector Machine (SVM), Naive Bayes and k-Nearest Neighbour (kNN) are used for classification of the match winner, and Linear Regression and Decision Tree for prediction of the innings score. The dataset contains many features, of which 7 are identified as usable for the prediction. Based on those features, models are built and evaluated by certain parameters. Based on the results SVM performed.
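The split into a classification problem and a regression problem can be sketched with two of the algorithms the abstract names, kNN and linear regression, written in plain Python. The features, values and function names are hypothetical toy choices; the abstract does not say which 7 features were used.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Classification: hypothetical features (past win rate, toss won) -> winner.
matches = [
    ((0.70, 1.0), "home"), ((0.60, 1.0), "home"), ((0.65, 0.0), "home"),
    ((0.30, 0.0), "away"), ((0.20, 1.0), "away"), ((0.35, 0.0), "away"),
]
winner = knn_predict(matches, (0.68, 1.0))

# Regression: hypothetical overs bowled -> runs scored so far.
slope, intercept = fit_line([5, 10, 15, 20], [40, 80, 120, 160])
projected_score = slope * 20 + intercept
```

The classifier returns a discrete label (the winner), while the regressor returns a number (the projected score), which is the distinction the abstract draws between the two tasks.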

Interpreting HVEM of muscle-cell impulse networks

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s042482010012103x ◽

1992 ◽

Vol 50 (1) ◽

pp. 126-127

Author(s):

Paul DeCosta ◽

Kyugon Cho ◽

Stephen Shemlon ◽

Heesung Jun ◽

Stanley M. Dunn

Keyword(s):

Structural Analysis ◽

Muscle Cell ◽

Electron Micrographs ◽

Relative Size ◽

Automatic Classification ◽

Nerve Fibers ◽

Analysis Algorithm ◽

Structural Networks

Introduction: The analysis and interpretation of electron micrographs of cells and tissues often requires the accurate extraction of structural networks, which either provide immediate 2D or 3D information, or from which the desired information can be inferred. The images of these structures contain lines and/or curves whose orientations, lengths, and intersections characterize the overall network. Some studies have analyzed networks of natural structures. Sebok and Roemer determine the complexity of nerve structures in an EM formed slide. Here the number of nodes in the image describes how dense the nerve fibers are in a particular region of the skin. Hildith proposes a network structural analysis algorithm for the automatic classification of chromosome spreads (type, relative size and orientation).

Automatic classification of sleep stages

Electroencephalography and Clinical Neurophysiology ◽

10.1016/s0013-4694(97)88102-7 ◽

1997 ◽

Vol 103 (1) ◽

pp. 44 ◽

Cited By ~ 2

Author(s):

B Kemp

Keyword(s):

Automatic Classification ◽

Sleep Stages

RHYNCHOLITES AND THE PROBLEM OF NARROW AND BROAD CONCEPTION OF TAXONS

Proceedings of higher educational establishments Geology and Exploration ◽

10.32454/0016-7762-2018-1-12-17 ◽

2018 ◽

pp. 12-17 ◽

Cited By ~ 3

Author(s):

I. R. Khuzina ◽

V. N. Komarov

Keyword(s):

Sexual Dimorphism ◽

Morphological Characteristics ◽

Mass Scale ◽

Individual Variability ◽

Point Of View ◽

The Other ◽

New Taxon ◽

Artificial System ◽

Broad Understanding

The paper considers a point of view based on a broad understanding of taxa. According to this point of view, rhyncholites of the subgenera Dentatobeccus and Microbeccus are accepted to be synonymous with the genus Rhynchoteuthis, and the subgenus Romanovichella is considered to be synonymous with the genus Palaeoteuthis. The criteria that influence the different approaches to the classification of rhyncholites have been analyzed (such as age and individual variability, sexual dimorphism, pathological and teratological features, and degree of disintegration of material); underestimating them can lead to inaccuracy. Depriving the subgenera Dentatobeccus, Microbeccus and Romanovichella, which possess very distinctive morphological characteristics, of their independent status and reducing them to synonyms is noted to be unjustified. An artificial system (in any suggested variant), with all its drawbacks, is the only possible system for rhyncholites. The main criterion that minimizes its drawbacks and justifies the separation of a new taxon is the availability of mass-scale material. A narrow understanding of the genus, used within sensible limits, is emphasized as simplifying both the communication of the concept of the genus to other investigators and the recognition of rhyncholites for practical tasks.

LEGAL, MORAL AND ETHICAL SIDE OF THE OFFENSE, DISCREDITING HONOR EMPLOYEE THE PENAL SYSTEM

Sociopolitical sciences ◽

10.33693/2223-0092-2020-10-2-213-218 ◽

2020 ◽

Vol 10 (2) ◽

pp. 213-218

Author(s):

Oksana Kochkina ◽

Olga Marchuk

Keyword(s):

Point Of View ◽

Professional Activity ◽

Penal System ◽

Moral Norms ◽

Ethical Aspects ◽

Self Assessment ◽

Personal Qualities ◽

Executive System ◽

Correction System

The article examines the legal, moral and ethical aspects of a misdemeanor that discredits the honor of an employee of the criminal Executive system. The main feature of this ground for dismissal is that it combines legal and moral norms, which often raises many questions about whether a particular offense falls under it. Using the analysis of normative legal acts, the authors attempt to identify the signs that distinguish this ground for dismissal from the many others. A classification of offenses that discredit the honor of an employee of the criminal Executive system is presented, which makes it possible to systematize and organize the knowledge obtained about this ground for dismissal. The analysis of a misdemeanor that defames the honor of an employee of the penal system from a moral and ethical position shows, first of all, that it is not clearly regulated by law, although the consequences of committing such a misdemeanor are clearly legal. The concepts of “honor” and “dignity” are considered as ethical categories and are analyzed as personal qualities that are manifested in an employee of the penal correction system during the period of service. In the behavior of a person or employee, these categories are manifested both externally (assessment from the outside) and internally (self-assessment). The article describes the value orientation of an employee of the criminal Executive system toward ethical standards in professional activity, which is an integral part of the moral and ethical side of a misdemeanor that discredits the honor of an employee.

An Enhanced Technique for Classification of Fruits using Shape Color and Texture Features

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i7/0107 ◽

2017 ◽

Vol 7 (7) ◽

pp. 408

Author(s):

Yashpal Jitarwal ◽

Tabrej Ahamad Khan ◽

Pawan Mangal

Keyword(s):

Image Processing ◽

Texture Features ◽

Automatic Classification ◽

Processing Technique ◽

Image Processing Technique ◽

Advance Technique ◽

Time Required ◽

Fruit Sorting ◽

Human Inspection

In earlier times fruits were sorted manually, which was a very time-consuming and laborious task. Humans sorted the fruits on the basis of shape, size and color. Because manual sorting takes a very long time, automatic classification of fruits was introduced to reduce the time and increase the accuracy. To improve on human inspection and reduce the time required for fruit sorting, an advanced technique has been developed that obtains information about fruits from their images; it is called the Image Processing Technique.

Automatic Classification of Brown Spot and Blast Diseases of Rice Using Vegetation Indices Based Segmentation

10.24001/ijaems.icsesd2017.120 ◽

2017 ◽

Author(s):

Anil Bavaskar ◽

Sanjivani G. Barde

Keyword(s):

Vegetation Indices ◽

Automatic Classification ◽

Brown Spot
