Automatic Classification of Text Complexity

2020, Vol 10 (20), pp. 7285
Author(s): Valentino Santucci, Filippo Santarelli, Luciana Forti, Stefania Spina

This work introduces an automatic classification system for measuring the complexity level of a given Italian text from a linguistic point of view. The task of measuring text complexity is cast as a supervised classification problem by exploiting a dataset of texts purposely produced by linguistic experts for second-language teaching and assessment. The commonly adopted Common European Framework of Reference for Languages (CEFR) levels were used as the target classes, each text was represented by a large set of numeric linguistic features, and ten widely used machine learning models were compared experimentally. The results show that the proposed approach achieves good prediction accuracy, and a further analysis identifies the categories of features that influenced the predictions.
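
The pipeline described above (numeric linguistic features per text, CEFR levels as target classes) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the three features (average sentence length, average word length, type-token ratio), the toy Italian training texts and the nearest-centroid classifier are all illustrative stand-ins, not the paper's feature set or any of the ten models it compares.

```python
import re
from statistics import mean


def features(text):
    """Illustrative linguistic features (a hypothetical, tiny subset)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())  # \w matches accented letters such as "è"
    return [
        len(words) / len(sentences),   # average sentence length (words)
        mean(len(w) for w in words),   # average word length (characters)
        len(set(words)) / len(words),  # type-token ratio (lexical variety)
    ]


def train_centroids(samples):
    """Map each CEFR level to the mean feature vector of its training texts."""
    by_level = {}
    for text, level in samples:
        by_level.setdefault(level, []).append(features(text))
    return {lvl: [mean(col) for col in zip(*vecs)] for lvl, vecs in by_level.items()}


def predict(centroids, text):
    """Return the level whose centroid is closest (squared Euclidean distance)."""
    x = features(text)
    return min(centroids, key=lambda lvl: sum((a - b) ** 2 for a, b in zip(x, centroids[lvl])))


# Toy labeled texts (hypothetical, not taken from the paper's dataset)
TRAINING = [
    ("Io sono Marco. Ho un cane. Il cane è nero.", "A1"),
    ("La casa è grande. Mi piace il mare.", "A1"),
    ("Nonostante le difficoltà economiche, l'amministrazione comunale ha deciso "
     "di investire considerevolmente nella riqualificazione urbana dei quartieri periferici.", "C1"),
]

centroids = train_centroids(TRAINING)
print(predict(centroids, "Ho una gatta. La gatta è bianca."))  # prints: A1
```

Features are left unscaled here, so average sentence length dominates the distance; a realistic setup would standardize the features and use one of the stronger models the abstract refers to.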

PeerJ, 2015, Vol 3, pp. e1279
Author(s): Marcos Antonio Mouriño García, Roberto Pérez Rodríguez, Luis E. Anido Rifón

Automatic classification of text documents into a set of categories has many applications, and the automatic classification of biomedical literature stands out among them. Biomedical staff and researchers deal with large volumes of literature in their daily activities, so a system that gives simple and effective access to documents of interest would be useful; this requires sorting the documents according to some criteria, that is, classifying them. Documents to be classified are usually represented following the bag-of-words (BoW) paradigm: the features are the words in the text, which makes the representation vulnerable to synonymy and polysemy, and their weights are based only on frequency of occurrence. This paper presents an empirical study of the efficiency of a classifier that leverages encyclopedic background knowledge (specifically, Wikipedia) to create bag-of-concepts (BoC) representations of documents, where a concept is understood as a “unit of meaning”, thereby tackling synonymy and polysemy. In addition, concepts are weighted according to their semantic relevance in the text. To evaluate the proposal, empirical experiments were conducted with OHSUMED, one of the corpora commonly used for evaluating classification and retrieval of biomedical information, and with UVigoMED, a purpose-built corpus of MEDLINE biomedical abstracts. The results show that the Wikipedia-based bag-of-concepts representation outperforms the classical bag-of-words representation by up to 157% in the single-label classification problem and up to 100% in the multi-label problem for the OHSUMED corpus, and by up to 122% in the single-label problem and up to 155% in the multi-label problem for the UVigoMED corpus.
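
The synonymy problem that motivates bag-of-concepts can be illustrated with a toy example. The sketch below assumes a hypothetical hand-written word-to-concept table standing in for Wikipedia-based concept detection, and it weights concepts by raw counts rather than by semantic relevance, so it shows only the representational idea, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical surface-form -> concept table standing in for Wikipedia-based
# concept detection; words without an entry are simply dropped in this sketch.
CONCEPTS = {
    "tumour": "Neoplasm", "tumor": "Neoplasm", "neoplasm": "Neoplasm",
    "cardiac": "Heart", "heart": "Heart",
    "resection": "Surgery", "surgery": "Surgery",
}


def bag_of_words(text):
    """Term-frequency vector over the raw words."""
    return Counter(text.lower().split())


def bag_of_concepts(text):
    """Raw concept counts (the paper instead weights concepts by semantic relevance)."""
    return Counter(CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS)


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())  # missing keys count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


doc1 = "cardiac tumour resection"
doc2 = "heart neoplasm surgery"
print(cosine(bag_of_words(doc1), bag_of_words(doc2)))        # 0.0: no shared words
print(cosine(bag_of_concepts(doc1), bag_of_concepts(doc2)))  # about 1.0: same concepts
```

Two abstracts that share no words get a BoW similarity of zero, while their BoC vectors coincide; handling polysemy would additionally require context-dependent concept selection, which this lookup table does not attempt.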


2020
Author(s): Alexis Falcin, Jean-Philippe Metaxian, Jérôme Mars, Eléonore Stutzmann, Roberto Moretti, ...

Seismic activity at La Soufrière volcano of Guadeloupe consists of various transient signals, which are classified manually by the Observatoire Volcanologique et Sismologique de Guadeloupe (OVSG-IPGP) using waveforms recorded at several stations. Although the observatory recognizes five main types of signals in its data analysis (Moretti et al., 2020), only the three main classes readily distinguishable on seismic traces during the daily analysis protocol have been catalogued: Volcano-Tectonic (VT) events, Long-Period (LP) events and Nested events, each related to a distinct physical process.

Automatic classification of the seismo-volcanic signals of La Soufrière was performed with a supervised learning architecture available at github.com/malfante/AAA. Seismic waveforms are transformed into a large set of features (34 features per representation domain) computed in three representation domains of the signal (time, frequency and quefrency). The resulting feature vectors are then used for modeling, with the Random Forest classifier from the scikit-learn library.

We first trained the model on the dataset provided by the OVSG, consisting of 845 labeled events (542 VT, 217 Nested and 86 LP) recorded between 2013 and 2018, and obtained an average classification rate of 72%. We found that the VT class included a variety of signals spanning the LP, Nested and VT classes. After reviewing in detail the waveforms and spectral characteristics of the signals in the 3 classes, we introduced a Hybrid class and defined a monochromatic class of LP signals (so-called Tornillo), thus matching the full description of signals provided in Moretti et al. (2020).

Using this new information, we then trained and tested a new model with 5 classes and obtained a much better average classification rate of 84%. Classification is excellent for Nested events (93% accuracy and precision) and Tornillo events (93% accuracy and precision). Classification of VT events (90% accuracy, 89% precision) and LP events (86% accuracy, 82% precision) is also very good. The most difficult class to recognize is the Hybrid class (64% accuracy, 69% precision): Hybrid events are often confused with VT and LP events. This may be explained by the nature of this class and of its physical process, which involves both a fracturing and a resonating component with different modal frequencies.

Machine learning is a powerful tool for handling large datasets. Starting from a manually built dataset, the processing we applied, together with refined class definitions, yielded a reliable automatic classification. This has important implications for observatory data processing during unrest and eruptive activity.
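
As a rough illustration of the three-domain feature extraction, the NumPy sketch below computes a handful of features per domain: time (energy, peak amplitude, zero-crossing rate), frequency (dominant frequency, spectral centroid) and quefrency (the first real-cepstrum coefficients). The feature choice, the function name `three_domain_features` and the synthetic decaying sinusoid (a loose stand-in for a monochromatic, Tornillo-like signal) are assumptions; the AAA code computes its own, larger set of 34 features per domain.

```python
import numpy as np


def three_domain_features(signal, fs):
    """Illustrative (hypothetical) features from the time, frequency and quefrency domains."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove any DC offset

    # Time domain
    energy = float(np.sum(x ** 2))
    peak_amplitude = float(np.max(np.abs(x)))
    zero_crossing_rate = float(np.mean(np.signbit(x[1:]) != np.signbit(x[:-1])))

    # Frequency domain: magnitude spectrum of the real-input FFT
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    dominant_frequency = float(freqs[np.argmax(spectrum)])
    spectral_centroid = float(np.sum(freqs * spectrum) / np.sum(spectrum))

    # Quefrency domain: real cepstrum = inverse FFT of the log magnitude spectrum
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-12), n=len(x))
    c1, c2, c3 = (float(v) for v in cepstrum[1:4])

    return {
        "energy": energy,
        "peak_amplitude": peak_amplitude,
        "zero_crossing_rate": zero_crossing_rate,
        "dominant_frequency": dominant_frequency,
        "spectral_centroid": spectral_centroid,
        "cepstral_c1": c1,
        "cepstral_c2": c2,
        "cepstral_c3": c3,
    }


# Synthetic 10 s event sampled at 100 Hz: a decaying 5 Hz sinusoid
fs = 100.0
t = np.arange(1000) / fs
event = np.exp(-0.3 * t) * np.sin(2 * np.pi * 5.0 * t)

feats = three_domain_features(event, fs)
vector = np.array(list(feats.values()))  # one row of the feature matrix
print(round(feats["dominant_frequency"], 2))
```

The dominant frequency of this synthetic event should come out at about 5 Hz. Stacking such vectors for labeled events into a matrix `X` with labels `y`, a classifier such as scikit-learn's `RandomForestClassifier` can then be trained with `fit(X, y)`, which corresponds to the modeling step the abstract describes.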


Author(s): N. Lokeswari

The Indian Premier League (IPL) is a well-known Twenty20 league organized by the Board of Control for Cricket in India (BCCI). Started in 2008, it had completed thirteen seasons by 2020. The IPL is very popular, with a large audience throughout the country, and cricket fans are eager to know and predict match results. This paper provides a machine learning solution for analyzing IPL match results: it attempts to predict the match winner and the innings score from past match-by-match and ball-by-ball data. Match winner prediction is treated as a classification problem and innings score prediction as a regression problem. Support Vector Machine (SVM), Naive Bayes and k-Nearest Neighbour (kNN) are used to classify the match winner, and Linear Regression and Decision Tree to predict the innings score. Of the many features in the dataset, 7 are identified as usable for prediction. Models are built on these features and evaluated using certain performance measures. Based on the results, SVM performed.
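
The two problem framings can be sketched with small, dependency-free versions of two of the listed algorithms: kNN for winner classification and ordinary least-squares linear regression for the innings score. The feature choices (team win rates, a toss flag, runs after 10 overs) and all data values are hypothetical; the abstract does not list the paper's 7 selected features, and SVM, Naive Bayes and Decision Tree are not shown.

```python
from collections import Counter

# Hypothetical match records: (team1 win rate, team2 win rate, team1 won toss) -> 1 if team1 won
MATCHES = [
    ((0.70, 0.40, 1), 1),
    ((0.65, 0.45, 0), 1),
    ((0.60, 0.50, 1), 1),
    ((0.40, 0.65, 0), 0),
    ((0.35, 0.60, 1), 0),
    ((0.45, 0.70, 0), 0),
]


def knn_predict(train, x, k=3):
    """Classify x by majority vote among its k nearest training records."""
    def sq_dist(record):
        features, _ = record
        return sum((a - b) ** 2 for a, b in zip(features, x))
    nearest = sorted(train, key=sq_dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical innings data: (runs after 10 overs, final innings score)
INNINGS = [(70, 160), (80, 175), (90, 190), (60, 145), (85, 182)]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


print(knn_predict(MATCHES, (0.68, 0.42, 1)))  # 1: team1 predicted to win
slope, intercept = fit_line(INNINGS)
print(round(slope * 75 + intercept))          # 167: projected score after 75 runs in 10 overs
```

The same split applies with any of the listed models: the winner is a discrete label learned from labeled past matches, while the innings score is a continuous target fitted by regression.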


Author(s): Paul DeCosta, Kyugon Cho, Stephen Shemlon, Heesung Jun, Stanley M. Dunn

Introduction: The analysis and interpretation of electron micrographs of cells and tissues often requires the accurate extraction of structural networks, which either provide 2D or 3D information directly or allow the desired information to be inferred. The images of these structures contain lines and/or curves whose orientations, lengths and intersections characterize the overall network. Several studies have analyzed networks of natural structures. Sebok and Roemer determine the complexity of nerve structures in an EM-formed slide; here, the number of nodes in the image describes how dense the nerve fibers are in a particular region of the skin. Hildith proposes a network structural analysis algorithm for the automatic classification of chromosome spreads (type, relative size and orientation).
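
Counting nodes in an extracted network, as in the nerve-density example above, can be sketched as follows. This assumes the network has already been thinned to a one-pixel-wide binary skeleton (for example with a thinning algorithm such as Zhang-Suen, not shown) and uses the crossing-number test, a common way to flag branch points (three or more 0-to-1 transitions around a pixel) and endpoints (exactly one). It is not claimed to be Sebok and Roemer's procedure.

```python
# 8-neighbour offsets in clockwise order, starting from north
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(img, r, c):
    """Count 0->1 transitions while walking once around pixel (r, c)."""
    ring = []
    for dr, dc in OFFSETS:
        rr, cc = r + dr, c + dc
        inside = 0 <= rr < len(img) and 0 <= cc < len(img[0])
        ring.append(img[rr][cc] if inside else 0)  # outside the image counts as background
    # ring[i - 1] wraps around to the last neighbour when i == 0
    return sum(1 for i in range(8) if ring[i - 1] == 0 and ring[i] == 1)


def count_nodes_and_ends(img):
    """Return (branch points, endpoints) of a one-pixel-wide binary skeleton."""
    nodes = ends = 0
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            if not value:
                continue
            cn = crossing_number(img, r, c)
            if cn >= 3:
                nodes += 1
            elif cn == 1:
                ends += 1
    return nodes, ends


# A single junction where four fibre segments meet
PLUS = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
print(count_nodes_and_ends(PLUS))  # (1, 4)
```

The crossing number is used instead of a plain neighbour count because pixels adjacent to a junction also have three or more foreground neighbours and would otherwise be counted as extra nodes. Dividing the node count by the area of a region gives a simple density measure.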


Author(s): I. R. Khuzina, V. N. Komarov

The paper considers a point of view based on a broad concept of taxa. According to this view, rhyncholites of the subgenera Dentatobeccus and Microbeccus are treated as synonyms of the genus Rhynchoteuthis, and the subgenus Romanovichella as a synonym of the genus Palaeoteuthis. The factors that influence the different approaches to rhyncholite classification (such as age-related and individual variability, sexual dimorphism, pathological and teratological features, and the degree of disintegration of the material) are analyzed; underestimating them can lead to inaccuracy. The paper argues that denying independent status to the subgenera Dentatobeccus, Microbeccus and Romanovichella, which have very distinctive morphological characteristics, and reducing them to synonyms is unjustified. An artificial system (in any of its proposed variants), despite all its drawbacks, is the only feasible system for rhyncholites. The main criterion for minimizing its drawbacks and justifying the separation of a new taxon is the availability of abundant material. The paper emphasizes that a narrow concept of the genus, applied within sensible limits, makes it easier to convey the concept of a genus to other investigators and to identify rhyncholites for practical tasks.


2020, Vol 10 (2), pp. 213-218
Author(s): Oksana Kochkina, Olga Marchuk

The article examines the legal, moral and ethical aspects of a misdemeanor that discredits the honor of an employee of the criminal executive (penal) system. The main feature of this ground for dismissal is that it combines legal and moral norms, which often raises questions about whether a particular offense falls under it. By analyzing normative legal acts, the authors attempt to identify the features that distinguish this ground from the many other grounds for dismissal. A classification of offenses that discredit the honor of a penal system employee is presented, which systematizes and organizes the knowledge obtained about this ground for dismissal. Analyzing such a misdemeanor from a moral and ethical standpoint shows, above all, that it is not clearly regulated by law, although the consequences of committing it are clearly legal. The concepts of “honor” and “dignity” are considered as ethical categories and analyzed as personal qualities that an employee of the penal system displays during service. In the behavior of a person or employee, these categories manifest both externally (assessment by others) and internally (self-assessment). The article also describes the orientation of penal system employees toward ethical standards in professional activity, which is an integral part of the moral and ethical side of a misdemeanor that discredits an employee's honor.


Author(s): Yashpal Jitarwal, Tabrej Ahamad Khan, Pawan Mangal

In earlier times, fruits were sorted manually, which was a very time-consuming and laborious task. Humans sorted fruits on the basis of shape, size and color. Because manual sorting takes a long time, automatic fruit classification was introduced to reduce sorting time and increase accuracy. To improve on human inspection and reduce the time required for sorting, an advanced technique was developed that extracts information about fruits from their images; this is known as image processing.
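
The idea of classifying fruit from image-derived size and color can be sketched as a nearest-reference classifier. Everything here is hypothetical: the tiny synthetic RGB images, the near-white background threshold, the reference profiles and the weight that trades off color distance against area. Shape features are omitted for brevity.

```python
# Hypothetical reference profiles: mean (R, G, B) colour and area in pixels
REFERENCES = {
    "apple": ((200, 30, 30), 12),
    "banana": ((230, 210, 50), 10),
    "lime": ((60, 180, 60), 4),
}
AREA_WEIGHT = 5.0  # hypothetical trade-off between colour distance and size difference
W = (255, 255, 255)  # white background pixel


def extract_features(image, threshold=240):
    """Mean colour and area of the non-background (non-near-white) pixels."""
    fruit = [px for row in image for px in row if not all(ch >= threshold for ch in px)]
    if not fruit:
        raise ValueError("no fruit pixels found")
    mean_rgb = tuple(sum(px[i] for px in fruit) / len(fruit) for i in range(3))
    return mean_rgb, len(fruit)


def classify(image):
    """Return the reference fruit with the smallest combined colour/size distance."""
    rgb, area = extract_features(image)

    def distance(name):
        ref_rgb, ref_area = REFERENCES[name]
        colour = sum((a - b) ** 2 for a, b in zip(rgb, ref_rgb)) ** 0.5
        return colour + AREA_WEIGHT * abs(area - ref_area)

    return min(REFERENCES, key=distance)


G = (60, 180, 60)
small_green = [
    [W, W, W, W],
    [W, G, G, W],
    [W, G, G, W],
    [W, W, W, W],
]
print(classify(small_green))  # lime
```

A real sorting system would segment the fruit more robustly than a fixed white-background threshold and would calibrate the size and color references on labeled images.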

