Genre annotation for the Web

2021 ◽  
Author(s):  
Serge Sharoff

Abstract This paper describes a digital curation study aimed at comparing the composition of large Web corpora, such as enTenTen, ukWac or ruWac, by means of automatic text classification. First, the paper presents a Deep Learning model suitable for classifying texts from large Web corpora using a small number of communicative functions, such as Argumentation or Reporting. Second, it describes the results of applying the automatic classification model to these corpora and compares their composition. Finally, the paper introduces a framework for interpreting the results of automatic genre classification using linguistic features. The framework can help in comparing general reference corpora obtained from the Web and in comparing corpora across languages.
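The abstract does not spell out the model architecture, but a functional-genre classifier of this kind can be sketched as a pretrained multilingual transformer with a small classification head over communicative-function labels. The label set, model name, and helper below are illustrative assumptions, not the paper's implementation; the classification head initialised this way would still need fine-tuning on genre-annotated data before its predictions are meaningful.

```python
# Minimal sketch (assumed setup, not the paper's model): a multilingual
# transformer with a classification head over a few communicative functions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["Argumentation", "Reporting", "Instruction", "Promotion"]  # assumed label subset

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(LABELS)
)
model.eval()  # head is randomly initialised; fine-tune on annotated data first

def classify(text: str) -> str:
    """Assign the most probable communicative function to a Web document."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(classify("The committee therefore concludes that the proposal should be rejected."))
```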

2015 ◽  
Vol 713-715 ◽  
pp. 1830-1834
Author(s):  
Rong Chen ◽  
Feng Chen ◽  
Yi Sun

We consider how to efficiently perform text classification over all pairs of documents. Such classification can be applied to information retrieval, digital libraries, information filtering, and search engines, among other settings. This paper describes a text classification model based on the KNN algorithm. Because the standard text feature extraction scheme, TF-IDF, can lose information about the relationships between text features, an improved ITF-IDF algorithm is presented to overcome this limitation. Our experiments show that the proposed algorithm outperforms the alternatives.
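As a point of reference, the KNN-over-TF-IDF baseline that the paper improves upon can be sketched in a few lines with scikit-learn. The ITF-IDF weighting itself is not reproduced here, and the documents and labels are toy placeholders.

```python
# Minimal sketch of the baseline: k-nearest-neighbour classification over
# TF-IDF vectors. The improved ITF-IDF weighting from the paper is not shown.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

train_docs = ["stock markets fell sharply", "the team won the final match"]  # toy data
train_labels = ["finance", "sports"]

knn = make_pipeline(
    TfidfVectorizer(),                                      # documents -> sparse TF-IDF vectors
    KNeighborsClassifier(n_neighbors=1, metric="cosine"),   # vote among nearest neighbours
)
knn.fit(train_docs, train_labels)
print(knn.predict(["bond yields rose after the announcement"]))
```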


Author(s):  
Dirk Snyman ◽  
Gerhard Van Huyssteen ◽  
Walter Daelemans

When working in the terrain of text processing, metadata about a particular text plays an important role. Metadata is often generated using automatic text classification systems, which classify a text into one or more predefined classes or categories based on its content. One of the dimensions by which a text can be classified is its genre. This study concerns the development of an automatic genre classification system in a resource-scarce environment. It investigates the techniques and approaches generally used for automatic genre classification systems and identifies the best approach for Afrikaans (a resource-scarce language). When developing an automatic genre classification system, a set of variables must be considered because they influence the performance of machine learning approaches (i.e., the algorithm used, the amount of training data, and the representation of the data as features). If these variables are handled correctly, an optimal combination of them can be identified to successfully develop a genre classification system. In this article a genre classification system is developed using the following approach: a multinomial naïve Bayes (MNB) algorithm with a bag-of-words feature set. The resulting system achieves an f-score (performance measure) of 0.929.
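The reported setup, a multinomial naive Bayes classifier over bag-of-words features, maps directly onto standard scikit-learn components. The sketch below is a generic illustration with placeholder texts and genre labels, not the Afrikaans data or evaluation from the study.

```python
# Minimal sketch of the reported setup: multinomial naive Bayes over a
# bag-of-words feature set. Training texts and genre labels are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["once upon a time in a far-away land", "the minister announced new measures today"]
train_genres = ["fiction", "news"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())  # bag-of-words counts -> MNB
clf.fit(train_texts, train_genres)

print(clf.predict(["a dragon lived in the mountains"]))   # genre prediction for an unseen text
```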


Corpora ◽  
2013 ◽  
Vol 8 (2) ◽  
pp. 209-234 ◽  
Author(s):  
Yan Cao ◽  
Richard Xiao

This article takes the multi-dimensional (MD) analysis approach to explore the textual variations between native and non-native English abstracts on the basis of a balanced corpus containing English abstracts written by native English and native Chinese writers from twelve academic disciplines. A total of 47 out of 163 linguistic features are retained after factor analysis, which underlies a seven-dimension framework representing seven communicative functions. The results show that the two types of abstracts differ significantly on five of the seven dimensions. More specifically, native English writers display a more active involvement and commitment in presenting their ideas than Chinese writers, and they use intensifying devices more frequently. In contrast, Chinese writers show stronger preferences for conceptual elaboration, passives and abstract noun phrases, whether the two types of data are examined as a whole or variation across disciplines is taken into account. The results are discussed in relation to their possible causes, together with suggestions for English abstract writing in China. Methodologically, this study innovatively expands on Biber's (1988) MD analytical framework by integrating colligation in addition to grammatical and semantic features.
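For readers unfamiliar with the MD methodology, the core computational step is a factor analysis of a text-by-feature frequency matrix, with the resulting factors interpreted as functional dimensions. The sketch below is schematic only: the feature matrix is randomly generated, whereas the study used counts of 163 linguistic (including colligational) features drawn from real abstracts.

```python
# Schematic sketch of the factor-analysis step behind MD analysis.
# The feature matrix is random stand-in data, not the study's corpus.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.poisson(lam=3.0, size=(200, 163)).astype(float)   # 200 texts x 163 feature counts
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)          # standardise each feature

fa = FactorAnalysis(n_components=7, random_state=0)        # seven dimensions, as in the study
scores = fa.fit_transform(X)    # per-text dimension scores (used to compare groups)
loadings = fa.components_       # feature loadings, interpreted functionally

print(scores.shape, loadings.shape)   # (200, 7) (7, 163)
```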


Author(s):  
Horacio Saggion

Over the past decades, information has been made available to a broad audience thanks to the availability of texts on the Web. However, understanding the wealth of information contained in texts can pose difficulties for a number of people, including those with poor literacy, cognitive or linguistic impairment, or limited knowledge of the language of the text. Text simplification was initially conceived as a technology for simplifying sentences so that they would be easier to process by natural-language processing components such as parsers. Nowadays, however, automatic text simplification is conceived as a technology for transforming a text into an equivalent that is easier to read and to understand by a target user. Text simplification concerns both the modification of the vocabulary of the text (lexical simplification) and the modification of the structure of its sentences (syntactic simplification). In this chapter, after briefly introducing the topic of text readability, we give an overview of past and recent methods for addressing these two problems. We also describe simplification applications and full systems, and outline language resources and evaluation approaches.
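As a concrete illustration of the lexical-simplification idea mentioned above, one simple family of approaches replaces low-frequency words with higher-frequency synonyms. The toy sketch below uses hand-written frequency and synonym tables purely for illustration; real systems draw on large frequency lists, paraphrase resources, or neural substitution models.

```python
# Toy sketch of frequency-based lexical simplification: replace rare words
# with a more frequent synonym. The tables below are illustrative only.
WORD_FREQ = {"use": 900, "utilise": 12, "buy": 800, "purchase": 60, "help": 950, "assist": 70}
SYNONYMS = {"utilise": ["use"], "purchase": ["buy"], "assist": ["help"]}

def simplify(sentence: str, freq_threshold: int = 100) -> str:
    out = []
    for word in sentence.lower().split():
        if WORD_FREQ.get(word, 0) < freq_threshold and word in SYNONYMS:
            # substitute the most frequent known synonym
            out.append(max(SYNONYMS[word], key=lambda w: WORD_FREQ.get(w, 0)))
        else:
            out.append(word)
    return " ".join(out)

print(simplify("please utilise the form to purchase tickets"))
# -> "please use the form to buy tickets"
```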


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Dapeng Lang ◽  
Deyun Chen ◽  
Ran Shi ◽  
Yongjun He

Deep learning has been widely used in image classification and image recognition and has achieved positive practical results. In recent years, however, a number of studies have found that the accuracy of deep learning classification models drops sharply when only subtle changes are made to the original examples, thereby realising an attack on the model. The main existing methods are as follows: adjusting the pixels of attack examples in ways invisible to the human eye so as to induce the model into a wrong classification; or adding an adversarial patch to the detection target to guide and deceive the classification model into misclassification. These methods involve considerable randomness and are of limited use in practical applications. Unlike previous perturbations of traffic signs, this paper proposes a method that can successfully hide and misclassify vehicles in complex contexts. The method takes complex real-world scenarios into account and can perturb pictures taken by a camera or mobile phone so that a detector based on a deep learning model either fails to detect the vehicle or misclassifies it. To improve robustness, the position and size of the adversarial patch are adjusted for different detection models by introducing an attachment mechanism. Tests against different detectors show that a patch generated for a single-target detection algorithm can also attack other detectors and transfers well. Experiments show that the proposed algorithm significantly lowers the accuracy of the detector. Under real-world conditions such as varying distance, lighting, angle, and resolution, misclassification of the target is achieved by reducing the confidence of the target and of its background, which greatly perturbs the detection results of the target detector. On the COCO 2017 dataset, the success rate of the algorithm reaches 88.7%.
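The paper's detector attack cannot be reproduced from the abstract alone, but the underlying adversarial-patch idea can be illustrated on a plain image classifier: a small patch is optimised by gradient descent so that pasting it onto an image drives down the model's confidence in the true class. The model choice, patch size, paste location, and class index below are illustrative assumptions, not the paper's configuration.

```python
# Simplified adversarial-patch sketch on a classifier (not the paper's
# detector attack). Patch size, position, model and class are assumptions.
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)               # only the patch is optimised

image = torch.rand(1, 3, 224, 224)        # stand-in for a real vehicle photo
true_class = torch.tensor([817])          # example ImageNet class index
patch = torch.rand(1, 3, 50, 50, requires_grad=True)
opt = torch.optim.Adam([patch], lr=0.05)

for step in range(200):
    patched = image.clone()
    patched[:, :, 80:130, 80:130] = patch.clamp(0, 1)        # paste patch at a fixed spot
    loss = -F.cross_entropy(model(patched), true_class)      # maximise loss on the true class
    opt.zero_grad()
    loss.backward()
    opt.step()
```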


Linguistics ◽  
2021 ◽  

Register research has been approached from differing theoretical and methodological approaches, resulting in different definitions of the term register. In the text-linguistic approach, which is the primary focus of this bibliography, register refers to text varieties that are defined by their situational characteristics, such as the purpose of writing and the mode of communication, among others. Texts that are similar in their situational characteristics also tend to share similar linguistic profiles, as situational characteristics motivate or require the use of specific linguistic features. Text-linguistic research on register tends to focus on two aspects: attempts to describe a register, or attempts to understand patterns of register variation. This research happens via comparative analyses, specific examinations of single linguistic features or situational parameters, and often via examinations of co-occurrence of linguistic features that are analyzed from a functional perspective. That is, certain lexico-grammatical features co-occur in a given text because they together serve important communicative functions that are motivated by the situational characteristics of the text (e.g., communicative purpose, mode, setting, interactivity). Furthermore, corpus methods are often relied upon in register studies, which allows for large-scale examinations of both general and specialized registers. Thus, the bibliography gives priority to research that uses corpus tools and methods. Finally, while the broadest examinations on register focus on the distinction between written and spoken domains, additional divisions of register studies fall under the categories of written registers, spoken registers, academic registers, historical registers, and electronic/online registers. This bibliography primarily introduces some of the key resources on English registers, a decision that was made to reach a broader audience.

