A Method of Readability Assessment for Web Documents Using Text Features and HTML Structures

2012 ◽  
Vol 132 (9) ◽  
pp. 1524-1532
Author(s):  
Takahiro Yamasaki ◽  
Kin-ichiroh Tokiwa
2014 ◽  
Vol 165 (2) ◽  
pp. 163-193 ◽  
Author(s):  
Felice Dell’Orletta ◽  
Simonetta Montemagni ◽  
Giulia Venturi

In this paper, we tackle three underresearched issues of the automatic readability assessment literature, namely the evaluation of text readability in less resourced languages, with respect to sentences (as opposed to documents) as well as across textual genres. Different solutions to these issues have been tested by using and refining READ‑IT, the first advanced readability assessment tool for Italian, which combines traditional raw text features with lexical, morpho-syntactic and syntactic information. In READ‑IT readability assessment is carried out with respect to both documents and sentences, with the latter constituting an important novelty of the proposed approach: READ‑IT shows a high accuracy in the document classification task and promising results in the sentence classification scenario. By comparing the results of two versions of READ‑IT, adopting a classification‑ versus ranking-based approach, we also show that readability assessment is strongly influenced by textual genre; for this reason a genre-oriented notion of readability is needed. With classification-based approaches, reliable results can only be achieved with genre-specific models: Since this is far from being a workable solution, especially for less resourced languages, a new ranking method for readability assessment is proposed, based on the notion of distance.


2021 ◽  
pp. 129-142
Author(s):  
Anna V. Glazkova ◽  

The article examines the correlation of two indices characterizing the level of linguistic or semantic complexity of book content. The first index is the age rating in accordance with the Russian Age Rating System for information products. The second index is the ease of understanding of the text, calculated using common readability metrics. The author compares the values of readability metrics for texts with different age rating scores. The experiments were carried out on a collection of 5,516 book previews collected by the author of the article. The previews used are freely available in electronic libraries, and they have age rating scores obtained from their publishers. In accordance with the system adopted in the Russian Federation, age rating scores characterize the book’s targeting to the following age categories: 0+, 6+, 12+, 16+, and 18+. In most cases, the size of the book preview is 10% of the full text, which makes it possible to calculate readability indices. The collected texts were scored according to five commonly used readability metrics: the Flesch-Kincaid Index, the Coleman-Liau Index, the Automated Readability Index (ARI), the SMOG Index, and the Dale-Chall Formula. As a result of the readability assessment for the texts of each age category, the author obtained recommended levels of education necessary for their understanding. The obtained values were averaged within each age category and analyzed. The results of the experiments indicate that in most cases there is a direct relationship between the age rating score of a book and the expected level of education required to understand it. Moreover, readability scores according to all the considered metrics are directly proportional to age rating scores for age categories from 0+ to 16+. The readability scores of books in the 18+ category roughly correspond to children’s literature, which is apparently explained by the genre characteristics of the books marked with the 18+ label.
First, the results obtained indicate the adequacy of the existing approach to assessing the book age rating in terms of attributing the text to the target audience by age. Second, the relationship between readability indices and age rating scores allows the values of readability metrics to be used as text features in various computational linguistics tasks aimed at text addressee prediction.
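
To make the readability metrics named in this abstract concrete, the following is a minimal sketch of two of them, the Automated Readability Index and the Coleman-Liau Index, using their standard English-language coefficients. It is not the author's implementation: the abstract does not say how texts were tokenized or whether the coefficients were adapted for Russian, so the regex-based word and sentence splitting and the function names below are assumptions made for illustration.

```python
import re


def text_counts(text):
    """Return (alnum_chars, letters, words, sentences) using naive regex heuristics.

    Assumption: words are runs of word characters (optionally joined by an
    apostrophe or hyphen) and sentences end at '.', '!' or '?'.
    """
    words = re.findall(r"\w+(?:['’-]\w+)*", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words:
        raise ValueError("text contains no words")
    alnum_chars = sum(1 for ch in text if ch.isalnum())
    letters = sum(1 for ch in text if ch.isalpha())
    return alnum_chars, letters, len(words), max(len(sentences), 1)


def ari(text):
    """Automated Readability Index: approximate grade level (English coefficients)."""
    chars, _, words, sentences = text_counts(text)
    return 4.71 * chars / words + 0.5 * words / sentences - 21.43


def coleman_liau(text):
    """Coleman-Liau Index: approximate grade level (English coefficients)."""
    _, letters, words, sentences = text_counts(text)
    letters_per_100_words = 100.0 * letters / words
    sentences_per_100_words = 100.0 * sentences / words
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    dense = ("Notwithstanding considerable methodological heterogeneity, "
             "longitudinal investigations consistently demonstrate "
             "substantial correlations.")
    for label, sample in (("simple", simple), ("dense", dense)):
        print(label, round(ari(sample), 2), round(coleman_liau(sample), 2))
```

Both indices rely only on character, word, and sentence counts, which is why they can be computed on a partial preview of a book. Averaging such scores within each age category would mirror the comparison described in the abstract.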


2013 ◽  
Vol 7 (2) ◽  
pp. 574-579 ◽  
Author(s):  
Dr Sunitha Abburu ◽  
G. Suresh Babu

The volume of information available on the web is growing significantly day by day. Information on the web comes in several forms: structured, semi-structured, and unstructured. The majority of it is presented in web pages, where it is semi-structured. However, the information required for a given context is scattered across different web documents. It is difficult to analyze the large volumes of semi-structured information presented in web pages and to make decisions based on that analysis. The current research work proposes a framework for a system that extracts information from various sources and prepares reports based on the knowledge built from the analysis. This simplifies data extraction, data consolidation, data analysis, and decision making based on the information presented in web pages. The proposed framework integrates web crawling, information extraction, and data mining technologies for better information analysis that supports effective decision making. It enables people and organizations to extract information from various web sources and to analyze the extracted data effectively. The proposed framework is applicable to any application domain; manufacturing, sales, tourism, and e-learning are a few examples. The framework was implemented and tested for effectiveness, and the results are promising.
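
The extract, consolidate, and analyze flow described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' framework: the crawling step is replaced by two in-memory HTML strings, the table layout and the product/units schema are hypothetical, and the "analysis" is reduced to summing per-product totals. It uses only `html.parser` from the Python standard library.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of each <tr> as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being parsed
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Information extraction: pull table rows out of one semi-structured page."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def consolidate(pages):
    """Consolidation and a trivial analysis: total units per product across pages."""
    totals = defaultdict(int)
    for html in pages:
        for row in extract_rows(html):
            # Keep only (name, integer) rows; header rows are skipped.
            if len(row) == 2 and row[1].isdigit():
                totals[row[0]] += int(row[1])
    return dict(totals)


# Hypothetical pages standing in for crawled documents.
PAGES = [
    "<table><tr><th>Product</th><th>Units</th></tr>"
    "<tr><td>Widget</td><td>12</td></tr></table>",
    "<table><tr><td>Widget</td><td>8</td></tr>"
    "<tr><td>Gadget</td><td>5</td></tr></table>",
]

if __name__ == "__main__":
    print(consolidate(PAGES))
```

In a real system each stage would be far more involved (a crawler fetching pages, extraction rules per source, and data mining rather than a sum), but the separation into stages is the point the abstract makes.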


Author(s):  
Insih Wilujeng ◽  
Tri Suci Yolanda Putri

This research developed a Science, Environment, Technology, Society (SETS) e-module integrated with the predict, observe, explain (POE) model on the subject matter of Earth Layer and Its Dynamics for grade VII students. This study aimed to reveal i) the feasibility of the developed e-module for grade VII students, and ii) the practicality of the developed e-module and its dynamics. This is development research adopting the ADDIE model, which consists of five stages: analysis, design, development, implementation, and evaluation. The subjects of the limited test were 15 students of grade VIII.G of Public Junior High School 8 Yogyakarta. The data were collected using a product feasibility assessment sheet for material and media experts, a product practicality assessment sheet for teachers, and a product readability assessment sheet for students. The results show that the developed e-module was feasible to use according to the material and media experts, and practical according to teachers and students.


Agronomy ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1307
Author(s):  
Haoriqin Wang ◽  
Huaji Zhu ◽  
Huarui Wu ◽  
Xiaomin Wang ◽  
Xiao Han ◽  
...  

In the question-and-answer (Q&A) communities of the “China Agricultural Technology Extension Information Platform”, thousands of rice-related Chinese questions are newly added every day. The rapid detection of questions with the same meaning is the key to the success of a rice-related intelligent Q&A system. To allow the fast and automatic detection of semantically equivalent rice-related questions, we propose a new method based on the Coattention-DenseGRU (Gated Recurrent Unit). According to the rice-related question characteristics, we applied Word2vec with the TF-IDF (Term Frequency–Inverse Document Frequency) method to process and analyze the text data and compared it with the Word2vec, GloVe, and TF-IDF methods. Combined with the agricultural word segmentation dictionary, Word2vec with the TF-IDF method effectively solved the problem of high-dimensional and sparse data in the rice-related text. Each network layer employed the connection information of features and all previous recursive layers’ hidden features. To alleviate the problem of feature vector size increasing due to dense splicing, an autoencoder was used after dense concatenation. The experimental results show that rice-related question similarity matching based on Coattention-DenseGRU can improve the utilization of text features, reduce the loss of features, and achieve fast and accurate similarity matching on the rice-related question dataset. The precision and F1 values of the proposed model were 96.3% and 96.9%, respectively. Compared with seven other question similarity matching models, the proposed method achieves state-of-the-art results on our rice-related question dataset.
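
The abstract does not spell out how Word2vec and TF-IDF were combined. One common way to do it, shown here purely as an assumption, is to represent a question as the TF-IDF-weighted average of its word vectors, so that rare, informative terms dominate the dense representation. The toy corpus, the two-dimensional vectors, and all function names below are hypothetical; real embeddings would come from a trained Word2vec model.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency for every token in a tokenized corpus."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def weighted_sentence_vector(tokens, vectors, idf):
    """Average the word vectors of `tokens`, weighting each by its TF-IDF score."""
    tf = Counter(t for t in tokens if t in vectors and t in idf)
    if not tf:
        raise ValueError("no token has both a vector and an IDF value")
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        weight = (count / len(tokens)) * idf[tok]
        weight_sum += weight
        for i, x in enumerate(vectors[tok]):
            total[i] += weight * x
    return [x / weight_sum for x in total]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical tokenized questions and toy 2-d embeddings.
DOCS = [["rice", "blast", "control"], ["rice", "yield"], ["rice", "pest", "control"]]
VECTORS = {
    "rice": [1.0, 0.0], "blast": [0.0, 1.0], "control": [0.5, 0.5],
    "yield": [0.9, 0.1], "pest": [0.1, 0.9],
}

if __name__ == "__main__":
    idf = idf_table(DOCS)
    q1 = weighted_sentence_vector(["rice", "blast"], VECTORS, idf)
    q2 = weighted_sentence_vector(["rice", "pest"], VECTORS, idf)
    print(cosine(q1, q2))
```

Because "rice" appears in every toy document, its weight is lower than that of "blast" or "pest", so the resulting vectors lean toward the rarer term. A dense, fixed-size vector of this kind also avoids the high-dimensional sparse representation that plain TF-IDF produces.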

