WATS-SMS: A T5-Based French Wikipedia Abstractive Text Summarizer for SMS

Future Internet ◽ doi:10.3390/fi13090238 ◽ 2021 ◽ Vol 13 (9) ◽ pp. 238

Author(s): Jean Louis Ebongue Kedieng Fendji ◽ Désiré Manuel Taira ◽ Marcellin Atemkeng ◽ Adam Musa Ali

Keyword(s): Mobile Devices ◽ Language Processing ◽ Rural Areas ◽ Text Summarization ◽ Web Pages ◽ A Value ◽ Gsm Network ◽ The Common ◽ To Receive ◽ Made In

Text summarization remains a challenging task in natural language processing despite its many applications in enterprises and daily life. A common use case is the summarization of web pages, which can give devices with limited features an overview of a page. Although the penetration rate of mobile devices is increasing in rural areas, most of those devices offer limited features, and these areas often have limited connectivity, such as the GSM network. Summarizing web pages into SMS is therefore an important way to provide information to limited devices. This work introduces WATS-SMS, a T5-based French Wikipedia Abstractive Text Summarizer for SMS, built through a transfer learning approach: the English pre-trained T5 model is retrained on 25,000 Wikipedia pages to produce a French text summarization model, which is then compared with different approaches in the literature. The objective is twofold: (1) to check the assumption made in the literature that abstractive models provide better results than extractive ones; and (2) to evaluate the performance of the model against other existing abstractive models. A score based on ROUGE metrics gave a value of 52% for articles up to 500 characters long, against 34.2% for transformer-ED and 12.7% for seq-2seq-attention, and a value of 77% for longer articles, against 37% for transformers-DMCA. Moreover, an architecture including a software SMS gateway has been developed to allow owners of mobile devices with limited features to send requests and receive summaries through the GSM network.
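The abstract reports scores "based on ROUGE metrics" without stating the exact formula. As a rough illustration only, the sketch below computes ROUGE-1 (clipped unigram overlap) precision, recall and F1 between a generated summary and a reference. The function name `rouge_1`, the whitespace tokenization and the sample sentences are assumptions made for this example and are not taken from the authors' evaluation code.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """ROUGE-1 precision, recall and F1 on lowercased whitespace tokens.

    Hypothetical helper for illustration; real evaluations usually rely on
    a dedicated ROUGE package with proper tokenization and stemming.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up French sentences, used only to exercise the function.
    print(rouge_1("le chat dort", "le chat noir dort"))
```

Which ROUGE variant and which aggregation lie behind the reported 52% and 77% figures is not specified in the abstract, so this sketch should not be read as reproducing them.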

A Survey of Arabic Named Entity Recognition and Classification

Computational Linguistics ◽ doi:10.1162/coli_a_00178 ◽ 2014 ◽ Vol 40 (2) ◽ pp. 469-510 ◽ Cited By ~ 62

Author(s): Khaled Shaalan

Keyword(s): Language Processing ◽ Named Entity Recognition ◽ Relevant Information ◽ Arabic Language ◽ Entity Recognition ◽ Named Entities ◽ Linguistic Resources ◽ Named Entity ◽ To Receive ◽ Made In

As more and more Arabic textual information becomes available through the Web in homes and businesses, via Internet and Intranet services, there is an urgent need for technologies and tools to process the relevant information. Named Entity Recognition (NER) is an Information Extraction task that has become an integral part of many other Natural Language Processing (NLP) tasks, such as Machine Translation and Information Retrieval. Arabic NER has begun to receive attention in recent years. The characteristics and peculiarities of Arabic, a member of the Semitic language family, make dealing with NER a challenge. The performance of an Arabic NER component affects the overall performance of the NLP system in a positive manner. This article attempts to describe and detail the recent increase in interest and progress made in Arabic NER research. The importance of the NER task is demonstrated, the main characteristics of the Arabic language are highlighted, and the aspects of standardization in annotating named entities are illustrated. Moreover, the different Arabic linguistic resources are presented and the approaches used in the Arabic NER field are explained. The features of common tools used in Arabic NER are described, and standard evaluation metrics are illustrated. In addition, a review of the state of the art of Arabic NER research is presented. Finally, we present our conclusions. Throughout the presentation, illustrative examples are used for clarification.

LanguageCrawl: a generic tool for building language models upon Common Crawl

Language Resources and Evaluation ◽ doi:10.1007/s10579-021-09551-7 ◽ 2021

Author(s): Szymon Roziewski ◽ Marek Kozłowski

Keyword(s): Language Processing ◽ Deep Neural Networks ◽ Language Model ◽ Language Models ◽ Unstructured Data ◽ Web Pages ◽ Data Intensive ◽ The Common ◽ Internet Community ◽ N Gram

The exponential growth of the internet community has resulted in the production of a vast amount of unstructured data, including web pages, blogs and social media. Such a volume, consisting of hundreds of billions of words, is unlikely to be analyzed by humans. In this work we introduce the tool LanguageCrawl, which allows Natural Language Processing (NLP) researchers to easily build web-scale corpora using the Common Crawl Archive, an open repository of web crawl information that contains petabytes of data. We present three use cases in the course of this work: filtering of Polish websites, the construction of n-gram corpora and the training of a continuous skip-gram language model with hierarchical softmax. Each of them has been implemented within the LanguageCrawl toolkit, with the possibility to adjust the specified language and n-gram ranks. This paper focuses particularly on high computing efficiency by applying highly concurrent multitasking. Our tool utilizes effective libraries and design. LanguageCrawl has been made publicly available to enrich the current set of NLP resources. We strongly believe that our work will facilitate further NLP research, especially in under-resourced languages, in which the lack of appropriately sized corpora is a serious hindrance to applying data-intensive methods, such as deep neural networks.
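One of the use cases named above is the construction of n-gram corpora. The following is a minimal sketch of the underlying counting step, assuming pre-tokenized text held in memory. The function name `ngram_counts` and the toy input are hypothetical and do not reflect the LanguageCrawl API, which the abstract does not describe.

```python
from collections import Counter
from typing import Sequence, Tuple


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram (as a tuple of tokens) in a sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    grams: list[Tuple[str, ...]] = [
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    ]
    return Counter(grams)


if __name__ == "__main__":
    # Toy token sequence, used only to exercise the function.
    bigrams = ngram_counts("a b a b a".split(), 2)
    print(bigrams.most_common())
```

A web-scale tool would stream and shard this work rather than hold counts in a single in-memory `Counter`; the sketch only shows what is being counted.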

An Overview of Dementias

Perspectives on Swallowing and Swallowing Disorders (Dysphagia) ◽ doi:10.1044/sasd21.3.75 ◽ 2012 ◽ Vol 21 (3) ◽ pp. 75-84

Author(s): Venkata Vijaya K. Dalai ◽ Jason E. Childress ◽ Paul E Schulz

Keyword(s): Clinical Presentation ◽ Treatment Options ◽ Current Treatment ◽ Great Variability ◽ Health Concern ◽ Public Health Concern ◽ Life Threatening ◽ The Common ◽ Neurodegenerative Dementias ◽ Made In

Dementia is a major public health concern that afflicts an estimated 24.3 million people worldwide. Great strides are being made in order to better diagnose, prevent, and treat these disorders. Dementia is associated with multiple complications, some of which can be life-threatening, such as dysphagia. There is great variability between dementias in terms of when dysphagia and other swallowing disorders occur. In order to prepare the reader for the other articles in this publication discussing swallowing issues in depth, the authors of this article will provide a brief overview of the prevalence, risk factors, pathogenesis, clinical presentation, diagnosis, current treatment options, and implications for eating for the common forms of neurodegenerative dementias.

Cuckoo chicks evicting their nest mates: coincidental observations by Edward Jenner in England and Antoine Joseph Lottinger in France

Archives of Natural History ◽ doi:10.3366/anh.2011.0030 ◽ 2011 ◽ Vol 38 (2) ◽ pp. 220-228 ◽ Cited By ~ 3

Author(s): Spencer G. Sealy ◽ Mélanie F. Guigueno

Keyword(s): Royal Society ◽ Cuculus Canorus ◽ Common Cuckoo ◽ Cuckoo Chick ◽ The Common ◽ Made In ◽ Royal Society Of London

For centuries, naturalists were aware that soon after hatching the common cuckoo (Cuculus canorus) chick became the sole occupant of the fosterer's nest. Most naturalists thought the adult cuckoo returned to the nest and removed or ate the fosterer's eggs and young, or the cuckoo chick crowded its nest mates out of the nest. Edward Jenner published the first description of cuckoo chicks evicting eggs and young over the side of the nest. Jenner's observations, made in England in 1786 and 1787, were published by the Royal Society of London in 1788. Four years before Jenner's observations, in 1782, Antoine Joseph Lottinger recorded eviction behaviour in France and published his observations in Histoire du coucou d'Europe, in 1795. The importance of Lottinger's and Jenner's observations is considered together.

A Note On Consumption Patterns in the Rural Areas of East Pakistan

The Pakistan Development Review ◽ doi:10.30541/v3i3pp.399-413 ◽ 1963 ◽ Vol 3 (3) ◽ pp. 399-413

Author(s): Mohammad Irshad Khan

Keyword(s): Rural Areas ◽ The United States ◽ Development Planning ◽ Consumption Patterns ◽ Regional Patterns ◽ Effective Utilization ◽ Income Elasticities ◽ On Demand ◽ Limited Applicability ◽ Made In

The main purpose of this paper is to present estimates of income elasticities for various commodity groups in East Pakistan. To date no such studies have been conducted in that province; and estimates made in other areas of the subcontinent have only limited applicability. Analysis of consumption patterns is essential for development planning because priorities and investment targets have to be based on demand forecasts for different commodities. Forecasting demand requires, among other variables, reliable estimates of income elasticities. In addition, knowledge about elasticities can be useful in deciding taxation policies and other controls over consumption. Further, in countries like Pakistan where large quantities of surplus foods are imported under the United States PL 480 programme, knowledge of income elasticities and regional patterns of consumption is important to permit effective utilization of these imports for economic development.
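The abstract does not say how the income elasticities were estimated. One common specification, assumed here purely for illustration, is the double-log form, in which the slope of log expenditure regressed on log income is the elasticity. The function name `income_elasticity` and the synthetic data are hypothetical and are not drawn from the paper.

```python
import math
from typing import Sequence


def income_elasticity(incomes: Sequence[float],
                      expenditures: Sequence[float]) -> float:
    """Least-squares slope of log(expenditure) on log(income).

    Under the assumed double-log model, this slope is the income elasticity.
    All inputs must be strictly positive.
    """
    xs = [math.log(v) for v in incomes]
    ys = [math.log(v) for v in expenditures]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic data generated with a known elasticity of 0.5.
    incomes = [100.0, 200.0, 400.0, 800.0]
    spending = [2.0 * income ** 0.5 for income in incomes]
    print(income_elasticity(incomes, spending))
```

An elasticity below 1 under this model means spending on the commodity group grows less than proportionally with income, which is the kind of figure a demand forecast would draw on.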

Designing a Chat-Bot for College Information using Information Retrieval and Automatic Text Summarization Techniques

Current Chinese Computer Science ◽ doi:10.2174/2665997201999201022191540 ◽ 2020 ◽ Vol 01

Author(s): Radha Guha

Keyword(s): Information Retrieval ◽ Language Processing ◽ Latent Dirichlet Allocation ◽ Semantic Analysis ◽ Text Summarization ◽ The Internet ◽ Specific Domain ◽ User Query ◽ College Information ◽ Chat Bot

Background: In the era of information overload, it is very difficult for a human reader to quickly make sense of the vast amount of information available on the internet. Even for a specific domain such as a college or university website, it may be difficult for a user to browse through all the links to get relevant answers quickly. Objective: In this scenario, a chat-bot that can answer questions related to college information and compare colleges would be very useful and novel. Methods: In this paper, a novel conversational chat-bot application with information retrieval and text summarization skills is designed and implemented. First, the chat-bot has a simple dialog skill: when it can understand the intent of the user query, it responds from a stored collection of answers. Second, for unknown queries, the chat-bot can search the internet and then perform text summarization using advanced techniques of natural language processing (NLP) and text mining (TM). Results: Advances in NLP for information retrieval and text summarization using the machine learning techniques of Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe) and TextRank are first reviewed and compared in this paper before being implemented in the chat-bot design. The chat-bot improves the user experience tremendously by giving concise answers to specific queries, which takes less time than reading the entire document. Students, parents and faculty can more efficiently get answers on a variety of information such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers and patents. Conclusion: The purpose of this paper was to follow the advancement of NLP technologies and implement them in a novel application.

Natural Language Processing (NLP) based Text Summarization - A Survey

2021 6th International Conference on Inventive Computation Technologies (ICICT) ◽ doi:10.1109/icict50816.2021.9358703 ◽ 2021

Author(s): Ishitva Awasthi ◽ Kuntal Gupta ◽ Prabjot Singh Bhogal ◽ Sahejpreet Singh Anand ◽ Piyush Kumar Soni

Keyword(s): Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Text Summarization

IONIZATION ENERGY AND IMPURITY BAND CONDUCTION OF SHALLOW DONORS IN n-GALLIUM ARSENIDE

Canadian Journal of Physics ◽ doi:10.1139/p67-013 ◽ 1967 ◽ Vol 45 (1) ◽ pp. 119-126 ◽ Cited By ~ 24

Author(s): J. Basinski ◽ R. Olivier

Keyword(s): Ionization Energy ◽ Impurity Band ◽ Shallow Donor ◽ Band Model ◽ A Value ◽ Transition Concentration ◽ Temperature Electron ◽ Two Band Model ◽ Band Conduction ◽ Made In

Hall effect and resistivity measurements have been made in the temperature range 4.2–360 K on several samples of n-type GaAs grown under an oxygen atmosphere and without any other intentional doping. The principal shallow donor in this material is considered to be Si. All samples exhibited impurity-band conduction at low temperature. Electron concentrations in the conduction band were calculated, using a two-band model, and then fitted to the usual equation expressing charge neutrality. A value of 2.3 × 10⁻³ eV was obtained for the ionization energy of the donors, for donor concentrations ranging from 5 × 10¹⁵ cm⁻³ to 2 × 10¹⁶ cm⁻³. The conduction in the impurity band was of the hopping type for these concentrations. A value of 3.5 × 10¹⁶ cm⁻³ was obtained for the critical concentration at which impurity-band conduction changes to the metallic type.

The Technique of the Boyne Carvings

Proceedings of the Prehistoric Society ◽ doi:10.1017/s0079497x00017515 ◽ 1956 ◽ Vol 21 ◽ pp. 156-159

Author(s): O. G. S. Crawford

Keyword(s): Presidential Address ◽ Time And Space ◽ East Anglia ◽ Actual Design ◽ The Many ◽ To Receive ◽ Vast Range ◽ Made In

The prudent contributor to a Festschrift will select some subject about which he thinks he knows as much as the professor who is to receive it. That is peculiarly difficult here because of the vast range of Professor Childe's knowledge, both in time and space, far exceeding the present contributor's. This Note is offered as a grateful tribute from one of the many who have been intellectually enriched by his writings and encouraged by his devotion to scholarship. It is little more than an amplification and criticism of the Abbé Breuil's classic Presidential Address to the Prehistoric Society of East Anglia, delivered in 1934; but on the strength of observations made in August and September, 1955, I have come to different conclusions.

The Abbé Breuil detected five successive techniques, all of them found on the stones of the Boyne Tombs:

(1) Incised thin lines (pl. XIX, B).
(2) Picked grooves left rough (pl. XVIII).
(3, a) Picked grooves afterwards rubbed smooth; in this and the preceding group ‘it is invariably the line (groove) itself on which the pattern depends, which gives and is the design’.
(3, b) Picked areas which ‘only define the limits of the pattern, the surface, left in relief by the cutting down of the background, constituting the actual design’ (pl. xx, B).
(4) Rectilinear patterns where also the pattern is residual, consisting of raised ribs, forming triangles or lozenges, left standing by picking away the surrounding surface (pl. xx, A).
