A Keyphrase-Based Approach to Text Summarization for English and Bengali Documents

2014 ◽  
Vol 5 (2) ◽  
pp. 28-38 ◽  
Author(s):  
Kamal Sarkar

With the rapid growth of the World Wide Web, information overload is becoming a problem for an increasingly large number of people. Since summarization helps humans digest the main content of a text document very rapidly, there is a need for an effective and powerful tool that can automatically summarize text. In this paper, we present a keyphrase-based approach to single-document summarization that first extracts a set of keyphrases from a document, then uses the extracted keyphrases to choose sentences from the document, and finally forms an extractive summary from the chosen sentences. We view keyphrases (single- or multi-word) as the important concepts, and we assume that an extractive summary of a document elaborates on the important concepts contained in the document to some permissible extent, controlled by the given summary length. We have tested our proposed keyphrase-based summarization approach on two different datasets: one for English and another for Bengali. The experimental results show that the performance of the proposed system is comparable to some state-of-the-art summarization systems.
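
The abstract names a three-stage pipeline (keyphrase extraction, keyphrase-driven sentence selection, length-bounded assembly) without giving its formulas. The sketch below is a minimal Python illustration of that pipeline, not the authors' method: the stopword-chunk candidate generation, the frequency-based phrase scoring, and the `budget` parameter are all assumptions made here.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
             "for", "that", "on", "with", "as", "by", "it", "this"}

def candidate_phrases(text, max_len=3):
    """Chunk the text at stopwords/punctuation; n-grams of the remaining
    content words (up to max_len words) are keyphrase candidates."""
    words = re.findall(r"[a-z]+", text.lower())
    chunks, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                chunks.append(tuple(current))
            current = []
        else:
            current.append(w)
    if current:
        chunks.append(tuple(current))
    grams = []
    for chunk in chunks:
        for n in range(1, max_len + 1):
            grams += [chunk[i:i + n] for i in range(len(chunk) - n + 1)]
    return grams

def summarize(document, budget=3):
    """Greedy extractive summary: rank sentences by the frequency-weighted
    keyphrases they contain, keep the top `budget`, restore document order."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    top_phrases = dict(Counter(candidate_phrases(document)).most_common(15))
    def score(sentence):
        return sum(top_phrases.get(g, 0) for g in candidate_phrases(sentence))
    chosen = set(sorted(sentences, key=score, reverse=True)[:budget])
    return " ".join(s for s in sentences if s in chosen)
```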


Author(s):  
Olfa Nasraoui

The Web information age has brought a dramatic increase in the sheer amount of information (Web content), in the access to this information (Web usage), and in the intricate complexities governing the relationships within this information (Web structure). Hence, not surprisingly, information overload when searching and browsing the World Wide Web (WWW) has become the plague du jour. One of the most promising and potent remedies against this plague comes in the form of personalization. Personalization aims to customize the interactions on a Web site, depending on the user’s explicit and/or implicit interests and desires.
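
As a toy illustration of the explicit/implicit distinction drawn above (and only that; the function and parameter names here are invented for the example, not taken from the chapter), a profile can blend declared interests with topics inferred from usage before ranking site content:

```python
from collections import Counter

def build_profile(explicit_topics, clicked_topics, implicit_weight=0.5):
    """Blend declared interests (explicit) with topics inferred from
    browsing behaviour (implicit) into one weighted interest vector."""
    profile = Counter({t: 1.0 for t in explicit_topics})
    clicks = Counter(clicked_topics)
    total = sum(clicks.values()) or 1
    for topic, n in clicks.items():
        profile[topic] += implicit_weight * n / total
    return profile

def personalize(items, profile):
    """Order a site's items by overlap with the user's interest vector."""
    return sorted(items,
                  key=lambda i: sum(profile.get(t, 0.0) for t in i["topics"]),
                  reverse=True)
```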



Author(s):  
Esharenana E. Adomi

The World Wide Web (WWW) has led to the advent of the information age. With increased demand for information from various quarters, the Web has turned out to be a veritable resource. Web surfers in the early days were frustrated by the delay in finding the information they needed. The first major leap for information retrieval came from the deployment of Web search engines such as Lycos, Excite, AltaVista, etc. The rapid growth in the popularity of the Web during the past few years has led to a precipitous pronouncement of death for the online services that preceded the Web in the wired world.



Author(s):  
Omer Casher ◽  
Gudge K. Chandramohan ◽  
Martin J. Hargreaves ◽  
Christopher Leach ◽  
Peter Murray-Rust ◽  
...  


Author(s):  
Mu-Chun Su ◽  
Shao-Jui Wang ◽  
Chen-Ko Huang ◽  
Pa-Chun Wang ◽  
...  

Most of the dramatically increased amount of information available on the World Wide Web is provided via HTML and formatted for human browsing rather than for software programs. This situation calls for a tool that automatically extracts information from semistructured Web information sources, increasing the usefulness of value-added Web services. We present a signal-representation-based parser (SIRAP) that breaks Web pages up into logically coherent groups, for example, groups of information related to an entity. Templates for records with different tag structures are generated incrementally by a Histogram-Based Correlation Coefficient (HBCC) algorithm; records on a Web page are then detected efficiently by matching against the generated templates. Hundreds of Web pages from 17 state-of-the-art search engines were used to demonstrate the feasibility of our approach.
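
The abstract does not reproduce HBCC itself, but its general idea, correlating tag-count "signals" to group structurally similar records, can be sketched as follows. The regex-based tag extraction and the 0.9 threshold are assumptions of this sketch, not details from the paper.

```python
import math
import re
from collections import Counter

def tag_histogram(fragment):
    """Represent an HTML fragment as a 'signal': counts of each tag name."""
    return Counter(t.lower()
                   for t in re.findall(r"<\s*([A-Za-z][A-Za-z0-9]*)", fragment))

def correlation(h1, h2):
    """Pearson correlation coefficient between two tag histograms."""
    tags = sorted(set(h1) | set(h2))
    if not tags:
        return 0.0
    x = [h1.get(t, 0) for t in tags]
    y = [h2.get(t, 0) for t in tags]
    mx, my = sum(x) / len(tags), sum(y) / len(tags)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def cluster_records(fragments, threshold=0.9):
    """Grow templates incrementally: a fragment joins the first template whose
    histogram correlates above the threshold, otherwise it founds a new one."""
    templates = []  # list of (representative histogram, member fragments)
    for frag in fragments:
        h = tag_histogram(frag)
        for rep, members in templates:
            if correlation(rep, h) >= threshold:
                members.append(frag)
                break
        else:
            templates.append((h, [frag]))
    return templates
```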



2012 ◽  
Vol 263-266 ◽  
pp. 1902-1909
Author(s):  
Oi Mean Foong ◽  
Mellissa Lee

The explosion of information on the World Wide Web overwhelms readers with limitless information. Large Internet articles or journals are often cumbersome to read as well as to comprehend. More often than not, readers are immersed in a pool of information with limited time to assimilate all of the articles. As technology advances, it becomes more convenient to access information on the go, i.e., to carry information on mobile devices. In this research, semantic- and syntactic-based summarization is implemented in a text summarizer to solve the information overload problem whilst providing a more coherent summary. The objective is to integrate WordNet into the proposed system, called TextSumIt, which condenses lengthy documents into summarized text. Empirical experiments show that it produces satisfactory preliminary results on Android mobile phones.
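
The abstract does not detail how TextSumIt uses WordNet internally. One plausible minimal use, sketched below with NLTK's WordNet interface, is synonym expansion, so that sentences matching a topic word's synonyms (not just the word itself) score higher. The function names and the scoring rule are illustrative assumptions.

```python
# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

def synonym_set(word):
    """The word plus every WordNet lemma that shares a synset with it."""
    return {lemma.lower()
            for syn in wn.synsets(word)
            for lemma in syn.lemma_names()} | {word.lower()}

def semantic_score(sentence_words, topic_words):
    """Count sentence words that match a topic word or any of its synonyms,
    so semantically related sentences are not missed by exact matching."""
    expanded = set().union(*(synonym_set(t) for t in topic_words))
    return sum(1 for w in sentence_words if w.lower() in expanded)
```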



10.28945/2854 ◽  
2005 ◽  
Author(s):  
Shirlee-ann Knight ◽  
Janice Burn

The rapid growth of the Internet as an environment for information exchange and the lack of enforceable standards regarding the information it contains have led to numerous information quality problems. A major issue is the inability of Search Engine technology to wade through the vast expanse of questionable content and return "quality" results to a user's query. This paper attempts to address some of the issues involved in determining what quality is as it pertains to information retrieval on the Internet. The IQIP model is presented as an approach to managing the choice and implementation of quality-related algorithms in an Internet-crawling Search Engine.
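
The abstract does not define the IQIP model's algorithms, so nothing below should be read as IQIP itself; the sketch only shows the general shape of the problem it addresses, folding several information-quality signals into one score a crawler could use to prioritise pages. All signal names and weights are invented for illustration.

```python
def quality_score(page, weights=None):
    """Blend simple quality signals into one score in [0, 1].
    `page` is assumed to be a dict of pre-computed page features."""
    weights = weights or {"authority": 0.4, "freshness": 0.3, "completeness": 0.3}
    signals = {
        # link-based reputation, capped at 100 inlinks
        "authority": min(page.get("inlinks", 0) / 100.0, 1.0),
        # decays with the page's age in days
        "freshness": 1.0 / (1.0 + page.get("age_days", 365) / 365.0),
        # longer pages treated as more complete, capped at 1000 words
        "completeness": min(page.get("word_count", 0) / 1000.0, 1.0),
    }
    return sum(weights[k] * signals[k] for k in weights)
```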



2011 ◽  
pp. 167-187 ◽  
Author(s):  
Fabio Grandi ◽  
Federica Mandreoli ◽  
Riccardo Martoglia ◽  
Enrico Ronchetti ◽  
Maria Rita Scalas

While the World Wide Web user is suffering from the disease caused by information overload, for which personalization is one of the treatments that work, the citizen who gets ready to use the e-Government services made available on the Web is not immune from contagion. This seems a good reason to prescribe a personalization treatment to the e-Government user as well. Hence, we introduce the design and implementation of Web information systems supporting personalized access to multi-version resources in an e-Government scenario. Personalization is supported by means of Semantic Web techniques and relies on an ontology-based profiling of users (citizens). The resources we consider are collections of norm documents (laws, decrees, regulations, etc.) in XML format, but they can also be generic Web pages and portals or e-Government transactional services. We introduce a reference infrastructure, describe the organization of a prototype system we have developed, and present its performance figures.
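
At its simplest, personalized access to multi-version resources means filtering each norm's versions by the reader's profile. The sketch below reduces the chapter's ontology-based machinery to flat dictionaries purely for illustration; the actual system works on XML norm documents and richer citizen ontologies.

```python
def applicable_versions(versions, citizen_profile):
    """Keep only the versions whose applicability annotations are
    satisfied by the citizen's profile (empty annotations match everyone)."""
    return [v for v in versions
            if all(citizen_profile.get(k) == val
                   for k, val in v.get("applies_to", {}).items())]

# Example: a citizen profiled as a pensioner sees the general wording
# of an article plus the variant that applies to pensioners.
versions = [
    {"text": "Art. 1 (general wording)", "applies_to": {}},
    {"text": "Art. 1 (pensioner variant)", "applies_to": {"category": "pensioner"}},
    {"text": "Art. 1 (student variant)", "applies_to": {"category": "student"}},
]
print(applicable_versions(versions, {"category": "pensioner"}))
```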



Author(s):  
Sudha Ram

We are fortunate to be experiencing explosive growth and advancement in the Internet and the World Wide Web (WWW). In 1999, the global online population was estimated to be 250 million WWW users worldwide, while the number of pages on the Web was estimated at 800 million (http://www.internetindicators.com/facts.html). The bright side of this kind of growth is that information is available to almost anyone with access to a computer and a phone line. However, the dark side of this explosion is that we are now squarely in the midst of the "Age of Information Overload"! The staggering amount of information has made it extremely difficult for users to locate and retrieve information that is actually relevant to their task at hand. Given the bewildering array of resources being generated and posted on the WWW, the task of finding exactly what a user wants is rather daunting. Although many search engines currently exist to assist in information retrieval, much of the burden of searching is on the end user. A typical search results in millions of hits, many of which are outdated, irrelevant, or duplicated. One promising approach to managing the information overload problem is to use "intelligent agents" for search and retrieval. This editorial explores the current status of intelligent agents and points out some challenges in the development of intelligent agent-based systems.


