Social insect inspired approach for identification and dynamic tracking of news stories on the Web

Author(s):  
Stefan Sabo ◽  
Pavol Navrat

2011 ◽  
pp. 3288-3296
Author(s):  
Gian Piero Zarri

A large amount of important, “economically relevant” information is buried in unstructured “narrative” information resources. This is true, for example, of most corporate knowledge documents (memos, policy statements, reports, minutes, etc.), news stories, normative and legal texts, medical records, many intelligence messages, and a huge fraction of the information stored on the Web. In these “narrative documents,” or “narratives,” the main part of the information content consists of the description of “events” that relate the real or intended behavior of some “actors” (characters, personages, etc.). The term “event” is taken here in its most general meaning, also covering strictly related notions like fact, action, state, and situation. These actors try to attain a specific result, experience particular situations, manipulate some (concrete or abstract) materials, send or receive messages, buy, sell, deliver, and so forth. Note that in these narratives, the actors or personages are not necessarily human beings; we can have narrative documents concerning, for example, the vicissitudes in the journey of a nuclear submarine (the “actor,” “subject,” or “personage”) or the various avatars in the life of a commercial product. Note also that even if a large number of narrative documents are natural language (NL) texts, this is not necessarily the case. A photo representing a situation that, if verbalized, could be expressed as “Three nice girls are lying on the beach” is of course not an NL text, yet it is still a narrative document.



Author(s):  
Yijun Gao

This study finds that some publicly available data, such as comments posted on news stories and online survey results, could serve as an alternative data source for researchers analyzing news websites when Web server log data are not available.



2005 ◽  
Vol 11 (1) ◽  
pp. 113-117 ◽  
Author(s):  
ROBERT DALE

One way to keep in touch with what is happening in the commercial speech and language technology world is to pay occasional visits to the websites of HLT Central (at www.hltcentral.org) and LT World (at www.lt-world.org). Both sites provide links to news stories and press releases from companies and other organizations active in the area. The people who run these sites trawl the web for news stories of relevance, saving you the trouble of doing that yourself.



2021 ◽  
Vol 7 (4) ◽  
pp. 221-238
Author(s):  
Richard Bowyer

The regional newspaper industry in the UK is in freefall with sales down more than 60 percent in 10 years. With this decline has come cost-cutting. This study looks at how these cuts have manifested themselves in terms of the number of news stories now being printed in newspapers and the number of local people being quoted in the newspapers. The study has looked at a number of regional newspapers across 30 years to show the effect of the changing face of the newspaper business as the audience and advertising have moved online. The research includes interviews with experts on whether story count mattered and if fewer stories and local voices have damaged the product. This paper finds that generally newspaper companies with a web-first culture have been forced to reduce their local news content in their printed products as they concentrate their resources online. While fewer stories and voices cannot be blamed for the complete demise of the newspapers, it is a consequence of cost-cutting and disadvantages the product. Opinions do vary on the needs for high story count, but this paper shows that most experts believe it is important and that without it, printed newspapers have been damaged. Keywords: newspapers, regional, decline, stories, quotes



Author(s):  
Seokkyung Chung

With the rapid growth of the World Wide Web, Internet users are now experiencing overwhelming quantities of online information. Since manually analyzing the data has become nearly impossible, the analysis must be performed by automatic data mining techniques to fulfill users’ information needs quickly. On most Web pages, vast amounts of useful knowledge are embedded in text. Given such large text collections, mining tools that organize the text datasets into structured knowledge would make document access more efficient. This facilitates information search and, at the same time, provides an efficient framework for document repository management as the number of documents becomes very large. Given that the Web has become a vehicle for the distribution of information, many news organizations are providing newswire services through the Internet. Given this popularity of Web news services, text mining on news datasets has received significant attention during the past few years. In particular, as several hundred news stories are published every day at a single Web news site, triggering the whole mining process whenever a document is added to the database is computationally impractical. Therefore, efficient incremental text mining tools need to be developed.



AI Magazine ◽  
2020 ◽  
Vol 41 (4) ◽  
pp. 17-38
Author(s):  
Joshua Eckroth

Since mid-2018, we have used a suite of artificial intelligence (AI) technologies to automatically generate the Association for the Advancement of Artificial Intelligence’s AI-Alert, a weekly email sent to all Association for the Advancement of Artificial Intelligence members and thousands of other subscribers. This alert contains ten news stories from around the web that focus on some aspect of AI, such as new AI inventions, AI’s use in various industries, and AI’s impacts on our daily lives. The alert was curated by hand for a decade before we developed AI technology to automate it, which we call “NewsFinder.” Recently, we redesigned this automation and ran a six-month experiment on user engagement to ensure the new approach was successful. This article documents our design considerations and requirements, our implementation (which involves web crawling, document classification, and a genetic algorithm for story selection), and our reflections a year and a half after deploying this technology.



Author(s):  
Tabitha M. Powledge

In otherwise hard times, at least one market for science writing appears to be expanding: writing for scientists, particularly online. It's also a market that can offer unusual professional satisfaction. When you write for scientists, you can ignore many of science and medical journalism's topical fads. On the Web, you can pursue subjects that interest you, delve into more of their technical details, and write about them with surprising flexibility and freedom. Like everything else in the dot-com world, online-only publications for scientists have come and gone. I, for one, am still mourning the disappearance of BioMedNet, which Elsevier dropped at the end of 2003. For several years BMN was an important market. It published at least a couple of news stories every weekday and also covered several basic research conferences annually. But there's good news, too: A few online news operations allied with print publications are still going strong. These outlets, such as TheScientist.com (www.the-scientist.com) and NewScientist.com (www.newscientist.com), publish unique content that does not appear in their print versions. Top weekly journals also publish daily news online, among them Nature (www.nature.com/news) and Science (sciencenow.sciencemag.org). So does the top-tier publication Scientific American (www.sciam.com), which appeals both to those with an armchair interest in science and to scientists themselves. The stories in these online publications are typically short, in the range of 400 to 600 words, and are written by both staffers and freelances. One of the best things about writing for scientists on the Web is that it's not like typical Web writing at all. It resembles traditional print writing but, amazingly, often with fewer constraints. And it is garnished only lightly with electronic doodads.
Publications for scientists are not mad for multimedia, so your words don't have to take second (or third) place to video documentaries, interactive quizzes, Flash animation, or chat. Hyperlinks, yes, but only rarely will there be slideshows or snazzy static graphics. Nor is this a deeply collaborative process. Usually it's just you and your editor, who often leaves you to produce your piece in your own way. This is different from Web writing in general, when you might be part of a Web content team whose other members regard you as the least valuable player.



2019 ◽  
Vol 38 (1) ◽  
pp. 71-90 ◽  
Author(s):  
Muzammil Khan ◽  
Arif Ur Rahman

The main purpose of the article is to divide the web preservation process into small, explicable stages and to design a step-by-step web preservation process that leads to a well-organized web archive. A number of research articles on web preservation projects and web archives were studied, and a step-by-step systematic approach for web preservation was designed. The proposed comprehensive web preservation process describes and combines the strengths of different techniques observed during the study for preserving digital web contents in a digital web archive. For each web preservation step, different approaches and possible implementation techniques have been identified that can be adopted in digital archiving. The potential value of the proposed model is to guide archivists, related personnel, and organizations in effectively preserving their intellectual digital contents for future use. Moreover, the model can help to initiate a web preservation process and create a well-organized web archive to efficiently manage the archived web contents. A section briefly describes the implementation of the proposed approach in a digital news stories preservation framework for archiving news published online from different sources.


