A Nominal Filter for Web Search Snippets: Using the Web to Identify Members of Latin America's Highly Qualified Diaspora

Author(s):  
Jorge Garcia-Flores ◽  
William Turner
2017 ◽  
pp. 030-050
Author(s):  
J.V. Rogushina ◽  

Problems associated with the improvement of information retrieval in open environments are considered, and the need for its semantization is substantiated. The current state and development prospects of semantic search engines focused on processing Web information resources are analysed, and criteria for classifying such systems are reviewed. Particular attention is paid to the use of ontologies in semantic search, which contain knowledge about the subject area and the search users. The sources of ontological knowledge and methods of processing them to improve search procedures are considered. Examples of semantic search systems that use structured query languages (e.g., SPARQL), lists of keywords, and natural-language queries are given. Criteria for classifying semantic search engines, such as architecture, coupling, transparency, user context, query modification, ontology structure, etc., are considered. Different ways of supporting semantic, ontology-based modification of user queries that improve the completeness and accuracy of search are analysed. Based on an analysis of the properties of existing semantic search engines with respect to these criteria, directions for further improvement of these systems are identified: the development of metasearch systems, semantic modification of user queries, the determination of a transparency level of the search procedures acceptable to users, flexible domain-knowledge management tools, and increased performance and scalability. In addition, semantic Web search requires an external knowledge base that contains knowledge about the domain of the user's information needs, as well as the ability for users to independently select the knowledge used in the search process. It is also necessary to take into account the history of user interaction with the retrieval system and the search context, in order to personalize query results and order them in accordance with the user's information needs. All these aspects were taken into account in the design and implementation of the semantic search engine "MAIPS", which is based on an ontological model of the interaction of users and resources on the Web.
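As an illustration of the kind of ontology-based query modification discussed above, the following minimal Python sketch expands a keyword query with related terms taken from a toy ontology fragment. The ontology, terms, and function names are hypothetical examples, not the MAIPS implementation; a real system would obtain the related terms from a domain ontology, for example via SPARQL queries over an RDF store.

```python
# Minimal sketch of ontology-based query expansion (not the MAIPS implementation).
# The tiny "ontology" below is a hypothetical example; a real system would draw
# the related terms from a domain ontology, e.g. via SPARQL over an RDF store.

from typing import Dict, List, Set

# Hypothetical fragment of a domain ontology: term -> narrower/equivalent terms.
ONTOLOGY: Dict[str, List[str]] = {
    "vehicle": ["car", "truck", "bicycle"],
    "car": ["automobile"],
    "retrieval": ["search", "lookup"],
}

def expand_query(keywords: List[str], ontology: Dict[str, List[str]]) -> Set[str]:
    """Expand each keyword with the terms the ontology marks as related."""
    expanded: Set[str] = set(keywords)
    for term in keywords:
        expanded.update(ontology.get(term, []))
    return expanded

if __name__ == "__main__":
    print(expand_query(["vehicle", "retrieval"], ONTOLOGY))
    # {'vehicle', 'car', 'truck', 'bicycle', 'retrieval', 'search', 'lookup'}
```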


2010 ◽  
Author(s):  
Mohamed Husain ◽  
Amarjeet Singh ◽  
Manoj Kumar ◽  
Rakesh Ranjan

2020 ◽  
Author(s):  
Mikołaj Morzy ◽  
Bartłomiej Balcerzak ◽  
Adam Wierzbicki

BACKGROUND With the rapidly accelerating dissemination of false medical information on the Web, establishing the credibility of online sources of medical information has become a pressing necessity. The sheer number of websites offering questionable medical information presented as reliable and actionable advice, with possibly harmful effects, imposes an additional requirement on potential solutions: they have to scale to the size of the problem. Machine learning is one such solution which, when properly deployed, can be an effective tool in fighting medical disinformation on the Web. OBJECTIVE We present a comprehensive framework for designing and curating machine learning training datasets for assessing the credibility of online medical information. We show how the annotation process should be constructed and which pitfalls should be avoided. Our main objective is to provide researchers from the medical and computer science communities with guidelines on how to construct datasets for machine learning models across various areas of medical misinformation. METHODS The key component of our approach is the active annotation process. We begin by outlining the annotation protocol for the curation of a high-quality training dataset, which can then be augmented and rapidly extended by employing the human-in-the-loop paradigm in machine learning training. To circumvent the cold-start problem of insufficient gold-standard annotations, we propose a pre-processing pipeline consisting of representation learning, clustering, and re-ranking of sentences, which accelerates training and optimizes the human resources involved in annotation. RESULTS We collect over 10,000 annotations of sentences related to selected subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, food allergy testing) for less than $7,000, employing 9 highly qualified annotators (certified medical professionals), and we release this dataset to the general public. We develop an active annotation framework for more efficient annotation of non-credible medical statements. The results of the qualitative analysis support our claims about the efficacy of the presented method. CONCLUSIONS A very diverse set of incentives is driving the widespread dissemination of medical disinformation on the Web. An effective strategy for countering this spread is to use machine learning to automatically establish the credibility of online medical information. This, however, requires thoughtful design of the training pipeline. In this paper we present a comprehensive framework of active annotation. In addition, we publish a large curated dataset of medical statements labelled as credible, non-credible, or neutral.
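The cold-start pre-processing pipeline described in the METHODS section (representation learning, clustering, and re-ranking of sentences) can be illustrated with a minimal sketch. The snippet below uses TF-IDF vectors and k-means purely as stand-ins, since the abstract does not specify the actual models; the sentences and parameters are invented for illustration.

```python
# Illustrative sketch of the cold-start pre-processing idea described above
# (representation learning -> clustering -> ranking of annotation candidates).
# TF-IDF and k-means are stand-ins; the paper's actual representation and
# clustering models are not specified in this abstract.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

sentences = [
    "Vaccines cause autism in children.",
    "Measles vaccination greatly reduces the risk of infection.",
    "Antibiotics are effective against viral infections.",
    "High cholesterol is one risk factor for heart disease.",
]

# 1. Representation learning (here: simple TF-IDF vectors).
vectors = TfidfVectorizer().fit_transform(sentences)

# 2. Clustering into groups of similar statements.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)

# 3. Re-ranking: send the sentence closest to each cluster centroid to the
#    (expensive) medical annotators first.
closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, vectors)
for cluster_id, idx in enumerate(closest):
    print(f"cluster {cluster_id}: annotate first -> {sentences[idx]}")
```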


Author(s):  
Ricardo Baeza-Yates ◽  
Roi Blanco ◽  
Malú Castellanos

Web search has become a ubiquitous commodity for Internet users. This puts a large number of documents with plenty of text content at our fingertips. To make good use of these data, we need to mine web text. This motivates the two problems covered here: sentiment analysis and entity retrieval in the context of the Web. The first problem answers the question of what people think about a given product or topic, with particular attention to sentiment analysis in social media. The second problem concerns answering certain queries precisely by returning a particular object: for instance, where the next concert of my favourite band will be, or who the best cooks are in a particular region. Where to find these objects and how to retrieve, rank, and display them are tasks related to the entity retrieval problem.
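To make the first of these problems concrete, the deliberately simple lexicon-based scorer below labels a piece of text as positive, negative, or neutral. The word lists are tiny invented examples; the chapter's own techniques, especially for social media text, are far more sophisticated.

```python
# Deliberately simple lexicon-based sentiment scoring, only to make concrete
# what "what people think about a given product" means computationally.
# The word lists are tiny illustrations, not a real sentiment lexicon.

POSITIVE = {"great", "love", "excellent", "amazing", "good"}
NEGATIVE = {"bad", "hate", "terrible", "awful", "poor"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    print(sentiment("I love this phone, the camera is great"))   # positive
    print(sentiment("terrible battery life and poor support"))   # negative
```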


Author(s):  
R. Subhashini ◽  
V.Jawahar Senthil Kumar

The World Wide Web is a large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today's most advanced engines use the keyword-based ("bag of words") paradigm, which has inherent disadvantages. Organizing web search results into clusters facilitates the user's quick browsing of search results. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach for clustering web search results based on a phrase-based clustering algorithm, as an alternative to the single ordered result list of search engines. The approach presents a list of clusters to the user. Experimental results verify the method's feasibility and effectiveness.
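A minimal sketch of the general idea behind phrase-based clustering of search results is shown below: snippets that share a frequent phrase are grouped together, and the shared phrase doubles as a readable cluster label. This illustrates the principle only, not the algorithm proposed in the paper; the snippets and thresholds are invented.

```python
# Minimal sketch of phrase-based grouping of search-result snippets: results
# that share a word bigram are placed in the same cluster, and the shared
# phrase serves as a readable cluster label. Illustration only, not the
# paper's algorithm.

from collections import defaultdict
from typing import Dict, List, Tuple

def bigrams(text: str) -> List[Tuple[str, str]]:
    words = text.lower().split()
    return list(zip(words, words[1:]))

def phrase_clusters(snippets: List[str], min_size: int = 2) -> Dict[str, List[str]]:
    """Group snippets by shared bigram phrases; label each cluster with the phrase."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for snippet in snippets:
        for bg in set(bigrams(snippet)):
            groups[bg].append(snippet)
    return {" ".join(bg): docs for bg, docs in groups.items() if len(docs) >= min_size}

if __name__ == "__main__":
    results = [
        "jaguar car dealers and prices",
        "used jaguar car listings",
        "jaguar habitat in the rainforest",
        "rainforest jaguar habitat and diet",
    ]
    for label, docs in phrase_clusters(results).items():
        print(f"[{label}] -> {len(docs)} results")
```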


Author(s):  
Christopher Walton

At the start of this book we outlined the challenges of automatic, computer-based processing of information on the Web. These numerous challenges are generally referred to as the ‘vision’ of the Semantic Web. From the outset, we have attempted to take a realistic and pragmatic view of this vision. Our opinion is that the vision may never be fully realized, but that it is a useful goal on which to focus. Each step towards the vision has provided new insights on classical problems in knowledge representation, MASs, and Web-based techniques. Thus, we are presently in a significantly better position as a result of these efforts. It is sometimes difficult to see the purpose of the Semantic Web vision behind all of the different technologies and acronyms. However, the fundamental purpose of the Semantic Web is essentially large-scale and automated data integration. The Semantic Web is not just about providing a more intelligent kind of Web search, but also about taking the results of these searches and combining them in interesting and useful ways. As stated in Chapter 1, the possible applications for the Semantic Web include: automated data mining, e-science experiments, e-learning systems, personalized newspapers and journals, and intelligent devices. The current state of progress towards the Semantic Web vision is summarized in Figure 8.1. This figure shows a pyramid with the human-centric Web at the bottom, sometimes termed the Syntactic Web, and the envisioned Semantic Web at the top. Throughout this book, we have been moving upwards on this pyramid, and it should be clear that a great deal of progress has been made towards the goal. This progress is indicated by the various stages of the pyramid, which can be summarized as follows:
• The lowest stage on the pyramid is the basic Web that should be familiar to everyone. This Web of information is human-centric and contains very little automation. Nonetheless, the Web provides the basic protocols and technologies on which the Semantic Web is founded. Furthermore, the information which is represented on the Web will ultimately be the source of knowledge for the Semantic Web.


Author(s):  
Robert A. Nehmer ◽  
Rajendra P. Srivastava

Belief functions have been used to model audit decision making for over 20 years. More recently they have been used in assessing the strength of internal controls and information systems security. There has been some research on software agents in auditing, particularly in the web search bot area (Nelson et al., 2000). This research uses those results to develop an agent model for providing CPA services that add value to client automated systems. It extends the work of Srivastava and others (Bovee et al., 2007; Srivastava & Shafer, 1992; Srivastava, 1997) on belief functions and of Nehmer (2003, 2009) on the use of software agents in internal control evaluations. It addresses the problem of monitoring and assuring the adequacy of application internal controls in highly automated transaction processing environments.
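Since the work builds on belief functions, a minimal sketch of Dempster's rule of combination may help. The frame of discernment below distinguishes only whether an application control is reliable, and the mass values are invented for illustration; the cited papers develop substantially richer belief-function models.

```python
# Minimal sketch of Dempster's rule of combination over the frame
# {R} = "control is reliable", {N} = "control is not reliable".
# Mass values below are invented for illustration; the papers cited above
# develop much richer belief-function models of internal controls.

from itertools import product
from typing import Dict, FrozenSet

Frame = FrozenSet[str]
R: Frame = frozenset({"R"})
N: Frame = frozenset({"N"})
THETA: Frame = frozenset({"R", "N"})  # total ignorance

def combine(m1: Dict[Frame, float], m2: Dict[Frame, float]) -> Dict[Frame, float]:
    """Dempster's rule: intersect focal elements, renormalize away conflict."""
    combined: Dict[Frame, float] = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Evidence from two independent sources (e.g., an access-control test and a
# monitoring agent's log analysis), expressed as basic mass assignments.
m_test  = {R: 0.6, THETA: 0.4}
m_agent = {R: 0.7, N: 0.1, THETA: 0.2}

print(combine(m_test, m_agent))  # belief mass concentrates on "reliable"
```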


2017 ◽  
Vol 7 (1.1) ◽  
pp. 286
Author(s):  
B. Sekhar Babu ◽  
P. Lakshmi Prasanna ◽  
P. Vidyullatha

Today, the World Wide Web has grown into a familiar medium for investigating new information, business trends, trading strategies, and so on. Several organizations and companies also rely on the web to present their products or services across the world. E-commerce is a kind of business or sales transaction that involves the transfer of data across the web or internet. In this situation, a huge amount of data is obtained and dumped into web services. This data overload makes it difficult to determine accurate and valuable information; hence, web data mining is used as a tool to discover and mine knowledge from the web. Web data mining technology can be applied by e-commerce organizations to offer personalized e-commerce solutions and better meet the needs of customers. Data mining algorithms such as ontology-based association rule mining using the Apriori algorithm extract useful information from large data sets. We implement this data mining technique in Java, with data sets generated dynamically while transactions are processed, and extract various patterns.
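The abstract names Apriori-based association rule mining with a Java implementation. The sketch below shows only the core candidate-generation and support-counting loop of an Apriori-style frequent-itemset miner, written in Python for brevity; it omits the ontology layer and rule generation, and the transactions are invented.

```python
# Minimal Apriori-style frequent-itemset mining over toy transactions.
# Illustration of the core support-counting loop only; the paper describes a
# Java implementation with an ontology-based extension, neither reproduced here.

from itertools import combinations
from typing import Dict, FrozenSet, List

def apriori(transactions: List[FrozenSet[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while candidates:
        # Count support for the current candidate itemsets.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Generate (k+1)-item candidates by joining frequent k-itemsets.
        keys = list(level)
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        k += 1
    return frequent

if __name__ == "__main__":
    baskets = [
        frozenset({"laptop", "mouse", "bag"}),
        frozenset({"laptop", "mouse"}),
        frozenset({"mouse", "bag"}),
        frozenset({"laptop", "bag"}),
    ]
    for itemset, support in sorted(apriori(baskets, min_support=2).items(), key=lambda kv: -kv[1]):
        print(set(itemset), support)
```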


Author(s):  
Li Weigang ◽  
Wu Man Qi

This chapter presents a study applying Ant Colony Optimization (ACO) to the Interlegis Web portal, a Brazilian legislation website. The AntWeb approach is inspired by the foraging behavior of ant colonies: it adaptively marks the most significant links along the shortest routes to the target pages. The system treats the users of the Web portal as artificial ants and the links among Web pages as the search network. To identify groups of visitors, Web mining is applied to extract knowledge from preprocessed Web log files. The chapter describes the theory, model, main utilities, and implementation of the AntWeb prototype in the Interlegis Web portal. The case study covers off-line Web mining, simulations with and without the use of AntWeb, and testing under different parameter settings. The results demonstrate the sensitivity and accessibility of AntWeb and the benefits for Interlegis Web users.
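A minimal sketch of the underlying ACO idea, as applied to portal links, is given below: user sessions act as ants that deposit pheromone on the links they traverse, shorter routes to a target page reinforce their links more strongly, pheromone evaporates over time, and links are then ranked by pheromone level. The parameter values and page names are invented, and this is not the AntWeb model itself.

```python
# Minimal sketch of the ACO idea applied to Web links: each user session
# ("ant") deposits pheromone on the links it traverses, shorter routes deposit
# more, and pheromone evaporates over time. Parameters are illustrative only;
# this is not the AntWeb model.

from collections import defaultdict
from typing import Dict, List, Tuple

Link = Tuple[str, str]

def update_pheromone(
    pheromone: Dict[Link, float],
    sessions: List[List[str]],          # each session is a sequence of visited pages
    evaporation: float = 0.1,
    deposit: float = 1.0,
) -> Dict[Link, float]:
    # Evaporation on every known link.
    for link in pheromone:
        pheromone[link] *= (1.0 - evaporation)
    # Deposit: shorter paths to the final (target) page reinforce their links more.
    for path in sessions:
        amount = deposit / max(len(path) - 1, 1)
        for src, dst in zip(path, path[1:]):
            pheromone[(src, dst)] += amount
    return pheromone

if __name__ == "__main__":
    pheromone: Dict[Link, float] = defaultdict(float)
    sessions = [
        ["home", "laws", "law-123"],             # short route to the target page
        ["home", "search", "results", "law-123"],
    ]
    update_pheromone(pheromone, sessions)
    # Rank links by pheromone to highlight the most significant ones.
    for link, level in sorted(pheromone.items(), key=lambda kv: -kv[1]):
        print(link, round(level, 3))
```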

