On judgments obtained from a commercial search engine

Author(s):  
Emine Yilmaz ◽  
Gabriella Kazai ◽  
Nick Craswell ◽  
Saied Mehrizi Tahaghoghi


2005 ◽  
Vol 10 (4) ◽  
pp. 517-541 ◽  
Author(s):  
Mike Thelwall

The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from a commercial search engine and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with non-Web written English, but with some academic-related words amongst the top 50 most frequent. It was also evident that the university Web sites contained a significant amount of non-English text, and academic Web English seems to be more future-oriented than British National Corpus written English.
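The word-frequency comparison described in this abstract can be illustrated with a minimal sketch. It is not the author's method: the tokenizer is deliberately crude, and the function name `top_words` and the `sample_pages` list are hypothetical stand-ins for text extracted from crawled pages.

```python
import re
from collections import Counter


def top_words(pages, n=50):
    """Return the n most frequent lowercase word tokens across all pages."""
    counts = Counter()
    for text in pages:
        # Crude tokenizer (an assumption): runs of ASCII letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)


# Hypothetical stand-in for text extracted from crawled university pages.
sample_pages = [
    "The university will offer the new research course.",
    "Students will join the research group at the university.",
]

for word, count in top_words(sample_pages, n=3):
    print(word, count)
```

A ranked list like this could then be compared against the top of a reference frequency list from a non-Web corpus to spot domain-specific words.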


2014 ◽  
Vol 971-973 ◽  
pp. 1870-1873
Author(s):  
Xiao Gang Dong

A Web search engine based on DNS, the standard solution proposed by the IETF for a public Web search system, is introduced in this paper. At present, no Web search engine covers more than 60 percent of the pages on the Internet, and the update interval of most page databases is almost one month. This situation has not changed for many years. Coverage and recency problems have become the bottleneck of current Web search engines. To solve these problems, a new system, a search engine based on DNS, is proposed in this paper. This system adopts a hierarchical distributed architecture like that of DNS, which differs from that of any current commercial search engine. In theory, this system can cover all the Web pages on the Internet, and its update interval could be as short as one day. The original idea, detailed content, and implementation of this system are all introduced in this paper.
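The abstract gives no design details, so the following is only a toy illustration of what a DNS-style hierarchy of index servers might look like. Each node indexes the pages of its own zone and delegates to child zones. The class `ZoneNode` and all of its methods are hypothetical and are not taken from the paper.

```python
class ZoneNode:
    """Hypothetical index server responsible for one DNS-style zone."""

    def __init__(self, label=""):
        self.label = label
        self.children = {}  # child label -> ZoneNode
        self.pages = {}     # url -> page text indexed at this zone

    def node_for(self, host):
        """Walk host labels right to left (com, example, www), creating nodes."""
        node = self
        for label in reversed(host.lower().split(".")):
            node = node.children.setdefault(label, ZoneNode(label))
        return node

    def index_page(self, host, url, text):
        """Store a page at the node that is responsible for its host."""
        self.node_for(host).pages[url] = text

    def search(self, term):
        """Answer from the local index, then merge answers from child zones."""
        term = term.lower()
        hits = [url for url, text in self.pages.items() if term in text.lower()]
        for child in self.children.values():
            hits.extend(child.search(term))
        return sorted(hits)


root = ZoneNode()
root.index_page("www.example.com", "http://www.example.com/a", "Search engines and DNS")
root.index_page("lib.example.org", "http://lib.example.org/b", "Library catalogue")
print(root.search("dns"))
```

Because every zone only maintains its own pages, each node could in principle be refreshed independently, which is the kind of property the abstract appeals to when it mentions full coverage and short update intervals.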


2020 ◽  
Vol 34 (05) ◽  
pp. 9146-9153
Author(s):  
Bingning Wang ◽  
Ting Yao ◽  
Qi Zhang ◽  
Jingfang Xu ◽  
Xiaochuan Wang

This paper presents ReCO, a human-curated Chinese Reading Comprehension dataset on Opinion. The questions in ReCO are opinion-based queries issued to a commercial search engine. The passages are provided by crowdworkers who extract the supporting snippets from the retrieved documents. Finally, an abstractive yes/no/uncertain answer is given by the crowdworkers. The release of ReCO consists of 300k questions, which to our knowledge makes it the largest Chinese reading comprehension dataset. A prominent characteristic of ReCO is that, in addition to the original context paragraph, we also provide the supporting evidence that can be used directly to answer the question. Quality analysis demonstrates the challenge of ReCO: it requires various types of reasoning skills, such as causal inference and logical reasoning. Current QA models that perform very well on many question answering problems, such as BERT (Devlin et al. 2018), achieve only 77% accuracy on this dataset, a large margin behind the nearly 92% human performance, indicating that ReCO presents a good challenge for machine reading comprehension. The code, dataset, and leaderboard will be freely available at https://github.com/benywon/ReCO.


2018 ◽  
Vol 42 (1) ◽  
pp. 87-109
Author(s):  
Maria Jakovljevic ◽  
Alfred Coleman

This study presents the construction of a niche search engine whose search topic domain is user-defined. The specific focus of this study is the role that a Support Vector Machine (SVM) plays when classifying textual data from web pages. Furthermore, the aim is to establish whether this niche search engine can return results that are more relevant to a user than those returned by a commercial search engine. Through various experiments across a number of appropriate datasets, the SVM has been shown to classify web pages well enough to meet the needs of a niche search engine. A subset of the most useful webpage-specific features has been identified, with the best-performing feature being a web page's Text & Title component. The user-defined niche search engine was successfully designed, and an experiment showed that it returned more relevant results than a commercial search engine.


2003 ◽  
Vol 62 (2) ◽  
pp. 121-129 ◽  
Author(s):  
Astrid Schütz ◽  
Franz Machilek

Research on personal home pages is still rare. Many studies to date are exploratory, and the problem of drawing a sample that reflects the variety of existing home pages has not yet been solved. The present paper discusses sampling strategies and suggests a strategy based on the results retrieved by a search engine. This approach is used to draw a sample of 229 personal home pages that portray private identities. Findings on age and sex of the owners and elements characterizing the sites are reported.


Infoman's ◽  
2018 ◽  
Vol 12 (2) ◽  
pp. 115-124
Author(s):  
Yopi Hidayatul Akbar ◽  
Muhammad Agreindra Helmiawan

Social media is an information medium now widely used by companies and individuals to convey information. With social media, companies no longer need to distribute offers through print media; they can use information technology tools, in this case social media, to present the products they sell to users globally. Social media marketing is the process of attracting visits by internet users to certain sites, or attracting public attention, through social media sites. Marketing activities using social media usually center on a company's efforts to create content that attracts attention and thus encourages readers to share it through their social media networks. The QMS method is not only applied through submission to search engine webmasters; keywords that relate to the website's content must also be applied on the website, because such keywords attract visitors to the university website based on the keyword phrases they type into the search engine. Social Media Marketing (SMM) is one of the techniques that must be applied in sales promotion, especially at car dealers in Bandung. It is considered important because each product's price, features, and convenience need to be publicized through social media so that sales traffic can increase. Each dealer should be able to apply Social Media Marketing (SMM) techniques well so that car sales reach the expected target and provide profits for the salespeople who sell cars in the field.

