Testing the stability of “wisdom of crowds” judgments of search results over time and their similarity with the search engine rankings

Purpose – One of the under-explored aspects of user information seeking behaviour is the influence of time on relevance evaluation. Previous studies have shown that individual users may change their assessment of search results over time. It is also known that aggregated judgements of multiple individual users can lead to correct and reliable decisions; this phenomenon is known as the “wisdom of crowds”. The purpose of this paper is to examine whether aggregated judgements are more stable, and thus more reliable, over time than individual user judgements. Design/methodology/approach – In this study two simple measures are proposed to calculate the aggregated judgements of search results and compare their reliability and stability to individual user judgements. In addition, the aggregated “wisdom of crowds” judgements were used as a means to compare human assessments of search results with search engine rankings. A large-scale user study was conducted with 87 participants who evaluated two different queries and four diverse result sets twice, with an interval of two months. Two types of judgements were considered in this study: relevance on a four-point scale, and ranking on a ten-point scale without ties. Findings – It was found that aggregated judgements are much more stable than individual user judgements, yet they are quite different from search engine rankings. Practical implications – The proposed “wisdom of crowds”-based approach provides a reliable reference point for the evaluation of search engines. This is also important for exploring the need for personalisation and for adapting search engine rankings over time to changes in users’ preferences. Originality/value – This is the first study to apply the notion of “wisdom of crowds” to the phenomenon of “change in time” in user evaluation of relevance, which is under-explored in the literature.
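
The comparison between individual and aggregated stability can be illustrated with a small sketch. The abstract does not say which two measures the authors propose, so this example assumes mean-rank aggregation and Spearman correlation between the two sessions; the function names and the toy rankings are hypothetical.

```python
from statistics import mean

def spearman_rho(rank_a, rank_b):
    """Spearman correlation for two tie-free rankings of the same items."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def aggregate_ranking(rankings):
    """Assumed aggregation: order items by their average rank across users."""
    n_items = len(rankings[0])
    mean_ranks = [mean(r[i] for r in rankings) for i in range(n_items)]
    # Break ties by item index so the aggregated ranking has no ties.
    order = sorted(range(n_items), key=lambda i: (mean_ranks[i], i))
    agg = [0] * n_items
    for position, item in enumerate(order, start=1):
        agg[item] = position
    return agg

# Hypothetical data: 3 users rank 4 results (1 = best) in two sessions.
session_1 = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]
session_2 = [[2, 1, 3, 4], [1, 2, 4, 3], [1, 2, 3, 4]]

# Stability of each user's own ranking between sessions, averaged over users.
individual_stability = mean(
    spearman_rho(a, b) for a, b in zip(session_1, session_2)
)
# Stability of the aggregated ("crowd") ranking between sessions.
crowd_stability = spearman_rho(
    aggregate_ranking(session_1), aggregate_ranking(session_2)
)
print(individual_stability, crowd_stability)
```

In this toy data every user changes their ranking between sessions, yet the aggregated ranking is identical in both, which is the kind of effect the study reports.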


The Matter of Chance: Auditing Web Search Results Related to the 2020 U.S. Presidential Primary Elections Across Six Search Engines

Social Science Computer Review ◽

10.1177/08944393211006863 ◽

2021 ◽

pp. 089443932110068

Author(s):

Aleksandra Urman ◽

Mykola Makhortykh ◽

Roberto Ulloa

Keyword(s):

Search Engine ◽

Search Engines ◽

Large Scale ◽

Web Search ◽

Primary Elections ◽

Virtual Agents ◽

Search Results ◽

Presidential Primary ◽

Large Scale Analysis ◽

Algorithmic Information

We examine how six search engines filter and rank information in relation to queries on the U.S. 2020 presidential primary elections under default (that is, nonpersonalized) conditions. To do so, we use an algorithmic auditing methodology in which virtual agents conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for the queries “us elections,” “donald trump,” “joe biden,” and “bernie sanders” on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. This highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters, as demonstrated by previous research.


Website removal from search engines due to copyright violation

Aslib Journal of Information Management ◽

10.1108/ajim-05-2018-0108 ◽

2019 ◽

Vol 71 (1) ◽

pp. 54-71 ◽

Cited By ~ 7

Author(s):

Artur Strzelecki

Keyword(s):

Search Engine ◽

Search Engines ◽

Design Methodology ◽

Global Analysis ◽

Domain Name ◽

Content Type ◽

Search Results ◽

Internet Users ◽

Purpose The purpose of this paper is to clarify how many removal requests are made, how often, and who makes these requests, as well as which websites are reported to search engines so they can be removed from the search results. Design/methodology/approach Undertakes a deep analysis of more than 3.2bn pages removed from Google’s search results at the request of reporting organizations from 2011 to 2018 and over 460m pages removed from Bing’s search results at the request of reporting organizations from 2015 to 2017. The paper focuses on pages that belong to the .pl country code top-level domain (ccTLD). Findings Although the number of requests to remove data from search results has been growing year on year, fewer URLs have been reported in recent years. Some of the requests are, however, unjustified and are rejected by teams representing the search engines. In terms of reporting copyright violations, one company in particular stands out (AudioLock.Net), accounting for 28.1 percent of all reports sent to Google (the top ten companies combined were responsible for 61.3 percent of the total number of reports). Research limitations/implications As not every request can be published, the study is based only on what is publicly available. Also, the data assigned to Poland is based only on the ccTLD domain name (.pl); other domain extensions for Polish internet users were not considered. Originality/value This is the first global analysis of data from transparency reports published by search engine companies, as prior research has been based on specific notices.


The framing of scientific domains: about UNISIST, domain analysis and art history

Journal of Documentation ◽

10.1108/jd-03-2013-0038 ◽

2014 ◽

Vol 70 (2) ◽

pp. 261-281 ◽

Cited By ~ 3

Author(s):

Hans Dam Christensen

Keyword(s):

Information Seeking ◽

Art History ◽

Scientific Information ◽

Domain Analysis ◽

Content Type ◽

Scientific Domain ◽

Information Tools ◽

Communication Processes ◽

Three Stages ◽

Over Time

Purpose – Using the UNISIST models, this paper argues for the necessity of domain analysis in order to qualify scientific information seeking. The models allow a better understanding of communication processes in a scientific domain, and they embrace the point that domains are always both unstable over time and changeable according to the specific perspective. This understanding is even more important today, as numerous digitally generated information tools as well as collaborative and interdisciplinary research are blurring domain borders. Nevertheless, researchers navigate “intuitively” in “their” specific domains, and UNISIST helps in understanding this navigation. The paper aims to discuss these issues. Design/methodology/approach – The UNISIST models are tentatively applied to the domain of art history at three stages: two modern, partially overlapping domains and an outline of an art historical domain c. 1820. The juxtapositions are discussed against the backdrop of, among others, poststructuralist concepts such as “power” and “anti-essentialism”. Findings – The juxtapositions affirm the point already surfacing in the different versions of the UNISIST model, that is, structures of communication change over time as well as according to the agents that are charting them. As such, power in a Foucauldian sense is unavoidable in outlining a domain. Originality/value – The UNISIST models are applied to the domain of art history, and the article discusses the instability of a scientific domain as well as, at the same time, the significance of framing a domain, an implication which is often neglected in scientific information seeking.


Chromosomal Rearrangements in Salmonella enterica Serovar Typhi Strains Isolated from Asymptomatic Human Carriers

mBio ◽

10.1128/mbio.00060-11 ◽

2011 ◽

Vol 2 (3) ◽

Cited By ~ 13

Author(s):

T. David Matthews ◽

Wolfgang Rabsch ◽

Stanley Maloy

Keyword(s):

Salmonella Enterica ◽

Large Scale ◽

Chromosomal Rearrangements ◽

Growth Conditions ◽

Genetic Changes ◽

Content Type ◽

Long Term Storage ◽

Bacterial Chromosomes ◽

Over Time

ABSTRACT Host-specific serovars of Salmonella enterica often have large-scale chromosomal rearrangements that occur by recombination between rrn operons. Two hypotheses have been proposed to explain these rearrangements: (i) replichore imbalance from horizontal gene transfer drives the rearrangements to restore balance, or (ii) the rearrangements are a consequence of the host-specific lifestyle. Although recent evidence has refuted the replichore balance hypothesis, there has been no direct evidence for the lifestyle hypothesis. To test this hypothesis, we determined the rrn arrangement type for 20 Salmonella enterica serovar Typhi strains obtained from human carriers at periodic intervals over multiple years. These strains were also phage typed and analyzed for rearrangements that occurred over long-term storage versus routine culturing. Strains isolated from the same carrier at different time points often exhibited different arrangement types. Furthermore, colonies isolated directly from the Dorset egg slants used to store the strains also had different arrangement types. In contrast, colonies that were repeatedly cultured always had the same arrangement type. Estimated replichore balance of isolated strains did not improve over time, and some of the rearrangements resulted in decreased replichore balance. Our results support the hypothesis that the restricted lifestyle of host-specific Salmonella is responsible for the frequent chromosomal rearrangements in these serovars.
IMPORTANCE Although it was previously thought that bacterial chromosomes were stable, comparative genomics has demonstrated that bacterial chromosomes are dynamic, undergoing rearrangements that change the order and expression of genes. While most Salmonella strains have a conserved chromosomal arrangement type, rearrangements are very common in host-specific Salmonella strains. This study suggests that chromosome rearrangements in the host-specific Salmonella enterica serovar Typhi, the causal agent of typhoid fever, occur within the human host over time. The results also indicate that rearrangements can occur during long-term maintenance on laboratory medium. Although these genetic changes do not limit survival under slow-growth conditions, they may limit the survival of Salmonella Typhi in other environments, as predicted for the role of pseudogenes and genome reduction in niche-restricted bacteria.


Panel data or pseudo panels for longitudinal research? Cross-national comparisons using the example of firms' training spend

Evidence-based HRM a Global Forum for Empirical Scholarship ◽

10.1108/ebhrm-08-2020-0106 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Michael Brookes ◽

Chris Brewster ◽

Cigdem Gedikli ◽

Okan Yilmaz

Keyword(s):

Panel Data ◽

Large Scale ◽

Longitudinal Research ◽

Temporal Changes ◽

Annual Data ◽

Firm Level ◽

Content Type ◽

Area Of Interest ◽

Practical Implications ◽

Over Time

Purpose The evolution of firm level practices over time has always been a keen area of interest for management scholars. However, in comparison to other social scientists, particularly economists, the relative dearth of firm level panel data sets has restricted the methodological options for exploring inter-temporal changes. Design/methodology/approach This paper applies a pseudo panel methodology to investigate the evolution of training spend at the firm level over time. Findings The analysis is framed within a varieties of capitalism lens and, by adopting a more meaningful approach to examining changes over time, it leads us to question some of the “truisms” linked to firms’ expected behaviours within different national institutional frameworks. Research limitations/implications As with any large-scale quantitative analysis, it would always benefit from a larger number of observations and/or a longer time period; in this instance, access to annual data rather than 4 or 5 year intervals would have been helpful. Practical implications By adopting a different, and more appropriate, approach to analysing existing cross-sectional data over time, this empirical research helps to achieve a deeper understanding of the complex issues that influence decision making at the firm level. Social implications At the firm level, in line with the practical implications above, this will enable decision makers to achieve a deeper understanding of the evolution of the external context in which they operate and the likely influence of that evolution within their own organisation. Originality/value This approach enables a more meaningful exploration of inter-temporal changes in situations where longitudinal data does not exist.


An investigation of biases in web search engine query suggestions

Online Information Review ◽

10.1108/oir-11-2018-0341 ◽

2019 ◽

Vol 44 (2) ◽

pp. 365-381 ◽

Cited By ~ 1

Author(s):

Malte Bonart ◽

Anastasiia Samokhina ◽

Gernot Heisenberg ◽

Philipp Schaer

Keyword(s):

Search Engine ◽

Search Engines ◽

Web Search ◽

Query Suggestion ◽

Data Set ◽

Content Type ◽

Web Search Engine ◽

The Stability ◽

Query Suggestions ◽

Over Time

Purpose Survey-based studies suggest that search engines are trusted more than social media or even traditional news, although cases of false information or defamation are known. The purpose of this paper is to analyze query suggestion features of three search engines to see if these features introduce some bias into the query and search process that might compromise this trust. The authors test the approach on person-related search suggestions by querying the names of politicians from the German Bundestag before the German federal election of 2017. Design/methodology/approach This study introduces a framework to systematically examine and automatically analyze the variety of query suggestions for person names offered by major search engines. To test the framework, the authors collected data from the Google, Bing and DuckDuckGo query suggestion APIs over a period of four months for 629 different names of German politicians. The suggestions were clustered and statistically analyzed with regard to different biases, such as gender, party or age, and with regard to the stability of the suggestions over time. Findings By using the framework, the authors located three semantic clusters within the data set: suggestions related to politics and economics, location information, and personal and other miscellaneous topics. Among other effects, the results of the analysis show a small bias in that male politicians receive slightly fewer suggestions on “personal and misc” topics. The stability analysis of the suggested terms over time shows that some suggestions are prevalent most of the time, while other suggestions fluctuate more often. Originality/value This study proposes a novel framework to automatically identify biases in web search engine query suggestions for person-related searches. Applying this framework to a set of person-related query suggestions provides first insights into the influence search engines can have on the query process of users who seek out information on politicians.


A Query Expansion Method Based on User's Historical Interested Web Pages and Historical Query Terms

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.52-54.1218 ◽

2011 ◽

Vol 52-54 ◽

pp. 1218-1225

Author(s):

Zheng Yu Zhu ◽

Chun Lei Yu ◽

Shu Jia Dong ◽

Jie He

Keyword(s):

Search Engine ◽

Query Expansion ◽

Expansion Method ◽

Experimental Results ◽

Web Pages ◽

Average Precision ◽

Individual User ◽

Search Results ◽

Current Algorithm ◽

Better Than

Current popular search engines are built to serve all users, independent of the needs of any individual user. This paper proposes a personalized query expansion method based on the user's historical interested Web pages (UHIWPs) and the user’s historical query terms (UHQTs). When a user submits a query keyword to a search engine, the new algorithm can automatically locate the current user’s implicit search intention and compute term-term associations dynamically according to that user’s UHIWPs and UHQTs. More personalized expansion terms are then generated and submitted to the search engine together with the query keyword. As a result, different search results can be returned to different users even though they input the same query keywords. Experimental results show that this method achieves higher average precision than the current algorithm.
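
The idea of deriving expansion terms from a user's own history can be sketched as follows. The abstract does not specify how the term-term associations are computed, so this example assumes simple page-level co-occurrence counts; `expansion_terms` and the sample history are hypothetical and not the authors' algorithm.

```python
from collections import Counter

def expansion_terms(query, history_pages, k=2):
    """Return up to k terms that co-occur most often with `query`
    in the user's historical pages (assumed association measure)."""
    counts = Counter()
    for page in history_pages:
        tokens = set(page.lower().split())
        if query in tokens:
            counts.update(tokens - {query})
    # Sort by descending count, then alphabetically, for a deterministic result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Hypothetical history of one user who reads about cars, not animals.
history = ["jaguar car engine", "jaguar car dealer", "python snake habitat"]
expanded = ["jaguar"] + expansion_terms("jaguar", history)
print(expanded)
```

A different user whose history mentions "jaguar" alongside wildlife terms would receive different expansion terms for the same keyword, which is how the same query can lead to different results per user.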


The silent fading of an academic search engine: the case of Microsoft Academic Search

Online Information Review ◽

10.1108/oir-07-2014-0169 ◽

2014 ◽

Vol 38 (7) ◽

pp. 936-953 ◽

Cited By ~ 17

Author(s):

Enrique Orduña-Malea ◽

Alberto Martín-Martín ◽

Juan M. Ayllon ◽

Emilio Delgado López-Cózar

Keyword(s):

Search Engine ◽

Information Seeking ◽

Scientific Information ◽

Reliability And Validity ◽

Closed Model ◽

Content Type ◽

Academic Publications ◽

Microsoft Academic ◽

Construction Model ◽

Academic Search

Purpose – The purpose of this paper is to describe the obsolescence process of Microsoft Academic Search (MAS) as well as the effects of this decline on the coverage of disciplines and journals, and their influence on the representativeness of organizations. Design/methodology/approach – The total number of records and those belonging to the most reputable journals (1,762) and organizations (346) according to the Field Rating indicator in each of the 15 fields and 204 sub-fields of MAS were collected and statistically analysed in March 2014 by means of an automated querying process via http, covering academic publications from 1700 to present. Findings – MAS has not been updated since 2013, although this phenomenon began to be glimpsed in 2011, when its coverage plummeted. Throughout 2014, indexing of new records is still ongoing, but at a minimum rate, without following any apparent pattern. Research limitations/implications – There are also retrospective records being indexed at present. In this sense, this research provides a picture of what MAS offered during March 2014 when queried directly via http. Practical implications – The unnoticed obsolescence of MAS affects the quality of the service offered to its users (both those who engage in scientific information seeking and those who use it for quantitative purposes). Social implications – The predominance of Google Scholar (GS) as a monopoly in the academic search engine market as well as the prevalence of an open construction model (GS) vs a closed model (MAS). Originality/value – A complete longitudinal analysis of disciplines, journals and organizations on MAS has been performed for the first time, identifying an unnoticed obsolescence. No public explanation or disclaimer has been issued by the responsible company, something incomprehensible given its implications for the reliability and validity of the bibliometric data provided on disciplines, journals, authors and conferences, as well as their fair representation on the academic search engine.


Pembuatan Sistem Pencarian Pekerjaan Menggunakan TF-IDF (Building a Job Search System Using TF-IDF)

Jurnal Ilmiah Teknologi Informasi Asia ◽

10.32815/jitika.v13i2.389 ◽

2019 ◽

Vol 13 (2) ◽

pp. 91

Author(s):

Arif Tirtana ◽

Adnan Zulkarnain ◽

Yohanes Dwi Listio

Keyword(s):

Search Engine ◽

Human Life ◽

Search Process ◽

Job Seekers ◽

System Testing ◽

Search Results ◽

Job Information ◽

Information Work ◽

Over Time

Over time, human life continues to change, and so does human work as an activity to fulfil needs. However, the delivery of job information from job providers to job seekers remains constrained, so job seekers have difficulty obtaining information about vacancies and registering for jobs that match their wishes. To address these problems, an update is needed that makes it easier for job seekers to find jobs, in this case to search for job vacancies. In this study a search engine was created to make it easier for job seekers to get job information matching the keywords entered by users, using the TF-IDF method. The results of the system testing show that the TF-IDF method takes longer to search than the full query but provides more relevant search results.
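
A minimal sketch of TF-IDF ranking for job postings is given below. It uses the textbook formulation (term frequency times the logarithm of inverse document frequency); the abstract does not describe the system's exact weighting or preprocessing, so the function name, the whitespace tokenisation and the sample postings are assumptions for illustration only.

```python
import math
from collections import Counter

def tfidf_rank(query, documents):
    """Rank document indices by summed TF-IDF of the query terms,
    dropping documents that match no query term."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term in tf:
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(tokens)) * idf
        scores.append((score, idx))
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [idx for score, idx in ranked if score > 0]

# Hypothetical job postings.
jobs = [
    "python developer remote",
    "java developer onsite",
    "graphic designer remote",
]
ranking = tfidf_rank("python developer", jobs)
print(ranking)
```

The rare term "python" contributes more to the score than the common term "developer", so the first posting outranks the second, and the third is omitted because it matches neither term.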


What users seek and share in online diabetes communities: examining similarities and differences in expressions and themes

Aslib Journal of Information Management ◽

10.1108/ajim-08-2021-0214 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Zhizhen Yao ◽

Bin Zhang ◽

Zhenni Ni ◽

Feicheng Ma

Keyword(s):

Information Sharing ◽

Health Information ◽

Information Seeking ◽

Large Scale ◽

Self Management ◽

Health Information Seeking ◽

Self Disclosure ◽

Content Type ◽

Network Analyses ◽

Similarities And Differences

Purpose This paper aims to investigate user health information seeking and sharing patterns and content in an online diabetes community and to explore the similarities and differences in the expressions and themes involved. Design/methodology/approach Multiple methods are applied to analyze the expressions and themes that users seek and share, based on large-scale text data from an online diabetes community. First, a deep learning text classifier is trained on the expression categories developed in this study. Second, statistical and social network analyses are used to measure popularity and compare differences between expressions. Third, topic modeling, manual coding and similarity analysis are used to mine topics and measure thematic similarity between seeking and sharing threads. Findings There are four different ways users seek and share in online health communities (OHCs): informational seeking, situational seeking, objective information sharing and experiential information sharing. The results indicate that threads with self-disclosure could receive more replies and attract more users to contribute. This study also examines the 10 topics that were discussed for information seeking and the 14 topics for information sharing. They shared three discussion themes: self-management, medication and symptoms. Information about symptoms can be largely matched between seeking and sharing threads, while there is less overlap in the self-management and medication categories. Originality/value Unlike previous studies that mainly describe one type of health information behavior, this paper analyzes user health information seeking and sharing behaviors in OHCs and investigates whether there is a correspondence or discrepancy between the expressions and information users spontaneously seek and share in OHCs.
