Search Engine as a mediating technology of organization

Author(s):  
Renée Ridgway

Search engines have become the technological and organizational means of navigating, filtering, and ranking online information for users. In Europe, from the seventeenth to the nineteenth century, the 'pre-history' of the search engine was the 'bureau d'adresse' or 'address office', which provided information and services to clients while gathering data about them. Registers, censuses, and archives eventually gave way to relational databases owned by commercial platforms, advertising agencies cum search engines that provide non-neutral answers in exchange for user data. With 'cyberorganization', personalized advertising, machine-learning algorithms, and 'surveillance capitalism' organize the user through the 'habit' of search. However, alternatives exist, such as the p2p search engine YaCy and anonymous browsing with Tor.

2021, Vol 13 (1), pp. 9
Author(s):  
Goran Matošević ◽  
Jasminka Dobša ◽  
Dunja Mladenić

This paper presents a novel approach that uses machine learning algorithms, based on experts' knowledge, to classify web pages into three predefined classes according to the degree to which their content is adjusted to search engine optimization (SEO) recommendations. In this study, classifiers were built and trained to assign an unknown sample (web page) to one of the three predefined classes and to identify the important factors that affect the degree of page adjustment. The data in the training set were manually labeled by domain experts. The experimental results show that machine learning can be used to predict the degree of adjustment of web pages to the SEO recommendations: classifier accuracy ranges from 54.59% to 69.67%, which is higher than the baseline accuracy of assigning samples to the majority class (48.83%). The practical significance of the proposed approach lies in providing the core for building software agents and expert systems that automatically detect web pages, or parts of web pages, that need improvement to comply with the SEO guidelines and therefore potentially gain higher rankings by search engines. The results of this study also contribute to the field of detecting optimal values of the ranking factors that search engines use to rank web pages. The experiments in this paper suggest that the important factors to consider when preparing a web page are the page title, meta description, H1 tag (heading), and body text, which is aligned with the findings of previous research. A further result of this research is a new data set of manually labeled web pages that can be used in further research.
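A minimal sketch of how such a classifier could look in practice is given below; the on-page features, labels, and toy data are illustrative assumptions, not the feature set or data set used in the study.

```python
# Illustrative sketch only: trains a classifier that maps simple on-page SEO
# features to one of three adjustment classes. Features, labels, and data are
# assumptions for illustration, not the study's actual material.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: [title_length, has_meta_description, has_h1, body_word_count]
X = np.array([
    [62, 1, 1, 850],   # well-adjusted page
    [15, 0, 1, 120],   # partially adjusted page
    [ 0, 0, 0,  40],   # poorly adjusted page
    # ... more expert-labeled pages would go here
])
y = np.array([2, 1, 0])  # 0 = low, 1 = medium, 2 = high adjustment to SEO recommendations

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)

# Classify an unseen page described by the same features
unseen_page = np.array([[48, 1, 1, 600]])
print(clf.predict(unseen_page))
```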


Author(s):  
Suely Fragoso

This chapter proposes that search engines apply a verticalizing pressure on the many-to-many information distribution model of the WWW, forcing it to revert to a distribution model similar to that of the mass media. The argument starts with a critical, descriptive examination of the history of search mechanisms for the Internet, in parallel with a discussion of the increasing ties between search engines and the advertising market. The chapter then raises questions concerning the concentration of Web traffic around a small number of search engines, which are in the hands of an equally limited number of enterprises. This concentration is accentuated by the confidence that users place in search engines and by the ongoing acquisition of collaborative systems and smaller players by the large search engines. This scenario demonstrates the verticalizing pressure that search engines apply to the majority of WWW users, pushing the Web back toward the mass distribution model.


2010, Vol 55 (2), pp. 374-386
Author(s):  
Joan Miquel-Vergés ◽  
Elena Sánchez-Trigo

The use of the Internet as a source of health information is increasing greatly. However, identifying relevant and valid information can be problematic. This paper first analyses the efficiency of Internet search engines specialized in health, in order to then determine the quality of online information related to a specific medical subdomain, that of neuromuscular diseases. Our aim is to present a model for the development and use of a bilingual electronic corpus (MYOCOR) related to these neuromuscular diseases in order to: a) provide a quality health-information tool for health professionals, patients and relatives, as well as for translators, writers of specialized texts and software developers; and b) use the corpus as a basis for the implementation of a keyword- and semantics-based search engine, such as the ASEM (Federación Española Contra las Enfermedades Neuromusculares) search engine for neuromuscular diseases.


2021, Vol 1 (1), pp. 29-35
Author(s):  
Ismail Majid

A search system is an important component of any online information medium, but since the arrival of search engines such as Google, people have preferred these tools for finding information because their search methods have proven reliable. Can we achieve the same? This research shows that, by applying the Google Custom Search API method, we can build a search system that behaves like Google's search engine: the test results show that the returned results are highly relevant and, on average, ranked first. A further advantage of this method is that it includes correction of misspelled queries to refine the intended keywords.
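By way of illustration, a query against the Google Custom Search JSON API could be sketched as follows; the API key and search engine ID are placeholders, and error handling and pagination are omitted.

```python
# Minimal sketch of querying the Google Custom Search JSON API.
# API_KEY and SEARCH_ENGINE_ID are placeholders obtained from Google;
# this is an illustrative sketch, not the paper's implementation.
import requests

API_KEY = "YOUR_API_KEY"            # from the Google Cloud console
SEARCH_ENGINE_ID = "YOUR_CX_ID"     # from the Programmable Search Engine panel

def search(query, num=10):
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": API_KEY, "cx": SEARCH_ENGINE_ID, "q": query, "num": num},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    # "spelling" is present when the API suggests a corrected query
    corrected = data.get("spelling", {}).get("correctedQuery")
    results = [(item["title"], item["link"]) for item in data.get("items", [])]
    return corrected, results

corrected, results = search("online information media")
```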


2021
Author(s):  
Felipe Cujar-Rosero ◽  
David Santiago Pinchao Ortiz ◽  
Silvio Ricardo Timaran Pereira ◽  
Jimmy Mateo Guerrero Restrepo

This paper presents the final results of a research project that aimed to build a Semantic Search Engine that uses an Ontology and a model trained with Machine Learning to support the semantic search of research projects in the Research System of the University of Nariño. For the construction of FENIX, as this engine is called, a methodology was used that includes the following stages: appropriation of knowledge; installation and configuration of tools, libraries and technologies; collection, extraction and preparation of research projects; and design and development of the Semantic Search Engine. The main results of the work were three: a) the complete construction of the Ontology, with classes, object properties (predicates), data properties (attributes) and individuals (instances) in Protégé, SPARQL queries with Apache Jena Fuseki, and the corresponding coding with Owlready2 in Jupyter Notebook with Python inside an Anaconda virtual environment; b) the successful training of the model, for which Machine Learning and, specifically, Natural Language Processing tools and algorithms were used, such as spaCy, NLTK, Word2vec and Doc2vec, again in Jupyter Notebook with Python inside the Anaconda virtual environment and with Elasticsearch; and c) the creation of FENIX, managing and unifying the queries to the Ontology and to the Machine Learning model. The tests showed that FENIX returned satisfactory results for all the searches that were carried out.
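To make the machine-learning side of such a pipeline concrete, the sketch below trains a gensim Doc2Vec model on project abstracts and ranks projects by similarity to a free-text query; the corpus, identifiers and hyperparameters are assumptions for illustration, not the FENIX code.

```python
# Illustrative Doc2vec similarity search over research-project abstracts.
# The corpus, project identifiers and hyperparameters are assumptions.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

abstracts = {
    "PRJ-001": "machine learning for classifying agricultural soil images",
    "PRJ-002": "ontology based semantic search for institutional repositories",
    # ... one entry per research project
}

corpus = [TaggedDocument(words=text.lower().split(), tags=[pid])
          for pid, text in abstracts.items()]

model = Doc2Vec(corpus, vector_size=100, min_count=1, epochs=40)

query = "semantic search with ontologies"
query_vec = model.infer_vector(query.lower().split())

# Project identifiers ranked by similarity to the query, best match first
print(model.dv.most_similar([query_vec], topn=5))
```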


Phishing is a cyber-attack which is socially engineered to trick naive online users into revealing sensitive information such as user data, login credentials, social security numbers, and banking information. Attackers fool Internet users by posing as a legitimate webpage in order to retrieve personal information. This can also be done by sending emails posing as reputable companies or businesses. Phishing exploits several vulnerabilities effectively, and no single solution protects users from all of them. A classification/prediction model is designed based on heuristic features extracted from the website domain, URL, web protocol and source code, in order to eliminate the drawbacks of existing anti-phishing techniques. In the model we combine existing solutions such as blacklisting and whitelisting, heuristics and visual similarity, which provides a higher level of security. We use the model with different machine learning algorithms, namely Logistic Regression, Decision Trees, K-Nearest Neighbours and Random Forests, and compare the results to find the most efficient machine learning framework.
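A minimal sketch of such a comparison is given below; the URL-based heuristic features and the toy data set are illustrative assumptions, not the feature set or corpus used in this work.

```python
# Sketch only: compares the four classifiers named above on simple URL-based
# heuristic features. The features and toy data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

def url_features(url):
    return [
        len(url),                              # long URLs are often suspicious
        url.count("."),                        # many subdomains
        int("@" in url),                       # '@' can hide the real destination
        int(not url.startswith("https://")),   # no TLS
    ]

urls = [
    "https://www.bank.com/login", "https://shop.example.com/cart",
    "https://news.example.org/article", "http://secure-bank.com.verify-login.xyz/@update",
    "http://paypa1-account-confirm.top/login", "http://192.168.10.5/bank/signin",
]
labels = [0, 0, 0, 1, 1, 1]   # 0 = legitimate, 1 = phishing

X, y = np.array([url_features(u) for u in urls]), np.array(labels)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=3),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
suspect = np.array([url_features("http://login-verify-bank.xyz/@account")])
for name, model in models.items():
    model.fit(X, y)
    print(name, "->", "phishing" if model.predict(suspect)[0] else "legitimate")
```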


2019
Author(s):  
Jingchun Fan ◽  
Jean Craig ◽  
Na Zhao ◽  
Fujian Song

BACKGROUND Increasingly, people seek health information from the Internet, in particular health information on diseases that require intensive self-management, such as diabetes. However, the Internet is largely unregulated, and the quality of online health information may not be credible. OBJECTIVE To assess the quality of online information on diabetes identified from the Internet. METHODS We used the single term "diabetes", or the equivalent Chinese characters, to search Google and Baidu respectively. The first 50 websites retrieved from each of the two search engines were screened for eligibility using pre-determined inclusion and exclusion criteria. Included websites were assessed on four domains: accessibility, content coverage, validity and readability. RESULTS We included 26 websites from the Google search engine and 34 from the Baidu search engine. There were significant differences in website provider (P<0.0001), but not in targeted population (P=0.832) or publication type (P=0.378), between the two search engines. Website accessibility was not statistically significantly different between the two search engines, although there were significant differences in items concerning website content coverage. There was no statistically significant difference in website validity between the Google and Baidu search engines (mean DISCERN score 3.3 vs 2.9, P=0.156). The readability appraisal of the English-language websites showed that Flesch Reading Ease scores ranged from 23.1 to 73.0 and Flesch-Kincaid Grade Level scores ranged from 5.7 to 19.6. CONCLUSIONS The content coverage of health information for patients with diabetes from the English-language search engine tended to be more comprehensive than that from the Chinese search engine. There was a lack of websites provided by health organisations in China. The quality of online health information for people with diabetes needs to be improved to bridge the knowledge gap between website provision and public demand.
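For reference, the two readability measures reported above can be computed with their standard formulas, as in the sketch below; the syllable counter is a rough heuristic (an assumption), so the scores will only approximate those of dedicated readability tools.

```python
# Flesch Reading Ease and Flesch-Kincaid Grade Level, computed with their
# standard formulas. The syllable counter is a rough heuristic (an assumption).
import re

def count_syllables(word):
    # crude heuristic: count groups of vowels
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    w, s = len(words), sentences
    flesch_reading_ease = 206.835 - 1.015 * (w / s) - 84.6 * (syllables / w)
    fk_grade_level = 0.39 * (w / s) + 11.8 * (syllables / w) - 15.59
    return flesch_reading_ease, fk_grade_level

sample = "Diabetes is a chronic disease. It requires careful daily self-management."
print(readability(sample))
```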


Author(s):  
Fatama Sharf Al-deen ◽  
Fadl Mutaher Ba-Alwi

Due to the rapid development of information technology, Big Data has become one of its prominent features and has had a great impact on other technologies dealing with data, such as machine learning. K-means is one of the most important machine learning algorithms. The algorithm was first developed as a clustering technique for relational databases. However, the advent of Big Data has strongly affected its performance. Therefore, many researchers have proposed approaches to improve K-means accuracy in the Big Data environment. In this paper, we present a literature review of the different techniques proposed for developing the K-means algorithm for Big Data. We compare them according to several criteria, including the proposed algorithm, the database used, the Big Data tools, and the K-means applications. This paper helps researchers to see the most important challenges and trends of the K-means algorithm in the Big Data environment.
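As a point of reference for the surveyed work, the sketch below shows a compact baseline K-means in NumPy; real Big Data deployments would instead rely on distributed implementations (for example, Spark MLlib), and the toy data are illustrative.

```python
# Compact baseline K-means: alternate assignment and update steps until the
# centroids stop moving. Toy two-cluster data are generated for illustration.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random initial centroids
    for _ in range(n_iter):
        # assignment step: nearest centroid for every point
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: move each centroid to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

X = np.vstack([np.random.randn(50, 2) + [0, 0], np.random.randn(50, 2) + [5, 5]])
centers, labels = kmeans(X, k=2)
print(centers)
```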

