Web Scraping or Web Crawling: State of Art, Techniques, Approaches and Application

Author(s):  
Moaiad Khder

Web scraping or web crawling refers to the procedure of automatically extracting data from websites using software. It is a process that is particularly important in fields such as business intelligence in the modern age. Web scraping is a technology that allows us to extract structured data from text such as HTML, and it is extremely useful in situations where data is not provided in a machine-readable format such as JSON or XML. Using web scraping to gather data allows us to collect prices in near real time from retail store sites, along with further details; web scraping can also be used to gather intelligence on illicit businesses, such as drug marketplaces on the darknet, providing law enforcement and researchers with valuable data, such as drug prices and varieties, that would be unavailable through conventional methods. It has been found that using a web scraping program yields data that is far more thorough, accurate, and consistent than manual entry. Based on these results, it is concluded that web scraping is a highly useful tool in the information age, and an essential one in modern fields. Multiple technologies are required to implement web scraping properly, such as spidering and pattern matching, which are discussed. This paper looks into what web scraping is, how it works, its stages and technologies, how it relates to business intelligence, artificial intelligence, data science, big data, and cybersecurity, how it can be done with the Python language, some of the main benefits of web scraping, and what the future of web scraping may look like; a special degree of emphasis is placed on highlighting the ethical and legal issues. Keywords: Web Scraping, Web Crawling, Python Language, Business Intelligence, Data Science, Artificial Intelligence, Big Data, Cloud Computing, Cybersecurity, Legal, Ethical.
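The abstract above mentions extracting structured data from HTML via pattern matching in Python. As a minimal sketch of that idea (the HTML snippet and class names here are invented for illustration), the standard library's HTMLParser can pull product names and prices out of markup without any third-party scraping framework:

```python
# Minimal scraping sketch using only the Python standard library.
# SAMPLE_HTML and its class names ("product", "name", "price") are
# hypothetical stand-ins for a real retail page.
from html.parser import HTMLParser

SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span><span class="price">$9.99</span></li>
  <li class="product"><span class="name">Gadget</span><span class="price">$24.50</span></li>
</ul>
"""

class PriceScraper(HTMLParser):
    """Collect (name, price) pairs from spans with known class attributes."""
    def __init__(self):
        super().__init__()
        self._field = None    # which field the next text node belongs to
        self._current = {}    # partially assembled record
        self.rows = []        # completed (name, price) tuples

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()
            self._field = None
            if "name" in self._current and "price" in self._current:
                self.rows.append((self._current["name"], self._current["price"]))
                self._current = {}

scraper = PriceScraper()
scraper.feed(SAMPLE_HTML)
print(scraper.rows)  # [('Widget', '$9.99'), ('Gadget', '$24.50')]
```

In a real deployment the HTML would come from an HTTP request rather than a string literal, and a dedicated parsing library would be more robust against messy markup, but the structure (walk the tags, match a known pattern, accumulate records) is the same.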

Author(s):  
Zhaohao Sun ◽  
Andrew Stranieri

Intelligent analytics is an emerging paradigm in the age of big data, analytics, and artificial intelligence (AI). This chapter explores the nature of intelligent analytics. More specifically, it identifies the foundations, cores, and applications of intelligent big data analytics based on an investigation of state-of-the-art scholarly publications and market analyses of advanced analytics. It then presents a workflow-based approach to big data analytics and technological foundations for intelligent big data analytics by examining intelligent big data analytics as an integration of AI and big data analytics. The chapter also presents a novel approach to extend intelligent big data analytics to intelligent analytics. The proposed approach might facilitate research and development of intelligent analytics, big data analytics, business analytics, business intelligence, AI, and data science.


2018 ◽  
Vol 15 (3) ◽  
pp. 497-498 ◽  
Author(s):  
Ruth C. Carlos ◽  
Charles E. Kahn ◽  
Safwan Halabi

Web Services ◽  
2019 ◽  
pp. 728-744 ◽  
Author(s):  
Antonino Virgillito ◽  
Federico Polidoro

Following the advent of big data, statistical offices have been widely exploring the use of the Internet as a data source for modernizing their data collection processes. In particular, prices are collected online in several statistical institutes through a technique known as web scraping. The objective of the chapter is to discuss the challenges of web scraping for setting up a continuous data collection process, exploring and classifying the more widespread techniques and presenting how they are used in practical cases. The main technical notions behind web scraping are presented and explained in order to give readers with no background in IT sufficient elements to fully comprehend scraping techniques, promoting the building of the mixed skills that are at the core of the spirit of modern data science. Challenges for official statistics deriving from the use of web scraping are briefly sketched. Finally, research ideas for overcoming the limitations of current techniques are presented and discussed.
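The chapter above describes statistical institutes collecting prices online through scraping. One simple form this takes is regular-expression pattern matching over fetched pages, followed by aggregation; the sketch below (with an invented markup pattern and invented prices) illustrates that pipeline using only the standard library:

```python
# Hypothetical sketch of pattern-matching price collection for a price
# statistic; the markup pattern and sample page are invented.
import re
import statistics

# Matches prices embedded in a known markup pattern, e.g. class="price">$2.49
PRICE_RE = re.compile(r'class="price">\s*\$([0-9]+(?:\.[0-9]{2})?)')

def extract_prices(html: str) -> list[float]:
    """Pull all prices matching the known markup pattern out of a page."""
    return [float(m) for m in PRICE_RE.findall(html)]

page = '<span class="price">$2.49</span> ... <span class="price">$3.99</span>'
prices = extract_prices(page)
print(round(statistics.mean(prices), 2))  # 3.24
```

For a continuous collection process, a scheduler would re-fetch the same pages periodically and append each run's prices to a time series; the fragility the chapter alludes to comes from the fact that a site redesign silently breaks the pattern, which is why institutes must monitor and maintain their scrapers.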


10.2196/16607 ◽  
2019 ◽  
Vol 21 (11) ◽  
pp. e16607 ◽  
Author(s):  
Christian Lovis

Data-driven science and its corollaries in machine learning and the wider field of artificial intelligence have the potential to drive important changes in medicine. However, medicine is not a science like any other: it is deeply and tightly bound with a large and wide network of legal, ethical, regulatory, economic, and societal dependencies. As a consequence, scientific and technological progress in handling information and its further processing and cross-linking for decision support and predictive systems must be accompanied by parallel changes in the global environment, with numerous stakeholders, including citizens and society. What may seem at first glance to be a barrier and a mechanism slowing the progress of data science must, however, be considered an important asset. Only global adoption can transform the potential of big data and artificial intelligence into effective breakthroughs in handling health and medicine. This requires science and society, scientists and citizens, to progress together.


Author(s):  
Mahyuddin K. M. Nasution et al.

In the era of information technology, the two developing sides are data science and artificial intelligence. On the data science side, one of the tasks is the extraction of social networks from information sources that have the nature of big data. Meanwhile, on the artificial intelligence side, the presence of contradictory methods has an impact on knowledge. This article describes an unsupervised approach as a stream of methods for extracting social networks from information sources. A variety of possible approaches and strategies take superficial methods as a starting concept. Each method has its advantages, but in general each contributes to the integration of the others, namely by simplifying, enriching, and emphasizing the results.


Author(s):  
Thomas M. Powers ◽  
Jean-Gabriel Ganascia

This chapter discusses several challenges for doing the ethics of artificial intelligence (AI). The challenges fall into five major categories: conceptual ambiguities within philosophy and AI scholarship; the estimation of AI risks; implementing machine ethics; epistemic issues of scientific explanation and prediction in what can be called computational data science (CDS), which includes "big data" science; and oppositional versus systemic ethics approaches. The chapter then argues that these ethical problems are not likely to yield to the "common approaches" of applied ethics. Primarily due to the transformational nature of artificial intelligence within science, engineering, and human culture, novel approaches will be needed to address the ethics of AI in the future. Moreover, serious barriers to the formalization of ethics will need to be overcome in order to implement ethics in AI.


Author(s):  
Zhaohao Sun

Intelligent big data analytics is an emerging paradigm in the age of big data, analytics, and artificial intelligence (AI). This chapter explores intelligent big data analytics from a managerial perspective. More specifically, it first looks at the age of trinity and argues that intelligent big data analytics is at the center of the age of trinity. This chapter then proposes a managerial framework of intelligent big data analytics, which consists of intelligent big data analytics as a science, technology, system, service, and management for improving business decision making. Then it examines intelligent big data analytics for management taking into account four managerial functions: planning, organizing, leading, and controlling. The proposed approach in this chapter might facilitate the research and development of intelligent big data analytics, big data analytics, business intelligence, artificial intelligence, and data science.


Author(s):  
Cornelius J. P. Niemand ◽  
Kelvin Joseph Bwalya

The growth of the information management (IM) discipline and its importance across different socio-economic platforms cannot be over-emphasized. The current development of heterogeneous technologies shows that IM is the focal point of innovations such as blockchain, data science (big data, predictive analytics, etc.), artificial intelligence, and automation. This research was motivated by a desire to contribute towards establishing the intellectual identity of IM as a science and as a discipline. An exploration of the inventory of theories and conceptual frameworks enables us to understand the different methodologies currently being used and therefore to define the level of development of the field as a discipline. This chapter aims to present the patterns and trends in theory use across studies conducted during the last 10 years (2009–2019). Using a bibliometric approach anchored in descriptive informetrics, the chapter explores the application of theory within the IM field.

