Leveraging Django and Redis using Web Scraping

2020 ◽  
Vol 9 (1) ◽  
pp. 2103-2105

Web scraping, also known as data scraping, is used for extracting data from websites. The software used for this may access the World Wide Web directly using the Hypertext Transfer Protocol or through a web browser. Over the years, due to advancements in web development technology, various frameworks have come into use, and almost all websites are dynamic, with their content served from a CMS. This makes it difficult to extract data, since there is no common template to extract against. Hence, we use RSS. Rich Site Summary (RSS) is a kind of feed that allows users and applications to access updates to websites in a standardized, computer-readable format. This project combines the use of RSS to extract data from websites and serve it to users in a robust and easy way. The differentiating feature is that the project uses server-side caching, implemented with Redis and Django, to serve users almost instantaneously without repeating the data extraction from the requested site.
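A minimal sketch of how such a setup might look, assuming a Django project whose CACHES setting points at a Redis backend and the third-party feedparser package for RSS parsing; the view name, cache key format and 15-minute TTL are illustrative choices, not taken from the paper.

```python
# Sketch: a Django view that serves RSS entries for a requested site and
# caches the parsed result in Redis so repeat requests skip re-extraction.
# Assumes CACHES in settings.py is configured with a Redis backend and
# that `feedparser` is installed; names below are illustrative.
import feedparser
from django.core.cache import cache
from django.http import JsonResponse

CACHE_TTL = 60 * 15  # keep parsed feeds for 15 minutes (arbitrary choice)

def feed_view(request):
    feed_url = request.GET.get("url", "")
    cache_key = f"rss:{feed_url}"

    entries = cache.get(cache_key)
    if entries is None:                       # cache miss: fetch and parse the feed
        parsed = feedparser.parse(feed_url)
        entries = [
            {"title": e.get("title", ""), "link": e.get("link", "")}
            for e in parsed.entries
        ]
        cache.set(cache_key, entries, CACHE_TTL)  # store for later requests

    return JsonResponse({"source": feed_url, "entries": entries})
```

On a cache hit the response is served directly from Redis, which is what allows the near-instant responses described above.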

2020 ◽  
pp. short8-1-short8-9
Author(s):  
Mikhail Ulizko ◽  
Evgeniy Antonov ◽  
Alexey Artamonov ◽  
Rufina Tukumbetova

The paper considers the task of analyzing complex interconnected objects by constructing graphs. There is no unified tool for constructing graphs: some solutions can only build graphs limited in the number of nodes, while others do not display data visually. The Gephi application was used to construct the graphs for this research; it offers extensive functionality for building and analyzing graphs. The subject of the research is a politician with a certain set of characteristics. The paper develops an algorithm that automates the collection of data on politicians. One of the main methods of collecting data on the Internet is web scraping. Web scraping software may access the World Wide Web directly using HTTP, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a software agent. The collected data was necessary for constructing the graphs and analyzing them. The use of graphs makes it possible to see various types of relationships, including mediated (indirect) ones. This methodology changes the approach to the analysis of multidimensional objects.
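A minimal sketch of the downstream step, assuming the relationship records have already been collected by a scraper; the names, attributes and relation types are illustrative and networkx is used here only to show the shape of the data, not the authors' pipeline. The resulting GEXF file can be opened in Gephi for layout and analysis.

```python
# Sketch: turn scraped (person, related object, relation) records into a
# graph and export it in a format Gephi can open.
import networkx as nx

# Records a scraper might produce (illustrative values):
records = [
    ("Politician A", "Party X", "member_of"),
    ("Politician B", "Party X", "member_of"),
    ("Politician A", "Committee Y", "sits_on"),
]

G = nx.Graph()
for person, obj, relation in records:
    G.add_node(person, kind="politician")
    G.add_node(obj, kind="organization")
    G.add_edge(person, obj, relation=relation)

# Mediated (indirect) links, e.g. two politicians sharing a party,
# become visible once the graph is laid out in Gephi.
nx.write_gexf(G, "politicians.gexf")
```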


1994 ◽  
Vol 05 (05) ◽  
pp. 805-809 ◽  
Author(s):  
SALIM G. ANSARI ◽  
PAOLO GIOMMI ◽  
ALBERTO MICOL

On 3rd November 1993, ESIS announced its Homepage on the World Wide Web (WWW) to the user community. Ever since then, ESIS has steadily increased its Web support to the astronomical community to include a bibliographic service, the ESIS catalogue documentation and the ESIS Data Browser, with more functionality to be added in the near future. All these services share a common ESIS structure that is also used by other ESIS user paradigms such as the ESIS Graphical User Interface (Giommi and Ansari, 1993) and the ESIS Command Line Interface. Following a forms-based paradigm, each ESIS-Web application interfaces to the hypertext transfer protocol (HTTP), translating queries from/to the hypertext markup language (HTML) format understood by the NCSA Mosaic interface. In this paper, we discuss the ESIS system and show how each ESIS service works in a World Wide Web client.


2015 ◽  
Vol 21 (3) ◽  
pp. 515-531 ◽  
Author(s):  
Hao Li ◽  
Yandong Wang ◽  
Penggen Cheng

Abstract: With advances in the World Wide Web and Geographic Information Systems, geospatial services have progressively developed to provide geospatial data and processing functions online. In order to efficiently discover and manage the large number of geospatial services, these services are registered with semantic descriptions and categorized into classes according to certain taxonomies. Most taxonomies for geospatial services are provided only in human-readable form, and the lack of semantic descriptions for the taxonomies limits semantic-based discovery of geospatial services. The objectives of this paper are to propose an approach to semantically describe the taxonomy of geospatial services and to use these semantic descriptions to improve service discovery. A semantic description framework is introduced for the geospatial service taxonomy that describes not only the hierarchical structure of the classes but also the definitions of all classes. The semantic description of the taxonomy based on this framework is further used to simplify the semantic description and registration of geospatial services and to enhance the semantic-based service matching method.
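One possible way to express such a taxonomy semantically, sketched here with rdflib; the namespace, class names and definitions are illustrative and not taken from the paper's framework. Each class carries both its place in the hierarchy (rdfs:subClassOf) and a textual definition (rdfs:comment), which is the two-part structure the framework calls for.

```python
# Sketch: a machine-readable service taxonomy with hierarchy and definitions.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/geoservices#")   # placeholder namespace
g = Graph()
g.bind("ex", EX)

def add_class(cls, parent, definition):
    """Register a taxonomy class with its superclass and textual definition."""
    g.add((cls, RDF.type, OWL.Class))
    g.add((cls, RDFS.subClassOf, parent))
    g.add((cls, RDFS.comment, Literal(definition, lang="en")))

g.add((EX.GeospatialService, RDF.type, OWL.Class))
add_class(EX.DataService, EX.GeospatialService,
          "Services that provide access to geospatial data.")
add_class(EX.ProcessingService, EX.GeospatialService,
          "Services that execute geospatial processing functions online.")

print(g.serialize(format="turtle"))
```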


Author(s):  
August-Wilhelm Scheer

The emergence of what we call today the World Wide Web, the WWW, or simply the Web, dates back to 1989, when Tim Berners-Lee proposed a hypertext system to manage information overload at CERN, Switzerland (Berners-Lee, 1989). This article outlines how his approach evolved into the Web that drives today’s information society and explores the potentials that still lie ahead. What began as a wide-area hypertext information retrieval initiative quickly gained momentum due to the fast adoption of graphical browser programs and the standardization activities of the World Wide Web Consortium (W3C). In the beginning, based only on the standards of HTML, HTTP, and URL, the sites provided by the Web were static, meaning the information stayed unchanged until the original publisher decided on an update. For a long time, the WWW, today referred to as Web 1.0, was understood as a technical means to publish information to a vast audience across time and space. Data was kept locally, and Web sites were only occasionally updated by uploading files from the client to the Web server. Application software was limited to local desktops and operated only on local data. With the advent of dynamic concepts on the server side (script languages like the hypertext preprocessor (PHP) or Perl, and Web applications with JSP or ASP) and the client side (e.g., JavaScript), the WWW became more dynamic. Server-side content management systems (CMS) allowed editing Web sites via the browser at run-time. These systems interact with multiple users through PHP interfaces that push information into server-side databases (e.g., MySQL), which in turn feed Web sites with content. Thus, the Web became accessible and editable not only for programmers and “techies” but also for the common user. Yet technological limitations such as slow Internet connections, consumer-unfriendly Internet rates, and poor multimedia support still inhibited mass usage of the Web. It took broadband Internet access, flat rates, and digitalized media processing for the Web to catch on.


Author(s):  
Xiaoying Gao ◽  
Leon Sterling

The World Wide Web is known as the “universe of network-accessible information, the embodiment of human knowledge” (W3C, 1999). Internet-based knowledge management aims to use the Internet as a worldwide environment for knowledge publishing, searching, sharing, reusing, and integration, and to support collaboration and decision making. However, knowledge on the Internet is buried in documents. Most of these documents are written in languages intended for human readers, so the knowledge contained therein cannot be easily accessed by computer programs such as knowledge management systems. In order to make the Internet “machine readable,” information extraction from Web pages becomes a crucial research problem.
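A minimal sketch of the kind of extraction step this implies, using BeautifulSoup on illustrative markup (the tags, class names and values are placeholders, not from the article): a human-readable page becomes a list of machine-readable records that a knowledge management system could ingest.

```python
# Sketch: extract structured records from human-oriented HTML.
from bs4 import BeautifulSoup

html = """
<div class="publication">
  <span class="title">Example Title</span>
  <span class="author">A. Author</span>
  <span class="year">2020</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
records = []
for item in soup.select("div.publication"):
    records.append({
        "title": item.select_one(".title").get_text(strip=True),
        "author": item.select_one(".author").get_text(strip=True),
        "year": int(item.select_one(".year").get_text(strip=True)),
    })

print(records)  # [{'title': 'Example Title', 'author': 'A. Author', 'year': 2020}]
```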


2015 ◽  
Vol 23 (3) ◽  
pp. 333-346 ◽  
Author(s):  
Swapan Purkait

Purpose – This paper aims to report on research that tests the effectiveness of anti-phishing tools in detecting phishing attacks by conducting real-time experiments using freshly hosted phishing sites. Almost all modern-day Web browsers and antivirus programs provide security indicators to mitigate the widespread problem of phishing on the Internet. Design/methodology/approach – The current work examines and evaluates the effectiveness of five popular Web browsers, two third-party phishing toolbar add-ons and seven popular antivirus programs in terms of their capability to detect locally hosted spoofed websites. The same tools were also tested against fresh phishing sites hosted on the Internet. Findings – The experiments yielded alarming results: although the success rate against live phishing sites was encouraging, only 3 of the 14 tools tested could successfully detect a single spoofed website hosted locally. Originality/value – This work proposes the inclusion of domain name system (DNS) server authentication and verification of the name servers of a visited website in all future anti-phishing toolbars. It also proposes that a Web browser should maintain a white list of websites that engage in online monetary transactions, so that when a user needs to access any of these the default protocol is always HTTPS (Hypertext Transfer Protocol Secure); without HTTPS, the browser should prevent the page from loading.
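A minimal sketch of the spirit of that white-list proposal, not the paper's implementation: the domains and function name are illustrative, and the sketch upgrades navigation to HTTPS for whitelisted transaction sites rather than modelling a full browser policy.

```python
# Sketch: force HTTPS for domains on a browser-maintained transaction whitelist.
from urllib.parse import urlparse, urlunparse

TRANSACTION_WHITELIST = {"bank.example.com", "pay.example.org"}  # illustrative

def enforce_https(url: str) -> str:
    """Return the URL to load, upgrading whitelisted sites to HTTPS."""
    parts = urlparse(url)
    if parts.hostname in TRANSACTION_WHITELIST and parts.scheme != "https":
        # Upgrade the scheme instead of loading the page over plain HTTP.
        parts = parts._replace(scheme="https")
    return urlunparse(parts)

print(enforce_https("http://bank.example.com/login"))
# -> https://bank.example.com/login
```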


2013 ◽  
Vol 10 (10) ◽  
pp. 2057-2061
Author(s):  
Madhurima Hooda ◽  
Amandeep Kaur ◽  
Madhulika Bhadauria

The World Wide Web is used by millions of people every day for various purposes, including email, reading news, downloading music, online shopping or simply accessing information about anything. Using a standard web browser, the user can access information stored on Web servers situated anywhere on the globe. This gives the illusion that all this information is situated locally on the user’s computer. In reality, the Web is a huge distributed system that appears as a single resource to the user, available at the click of a button. This paper gives an overview of distributed systems in the current IT sector. Distributed systems are everywhere: the Internet enables users throughout the world to access its services wherever they may be located [1]. Each organization manages an intranet, which provides local services for local users and generally also provides services to other users on the Internet. Small distributed systems can be constructed from mobile computers and other small computational devices attached to a wireless network.


2003 ◽  
Vol 9 (2) ◽  
Author(s):  
J. Nyéki ◽  
M. Soltész

Pál Maliga founded Hungarian research into the floral biology of fruit species during his more than forty-year-long career. Almost all pome and stone fruit species were covered by his activities, and he also dealt with the fertility of walnut and chestnut. Regularities were revealed, and his methodical studies opened the way to approach and elaborate alternatives for the association of varieties when planning high-yielding commercial plantations. In his breeding activity, the choice of crossing parental varieties was based on knowledge of fertility relations. The sour cherry varieties he obtained represent the highest quality, reliability and security of yields worldwide. The renewed Hungarian sour cherry cultivation owes its fame and prosperity to those varieties, as well as to a thorough knowledge of the biological bases of fertility.


2013 ◽  
Vol 28 (2) ◽  
pp. 93-110
Author(s):  
Roger Clarke

The World Wide Web arrived just as connections to the Internet were broadening from academe to the public generally. The Web was designed to support user-performed publishing and access to documents in both textual and graphical forms. That capability was quickly supplemented by means to discover content. The web browser was the ‘killer app’ associated with the explosion of the Internet into the wider world during the mid-1990s. The technology was developed in 1990 by an Englishman, supported by a Belgian, working in Switzerland, but with the locus soon migrating to Illinois and then to Massachusetts in 1994. Australians were not significant contributors to the original technology, but were among the pioneers in its application. This paper traces the story of the Web in Australia from its beginnings in 1992, up to 1995, identifying key players and what they did, set within the broader context, and reflecting the insights of the theories of innovation and innovation diffusion.

