APPLICATION OF WEB SCRAPING TECHNIQUES FOR MINING TAX POTENTIAL (A CASE STUDY OF THE ONLINE MARKETPLACES TOKOPEDIA, SHOPEE AND BUKALAPAK)

2020 ◽  
Vol 13 (2) ◽  
pp. 65-75
Author(s):  
Mohammad Djufri

Currently, millions of transaction records are available on the internet and can be retrieved and analyzed to uncover potential tax revenue. This article examines whether data gathered through web scraping techniques can be applied by Account Representatives in efforts to identify potential taxes. The paper takes an informetric approach, quantitatively examining the transaction data of sellers recorded on three online marketplaces (OMPs): Tokopedia, Shopee and Bukalapak. The results show that web scraping techniques can be used to extract potential tax data, and that the best approach for the Directorate General of Taxation (DJP) is to develop its own integrated web scraping application as a Business Intelligence system. This research is expected to contribute academically, through the use of web scraping for data extraction in identifying potential taxes, and to have policy implications for internet data collection by the Directorate General of Taxation.
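As an illustration of the kind of extraction involved (the HTML structure, class names and listing values below are hypothetical; real marketplace pages are largely rendered client-side and typically require an official API or a headless browser, and each site's terms of service apply), seller listing data might be pulled from markup like this using only Python's standard library:

```python
from html.parser import HTMLParser

# Hypothetical listing snippet; real marketplace markup differs.
SAMPLE_HTML = """
<div class="product"><span class="name">Batik Shirt</span>
<span class="price">Rp150.000</span><span class="sold">terjual 32</span></div>
<div class="product"><span class="name">Phone Case</span>
<span class="price">Rp45.000</span><span class="sold">terjual 210</span></div>
"""

class ProductParser(HTMLParser):
    """Collects {name, price, sold} records from 'product' blocks."""
    def __init__(self):
        super().__init__()
        self._field = None      # which labelled span we are inside, if any
        self._row = {}
        self.products = []      # accumulated records

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if tag == "div" and cls == "product":
            self._row = {}
        elif tag == "span" and cls in ("name", "price", "sold"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._row[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._row:
            self.products.append(self._row)
            self._row = {}

parser = ProductParser()
parser.feed(SAMPLE_HTML)
for p in parser.products:
    print(p["name"], p["price"], p["sold"])
```

Records extracted this way (name, price, units sold) are the raw material an Account Representative would aggregate per seller before estimating turnover.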

2020 ◽  
pp. 5-9
Author(s):  
Manasvi Srivastava ◽  
Vikas Yadav ◽  
Swati Singh ◽  
...  

The Internet is the largest source of information created by humanity. It contains a variety of materials in various formats, such as text, audio, video and much more. Web scraping is one way to access this material: a set of strategies for obtaining information from a website programmatically instead of copying the data manually. Many Web-based data extraction methods are designed to solve specific problems and work on ad-hoc domains. Various tools and technologies have been developed to facilitate Web scraping; unfortunately, the appropriateness and ethics of using these tools are often overlooked. There are hundreds of web scraping software packages available today, most of them designed for Java, Python and Ruby, including both open-source and commercial software. Web-based software such as Yahoo Pipes, Google Web Scraper and the OutWit extension for Firefox are among the best tools for beginners in web scraping. Web extraction essentially replaces the manual extraction and editing process, providing an easier and better way to collect data from a web page, convert it into the desired format and save it to a local file or archive directory. In this paper, among the kinds of scraping, we focus on those techniques that extract the content of a Web page. In particular, we apply scraping techniques to a variety of diseases with their symptoms and precautions.
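Among content-extraction techniques of the kind the paper focuses on, a minimal sketch using only Python's standard library might strip markup and script/style blocks to recover a page's visible text (the class name and sample HTML here are illustrative, not the paper's implementation):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible text of a page, ignoring script/style blocks."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0    # > 0 while inside a script/style element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def page_text(html: str) -> str:
    p = TextExtractor()
    p.feed(html)
    return " ".join(p.chunks)

html = ("<html><head><style>p{color:red}</style></head>"
        "<body><p>Fever and cough.</p><script>track()</script></body></html>")
print(page_text(html))  # Fever and cough.
```

The extracted text can then be matched against symptom and precaution keywords, which is the content-level processing the abstract describes.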


2021 ◽  
Vol 1 (3) ◽  
pp. 58-60
Author(s):  
Katanakal Sarada ◽  
Dr. K. Nirmalamma

Mobile commerce is the buying and selling of goods and services through wireless handheld devices such as smartphones and tablets. M-commerce enables users to access online shopping platforms without needing a desktop computer: for example, purchasing and selling products, online banking and paying bills through virtual marketplace apps such as the Amazon mobile app, Android Pay and Samsung Pay. The main idea behind m-commerce is to make the applications and services available on the internet accessible from portable devices (mobiles, laptops, tablets) and thereby overcome the constraints of a desktop computer. M-commerce aims to serve all the information and material needs of people in a convenient and easy way.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Irvin Dongo ◽  
Yudith Cardinale ◽  
Ana Aguilera ◽  
Fabiola Martinez ◽  
Yuni Quintero ◽  
...  

Purpose
This paper performs an exhaustive review of relevant and recent related studies, which reveals that both extraction methods are currently used to analyze credibility on Twitter; there is thus clear evidence of the need for different options to extract different data for this purpose. Nevertheless, none of these studies performs a comparative evaluation of the two extraction techniques. Moreover, the authors extend a previous comparison, which uses a recently developed framework offering both alternatives for data extraction and implementing a previously proposed credibility model, by adding a qualitative evaluation and a Twitter Application Programming Interface (API) performance analysis from different locations.

Design/methodology/approach
As one of the most popular social platforms, Twitter has been the focus of recent research aimed at analyzing the credibility of shared information. To do so, several proposals use either the Twitter API or Web scraping to extract the data for analysis. Qualitative and quantitative evaluations are performed to discover the advantages and disadvantages of both extraction methods.

Findings
The study demonstrates the differences in accuracy and efficiency between the two extraction methods and highlights further problems in this area that must be addressed to pursue true transparency and legitimacy of information on the Web.

Originality/value
Results report that some Twitter attributes cannot be retrieved by Web scraping. Both methods produce identical credibility values when a robust normalization process is applied to the text (i.e. the tweet). Moreover, concerning time performance, Web scraping is faster than the Twitter API and more flexible in terms of obtaining data; however, Web scraping is very sensitive to website changes. Additionally, the response time of the Twitter API is proportional to the distance from the central server in San Francisco.
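The finding that both extraction methods produce identical credibility values after "a robust normalization process" can be illustrated with a minimal sketch (this is an assumed normalization pipeline, not the authors' exact one; the sample tweets are invented). API responses and scraped pages often differ in link form, Unicode representation and whitespace, which such a pass removes:

```python
import re
import unicodedata

def normalize_tweet(text: str) -> str:
    """Illustrative normalization so that tweet text gathered via the API
    and via scraping compares equal: unify Unicode forms, drop URLs
    (t.co short links vs expanded ones), lowercase, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"https?://\S+", "", text)
    text = text.lower()
    text = re.sub(r"\s+", " ", text).strip()
    return text

api_version = "Great read! https://t.co/abc123\nCheck it out"
scraped_version = "Great  read! https://example.com/post Check it out"
print(normalize_tweet(api_version) == normalize_tweet(scraped_version))  # True
```

Only after such a pass does a text-based credibility score become independent of the extraction route.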


2021 ◽  
Vol 1 (2) ◽  
pp. 65-77
Author(s):  
T. E. Vildanov ◽  
◽  
N. S. Ivanov ◽  

This article explores both popular and newly developed tools for extracting data from websites and converting it into a form suitable for analysis. The paper compares Python libraries, with performance as the key criterion. The results are grouped by site, tool and number of iterations, and presented in graphical form. The scientific novelty of the research lies in the domain of application of the data extraction tools: we obtain and transform semi-structured data from the websites of bookmakers and betting exchanges. The article also describes new tools that are currently not in great demand in the field of parsing and web scraping. As a result of the study, quantitative metrics were obtained for all the tools used, and the libraries most suitable for rapid extraction and processing of information in large quantities were selected.
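A minimal harness in the spirit of this comparison might time each extraction function over repeated iterations (a stdlib-only sketch: the paper's actual candidate libraries and its bookmaker pages are replaced here by two simple parsers and a synthetic odds table):

```python
import re
import time
from html.parser import HTMLParser
from statistics import mean

# Synthetic stand-in for a semi-structured betting-odds page.
HTML = "<table>" + "".join(
    f"<tr><td>Team {i}</td><td>{1.5 + i % 7 / 10:.2f}</td></tr>"
    for i in range(500)
) + "</table>"

def parse_with_regex(doc):
    return re.findall(r"<td>(.*?)</td>", doc)

class CellParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.cells, self._in_td = [], False
    def handle_starttag(self, tag, attrs):
        self._in_td = tag == "td"
    def handle_data(self, data):
        if self._in_td:
            self.cells.append(data)
    def handle_endtag(self, tag):
        self._in_td = False

def parse_with_htmlparser(doc):
    p = CellParser()
    p.feed(doc)
    return p.cells

def benchmark(fn, doc, iterations=20):
    """Mean wall-clock time of fn(doc) over the given number of iterations."""
    runs = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn(doc)
        runs.append(time.perf_counter() - start)
    return mean(runs)

for fn in (parse_with_regex, parse_with_htmlparser):
    print(f"{fn.__name__}: {benchmark(fn, HTML):.6f} s (mean of 20 runs)")
```

Grouping such means by site, tool and iteration count yields exactly the kind of table the article plots.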


2011 ◽  
Vol 107 (3) ◽  
pp. 350-359 ◽  
Author(s):  
Le Ma ◽  
Hong-Liang Dou ◽  
Yi-Qun Wu ◽  
Yang-Mu Huang ◽  
Yu-Bei Huang ◽  
...  

Lutein and zeaxanthin are thought to decrease the incidence of age-related macular degeneration (AMD); however, findings have been inconsistent. We conducted a systematic literature review and meta-analysis to evaluate the relationship between dietary intake of lutein and zeaxanthin and AMD risk. Relevant studies were identified by searching five databases up to April 2010. Reference lists of articles were retrieved, and experts were contacted. Literature search, data extraction and study quality assessment were performed independently by two reviewers and results were pooled quantitatively using meta-analysis methods. The potential sources of heterogeneity and publication bias were also estimated. The search yielded six longitudinal cohort studies. The pooled relative risk (RR) for early AMD, comparing the highest with the lowest category of lutein and zeaxanthin intake, was 0·96 (95 % CI 0·78, 1·17). Dietary intake of these carotenoids was significantly related with a reduction in risk of late AMD (RR 0·74; 95 % CI 0·57, 0·97); and a statistically significant inverse association was observed between lutein and zeaxanthin intake and neovascular AMD risk (RR 0·68; 95 % CI 0·51, 0·92). The results were essentially consistent among subgroups stratified by participant characteristics. The findings of the present meta-analysis indicate that dietary lutein and zeaxanthin is not significantly associated with a reduced risk of early AMD, whereas an increase in the intake of these carotenoids may be protective against late AMD. However, additional studies are needed to confirm these relationships.
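Pooled relative risks of the kind reported above come from inverse-variance meta-analysis on the log scale. A minimal fixed-effect sketch (the study values below are illustrative only, not the six cohorts analysed here, and the published pooling may additionally use random-effects methods):

```python
import math

def pooled_rr(studies):
    """Fixed-effect inverse-variance pooling of relative risks.
    Each study is (rr, ci_low, ci_high); the standard error of ln(RR)
    is recovered from the 95% CI width: se = (ln(hi) - ln(lo)) / (2 * 1.96)."""
    num = den = 0.0
    for rr, lo, hi in studies:
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
        w = 1.0 / se ** 2          # inverse-variance weight
        num += w * math.log(rr)
        den += w
    log_rr = num / den
    se_pooled = math.sqrt(1.0 / den)
    ci = (math.exp(log_rr - 1.96 * se_pooled),
          math.exp(log_rr + 1.96 * se_pooled))
    return math.exp(log_rr), ci

# Illustrative inputs only, not the cohorts from the paper.
studies = [(0.70, 0.50, 0.98), (0.80, 0.60, 1.07), (0.72, 0.48, 1.08)]
rr, (lo, hi) = pooled_rr(studies)
print(f"pooled RR {rr:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```

A pooled CI excluding 1, as for late and neovascular AMD above, is what makes an association statistically significant.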


Author(s):  
Savinay Mengi ◽  
Astha Gupta

A Blockchain protocol operates on top of the Internet, on a P2P network of computers that all run the protocol and hold an identical copy of the ledger of transactions, enabling P2P value transactions without a middleman through machine consensus. The concept of Blockchain first came to fame in October 2008, as part of a proposal for Bitcoin, with the aim to create P2P money without banks. Bitcoin introduced a novel solution to the age-old human problem of trust. The underlying blockchain technology allows us to trust the outputs of the system without trusting any actor within it. People and institutions who do not know or trust each other, reside in different countries, are subject to different jurisdictions, and who have no legally binding agreements with each other, can now interact over the Internet without the need for trusted third parties like banks, Internet platforms, or other types of clearing institutions. Ideas around cryptographically secured P2P networks have been discussed in the academic environment in different evolutionary stages, mostly in theoretical papers, since the 1980s. "Proof-of-Work" is the consensus mechanism that enables distributed control over the ledger. It is based on a combination of economic incentives and cryptography. Blockchain is a shared, trusted, public ledger of transactions, that everyone can inspect but which no single user controls. It is a distributed database that maintains a continuously growing list of transaction data records, cryptographically secured from tampering and revision.
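The Proof-of-Work mechanism described here can be sketched as a toy example (illustrative only: Bitcoin hashes a structured block header with double SHA-256 and compares against a numeric target, rather than checking a hex prefix):

```python
import hashlib

def proof_of_work(block_data: str, difficulty: int = 4):
    """Toy Proof-of-Work: find a nonce such that the SHA-256 digest of
    block_data + nonce starts with `difficulty` zero hex digits.
    Finding the nonce costs many hash attempts (the economic stake);
    verifying it costs a single hash (anyone can inspect the ledger)."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

nonce, digest = proof_of_work("block 1: Alice pays Bob 5")
print(f"nonce={nonce}, hash={digest[:12]}...")
```

The asymmetry between the search loop and the one-line verification is what lets mutually distrusting parties agree on the ledger without a clearing institution.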


2018 ◽  
Vol 2 (02) ◽  
Author(s):  
Julycia Verent Manderos ◽  
I Gede Suwetja

The Directorate General of Taxation currently utilizes internet technology to improve service, one example being online NPWP registration (e-Registration), which makes it easier for new taxpayers to register anywhere and anytime. In practice, however, several factors hinder the e-Registration process: (1) the data received by the Extensification Section are incomplete, (2) the internet network is often disrupted, and (3) the public pays little attention to the socialization efforts that have been carried out. The author suggests that KPP Pratama Manado re-socialize e-Registration, improve the quality of that socialization and improve the internet network.

Keywords: E-Registration, NPWP, Taxpayers, Socialization, Public Attention, Internet Network


2018 ◽  
Author(s):  
David D. Clark ◽  
Amogh D. Dhamdhere ◽  
Matthew Luckie ◽  
K. C. Claffy

2005 ◽  
Author(s):  
Florian Zettelmeyer ◽  
Fiona Scott Morton ◽  
Jorge Silva-Risso
