Big Data Analysis of Web Data Extraction

2018 ◽  
Vol 7 (4.37) ◽  
pp. 168
Author(s):  
Nadia Ibrahim ◽  
Alaa Hassan ◽  
Marwah Nihad

In this study, we examine large-scale data extraction techniques, which include detecting patterns and hidden relationships among numerous factors and retrieving the required information. Rapid analysis of massive data can lead to innovation and to concepts of theoretical value. Compared with mining traditional data sets, mining the vast amount of interdependent, heterogeneous data on the web can expand knowledge and ideas about the target domain. This research therefore studies data mining on the Internet. The networks used to extract data from different locations can be complex, and web technology has previously been used for data extraction and analysis (Marwah et al., 2016). In this work, we extracted information from large numbers of web pages, examined the pages of each site using Java code, and stored the extracted information in a dedicated database for the web pages. We used a data network function to evaluate and categorize the pages found, identifying trusted and risky web pages, and exported the data to a CSV file. These data were then examined and classified using WEKA to obtain accurate results. The results indicate that the applied data mining algorithms outperform other techniques in classifying and extracting the data, with high performance.
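As an illustrative sketch of the extraction-and-export step described above (the authors' exact attributes and libraries are not specified; jsoup, the feature set, and the file names here are assumptions), a small Java program can read a list of page URLs, pull simple features from each page, and write them to a CSV file that can later be loaded into WEKA:

```java
// Minimal sketch, assuming jsoup is on the classpath and "pages.txt" holds one URL per line.
// The features (title length, link count, form count) are illustrative placeholders.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class PageFeatureExtractor {
    public static void main(String[] args) throws Exception {
        List<String> urls = Files.readAllLines(Paths.get("pages.txt"));
        try (PrintWriter csv = new PrintWriter("pages.csv")) {
            csv.println("url,titleLength,linkCount,formCount");   // header row for WEKA's CSVLoader
            for (String url : urls) {
                Document doc = Jsoup.connect(url).timeout(10_000).get();
                int titleLength = doc.title().length();
                int linkCount = doc.select("a[href]").size();
                int formCount = doc.select("form").size();
                csv.printf("%s,%d,%d,%d%n", url, titleLength, linkCount, formCount);
            }
        }
    }
}
```

The resulting pages.csv can then be opened with WEKA's CSVLoader or the Explorer GUI for the classification step.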

2020 ◽  
Vol 17 (1) ◽  
pp. 6-9
Author(s):  
Ramya G. Franklin ◽  
B. Muthukumar

The growth of science is a priceless asset to humans and society. The plethora of high-end machines has made life more sophisticated, which in turn is paid back as health issues. Health care data are complex and large, and these heterogeneous data are used to diagnose patients' diseases. It is better to predict diseases at an earlier stage, which can save lives and give an upper hand in controlling the disease. Data mining approaches are very useful in analyzing complex, heterogeneous, and large data sets; the mining algorithms extract the essential data from the raw data. This paper presents a survey of the various data mining algorithms used in predicting a very common disease of day-to-day life, diabetes mellitus. Over 246 million people in the world are diabetic, with a majority of them being women, and the WHO reports that by 2025 this number is expected to rise to over 380 million.
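For readers unfamiliar with how such algorithms are typically applied, the following hedged Java/WEKA sketch runs a single classifier over a diabetes data set (assuming the diabetes.arff sample distributed with WEKA is in the working directory); the choice of the J48 decision tree is only illustrative and is not a recommendation from the survey:

```java
// Minimal sketch: 10-fold cross-validation of a decision tree on a Pima-style diabetes set.
// "diabetes.arff" is assumed to be present (WEKA ships a sample file with this name).
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

import java.util.Random;

public class DiabetesPrediction {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("diabetes.arff");
        data.setClassIndex(data.numAttributes() - 1);   // class = tested_positive / tested_negative

        J48 tree = new J48();                           // C4.5 decision tree learner
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));  // 10-fold cross-validation

        System.out.println(eval.toSummaryString("\n10-fold CV results\n", false));
    }
}
```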


2016 ◽  
Vol 4 (8) ◽  
pp. 118-135
Author(s):  
Rajendra Gupta

Phishing is a kind of e-commerce lure that tries to steal a web user's confidential information by creating a website nearly identical to a legitimate one, in which the content and images remain almost the same as on the legitimate website, with only small changes. Another form of phishing makes minor changes to the URL or domain of the legitimate website. In this paper, a number of anti-phishing toolbars are discussed and a system model is proposed to tackle phishing attacks. The proposed anti-phishing system is based on a plug-in tool for the web browser. The performance of the proposed system is studied with three different data mining classification algorithms: Random Forest, Nearest Neighbour Classification (NNC), and the Bayesian Classifier (BC). To evaluate the proposed anti-phishing system for the detection of phishing websites, 7690 legitimate websites and 2280 phishing websites were collected from authorised sources such as the APWG database and PhishTank. After analyzing the data mining algorithms over the phishing web pages, it is found that the Bayesian algorithm responds fastest and gives more accurate results than the other algorithms.
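A minimal sketch of this kind of classifier comparison in Java with WEKA is shown below; the file name phishing.arff and the attribute layout are assumptions rather than the paper's actual dataset, and the parameters are library defaults rather than the tuned settings of the proposed system:

```java
// Minimal sketch: cross-validate three classifiers on an assumed "phishing.arff" file
// whose last attribute is the nominal class (legitimate / phishing).
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.lazy.IBk;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

import java.util.Random;

public class PhishingClassifierComparison {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("phishing.arff");
        data.setClassIndex(data.numAttributes() - 1);

        Classifier[] models = { new RandomForest(), new IBk(3), new NaiveBayes() };
        String[] names = { "Random Forest", "Nearest Neighbour (IBk, k=3)", "Naive Bayes" };

        for (int i = 0; i < models.length; i++) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(models[i], data, 10, new Random(1));  // 10-fold CV
            System.out.printf("%-30s accuracy: %.2f%%%n", names[i], eval.pctCorrect());
        }
    }
}
```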


2013 ◽  
Vol 7 (2) ◽  
pp. 574-579 ◽  
Author(s):  
Dr Sunitha Abburu ◽  
G. Suresh Babu

Day by day, the volume of information available on the web is growing significantly. Information on the web exists in several forms: structured, semi-structured, and unstructured. The majority of information on the web is presented in web pages, and the information presented in web pages is semi-structured. However, the information required for a given context is scattered across different web documents, and it is difficult to analyze these large volumes of semi-structured information and to make decisions based on that analysis. The current research work proposes a framework for a system that extracts information from various sources and prepares reports based on the knowledge built from the analysis. This simplifies data extraction, data consolidation, data analysis, and decision making based on the information presented in web pages. The proposed framework integrates web crawling, information extraction, and data mining technologies for better information analysis that supports effective decision making. It enables people and organizations to extract information from various web sources and to perform an effective analysis of the extracted data. The framework is applicable to any application domain; manufacturing, sales, tourism, and e-learning are a few example applications. The framework has been implemented and tested for effectiveness, and the results are promising.
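A minimal sketch of the crawl-and-extract stage of such a framework is given below, using the jsoup library as an assumed crawler and parser (the paper does not name its components); it performs a small breadth-first crawl from a seed URL and records page titles for later analysis:

```java
// Minimal sketch: breadth-first crawl limited to one site, collecting page titles.
// The seed URL, page limit, and use of jsoup are illustrative assumptions.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

public class SimpleCrawler {
    public static void main(String[] args) throws Exception {
        String seed = "https://example.com/";
        int maxPages = 20;

        Queue<String> frontier = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        frontier.add(seed);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) continue;                 // skip already-visited pages
            try {
                Document doc = Jsoup.connect(url).timeout(10_000).get();
                System.out.println(url + " -> " + doc.title());
                for (Element link : doc.select("a[href]")) {
                    String next = link.attr("abs:href");     // resolve relative links
                    if (next.startsWith(seed)) frontier.add(next);  // stay within the seed site
                }
            } catch (Exception e) {
                System.err.println("skipped " + url + ": " + e.getMessage());
            }
        }
    }
}
```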


Nowadays, various data mining algorithms can produce a specific set of results, known as patterns, from a huge data repository, but there is no infrastructure or system to keep the generated patterns in persistent storage. A pattern warehouse provides a foundation for preserving these patterns in a dedicated environment for long-term use. Most organizations are more interested in the information or patterns than in raw, unprocessed data, because the extracted knowledge plays a vital role in making the right decisions for the growth of an organization. We have examined the sources of patterns generated from large data sets. This paper briefly covers the application areas of patterns and the idea of the pattern warehouse, the architecture of a pattern warehouse, the relationship between the data warehouse and data mining, the association between data mining and the pattern warehouse, and a critical evaluation of existing, previously published approaches, with particular emphasis on association rule related review elements. We also analyze pattern warehouses and data warehouses with respect to various factors such as storage space, type of storage unit, and characteristics, and identify several research directions.
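As a rough illustration of the persistence idea (not the architecture proposed in the paper), the following Java sketch stores mined association rules in a relational table using an embedded H2 database; the schema, file name, and sample rule are all assumptions:

```java
// Minimal sketch: persist mined association rules instead of re-mining the raw data.
// Assumes the H2 JDBC driver is on the classpath; the table layout is illustrative.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class PatternWarehouse {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:h2:./patternwarehouse")) {
            try (Statement st = con.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS association_rule (" +
                           "id INT AUTO_INCREMENT PRIMARY KEY, antecedent VARCHAR(255), " +
                           "consequent VARCHAR(255), support DOUBLE, confidence DOUBLE)");
            }
            // Insert one rule; in practice these rows would come from the mining step.
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO association_rule(antecedent, consequent, support, confidence) " +
                    "VALUES (?, ?, ?, ?)")) {
                ps.setString(1, "bread, butter");   // example antecedent itemset
                ps.setString(2, "milk");            // example consequent
                ps.setDouble(3, 0.12);              // example support
                ps.setDouble(4, 0.85);              // example confidence
                ps.executeUpdate();
            }
        }
    }
}
```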


Author(s):  
Shalin Hai-Jew

Understanding Web network structures may offer insights on various organizations and individuals. These structures are often latent and invisible without special software tools; the interrelationships between various websites may not be apparent with a surface perusal of the publicly accessible Web pages. Three publicly available tools may be “chained” (combined in sequence) in a data extraction sequence to enable visualization of various aspects of HTTP network structures in an enriched way (with more detailed insights about the composition of such networks, given their heterogeneous and multimodal contents). Maltego Tungsten™, a penetration-testing tool, enables the mapping of Web networks, which are enriched with a variety of information: the technological understructure and tools used to build the network, some linked individuals (digital profiles), some linked documents, linked images, related emails, some related geographical data, and even the in-degree of the various nodes. NCapture with NVivo enables the extraction of public social media platform data and some basic analysis of these captures. The Network Overview, Discovery, and Exploration for Excel (NodeXL) tool enables the extraction of social media platform data and various evocative data visualizations and analyses. With the size of the Web growing exponentially and new domains (like .ventures, .guru, .education, .company, and others) emerging, the ability to map widely will offer a broad competitive advantage to those who would exploit this approach to enhance knowledge.


Author(s):  
Xiaohui Liu

Intelligent Data Analysis (IDA) is an interdisciplinary study concerned with the effective analysis of data. IDA draws its techniques from diverse fields, including artificial intelligence, databases, high-performance computing, pattern recognition, and statistics. These fields often complement each other (e.g., many statistical methods, particularly those for large data sets, rely on computation, but brute computing power is no substitute for statistical knowledge) (Berthold & Hand, 2003; Liu, 1999).


Author(s):  
Ahmed El Azab ◽  
Mahmood A. Mahmood ◽  
Abd El-Aziz

Web usage mining techniques and their applications across industries are still exploratory and, despite an increase in academic research, there remains the challenge of analyzing the web in a way that quantitatively captures web users' common interests and characterizes their underlying tasks. This chapter addresses the problem of how to support web usage mining techniques and applications across industries by combining the language of web pages with the algorithms used in web data mining. Existing research on web usage mining tends to focus on finding out how each technique can be applied in a particular industry field; however, there is little evidence that researchers have approached the issue of web usage mining across industries. Consequently, the aim of this chapter is to provide an overview of how web usage mining techniques and applications across industries can be supported.
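As a concrete, if simplified, illustration of the preprocessing common to most web usage mining techniques, the following Java sketch parses a Common Log Format access log (an assumed file named access.log) and counts requests per page, a first step toward quantifying users' common interests:

```java
// Minimal sketch: count page requests from a Common Log Format web server log.
// Common Log Format: host ident user [time] "METHOD /path HTTP/x.y" status bytes
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class UsageMiningPreprocess {
    public static void main(String[] args) throws Exception {
        Map<String, Integer> pageHits = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get("access.log"))) {
            int start = line.indexOf('"');
            int end = line.indexOf('"', start + 1);
            if (start < 0 || end < 0) continue;                 // skip malformed lines
            String[] request = line.substring(start + 1, end).split(" ");
            if (request.length < 2) continue;
            pageHits.merge(request[1], 1, Integer::sum);        // count hits per requested path
        }
        pageHits.forEach((page, hits) -> System.out.println(page + "\t" + hits));
    }
}
```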


2017 ◽  
Vol 7 (1.1) ◽  
pp. 286
Author(s):  
B. Sekhar Babu ◽  
P. Lakshmi Prasanna ◽  
P. Vidyullatha

In recent years, the World Wide Web has grown into a familiar medium for investigating new information, business trends, trading strategies, and so on. Several organizations and companies are also using the web in order to present their products or services across the world. E-commerce is a kind of business or commercial transaction that involves the transfer of information across the web or Internet. In this situation, a huge amount of data is produced and dumped into web services. This data overload makes it difficult to determine accurate and valuable information, hence web data mining is used as a tool to discover and mine knowledge from the web. Web data mining technology can be applied by e-commerce organizations to offer personalized e-commerce solutions and better meet the desires of customers. A data mining algorithm such as ontology-based association rule mining using the Apriori algorithm extracts various useful information from large data sets. We implement this data mining technique in Java; the data sets are generated dynamically while transactions are processed, and various patterns are extracted.
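A hedged sketch of the association rule mining step in Java, using WEKA's Apriori implementation, is shown below; the file transactions.arff and the thresholds are assumptions, and the ontology-based extension described above is not reproduced:

```java
// Minimal sketch: Apriori association rule mining over an assumed nominal transaction
// dataset "transactions.arff"; thresholds are illustrative defaults.
import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EcommerceRuleMining {
    public static void main(String[] args) throws Exception {
        Instances transactions = DataSource.read("transactions.arff");

        Apriori apriori = new Apriori();
        apriori.setLowerBoundMinSupport(0.1);   // keep itemsets appearing in >= 10% of transactions
        apriori.setMinMetric(0.8);              // minimum rule confidence
        apriori.setNumRules(20);                // report the 20 strongest rules
        apriori.buildAssociations(transactions);

        System.out.println(apriori);            // prints the discovered association rules
    }
}
```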


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Ivan Kholod ◽  
Ilya Petukhov ◽  
Andrey Shorov

This paper describes the construction of a Cloud for Distributed Data Analysis (CDDA) based on the actor model. The design maps data mining algorithms onto decomposed functional blocks, which are assigned to actors. Using actors allows the computation to be moved close to the stored data. The process does not require loading data sets into the cloud and allows users to analyze confidential information locally. The results of experiments show that the proposed approach outperforms established solutions in efficiency.
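The following plain-Java sketch illustrates the actor idea behind such a design (it does not reproduce the CDDA decomposition or its actor framework): each data-node actor owns a mailbox and computes a partial result over its local partition, so the computation moves to the data, and a collector merges the partial results:

```java
// Minimal actor-model sketch: mailboxes, message passing, and local computation per partition.
// The partitions, messages, and the mean computation are illustrative assumptions.
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ActorSketch {
    record Compute() {}                                   // request sent to a data-node actor
    record Partial(double sum, long count) {}             // reply sent to the collector

    /** A minimal actor: a private mailbox drained by its own thread, state kept local. */
    static class DataNodeActor {
        private final BlockingQueue<Compute> mailbox = new LinkedBlockingQueue<>();
        private final double[] localPartition;

        DataNodeActor(double[] localPartition, BlockingQueue<Partial> collector) {
            this.localPartition = localPartition;
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        mailbox.take();                    // wait for a Compute message
                        double sum = 0;
                        for (double v : localPartition) sum += v;   // compute where the data lives
                        collector.add(new Partial(sum, localPartition.length));
                    }
                } catch (InterruptedException ignored) { }
            });
            t.setDaemon(true);
            t.start();
        }

        void tell(Compute msg) { mailbox.add(msg); }
    }

    public static void main(String[] args) throws Exception {
        List<double[]> partitions = List.of(
                new double[]{1, 2, 3}, new double[]{10, 20}, new double[]{5});
        BlockingQueue<Partial> collectorMailbox = new LinkedBlockingQueue<>();

        for (double[] p : partitions) new DataNodeActor(p, collectorMailbox).tell(new Compute());

        double sum = 0; long count = 0;                    // collector merges the partial results
        for (int i = 0; i < partitions.size(); i++) {
            Partial part = collectorMailbox.take();
            sum += part.sum(); count += part.count();
        }
        System.out.println("global mean = " + sum / count);
    }
}
```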

