Efficient web data classification techniques using semi-supervise learning algorithm

We introduce landmark grammars, a new family of context-free grammars aimed at describing the HTML source code of pages published by large, templated websites, and therefore at effectively tackling Web data extraction problems. Indeed, they address the inherent ambiguity of HTML, one of the main challenges of Web data extraction, which, despite over twenty years of research, has been largely neglected by the approaches presented in the literature. We then formalize the Smallest Extraction Problem (SEP), an optimization problem for finding the grammar of a family that best describes a set of pages and, at the same time, extracting their data. Finally, we present an unsupervised learning algorithm to induce a landmark grammar from a set of pages sharing a common HTML template, and we present an automatic Web data extraction system. Experiments on established benchmarks show that the approach can substantially improve on the state of the art.
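Landmark grammars are context-free grammars induced from the pages, and the sketch below is not that induction algorithm. It is a simplified, hypothetical illustration of landmark-delimited extraction (data taken as the text between fixed template strings), written only to make the ambiguity problem concrete: with nested markup, naive left-to-right matching of the first landmark occurrence cuts a value short. The function name and example pages are assumptions.

```python
from typing import List, Optional, Sequence


def extract_between_landmarks(page: str, landmarks: Sequence[str]) -> Optional[List[str]]:
    """Return the text between consecutive landmarks, matched left to right.

    Hypothetical helper: each landmark is matched at its first occurrence after
    the previous one. If a data value itself contains the next landmark (for
    example nested tags), the value is cut short, which is the kind of HTML
    ambiguity a grammar-based approach has to resolve.
    """
    pos = page.find(landmarks[0])
    if pos < 0:
        return None
    pos += len(landmarks[0])
    values = []
    for landmark in landmarks[1:]:
        end = page.find(landmark, pos)
        if end < 0:
            return None  # the page does not follow the assumed template
        values.append(page[pos:end])
        pos = end + len(landmark)
    return values


page = '<div class="item"><h2>Blue Mug</h2><span>$12</span></div>'
print(extract_between_landmarks(page, ["<h2>", "</h2><span>", "</span>"]))  # ['Blue Mug', '$12']

# Nested markup: the first "</b>" closes the inner tag, not the outer one.
print(extract_between_landmarks("<b>Price: <b>9</b> EUR</b>", ["<b>", "</b>"]))  # ['Price: <b>9']
```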

Performance evaluation of different classification techniques using different datasets

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i5.pp3584-3590 ◽

2019 ◽

Vol 9 (5) ◽

pp. 3584

Author(s):

Abdulkadir Özdemir ◽

Uğur Yavuz ◽

Fares Abdulhafidh Dael

Keyword(s):

Data Mining ◽

Business Intelligence ◽

Multilayer Perceptron ◽

Data Classification ◽

Major Effect ◽

Classification Techniques ◽

The Difference ◽

Two Parameters ◽

Accuracy Performance

Data mining has become one of the technologies with a major effect on business intelligence. However, to use the outcome of data mining, the user must go through several processes, such as data classification. Classifying data means processing it and organizing it into specific categories so that it can be used as effectively and efficiently as possible. In data mining, no single technique is applicable to all datasets. This paper shows the different results obtained by applying different techniques to the same data, evaluating the performance of different classification techniques on different datasets. Four data classification techniques were chosen for this study: BayesNet, NaiveBayes, Multilayer Perceptron and J48. Their performance was tested on two parameters: the time taken to build the model for the dataset, and the percentage of instances assigned to the correct class. The experiments were carried out using the Weka 3.8 software. The results demonstrate that the Multilayer Perceptron classifier achieved the best overall accuracy in classifying the instances, while the NaiveBayes classifier gave the worst accuracy on each dataset.
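The two evaluation parameters can be sketched as a small harness that times model building and scores accuracy. This is a hedged illustration, not the paper's Weka 3.8 setup: it assumes a simple train/test split (the abstract does not state the validation protocol), a scikit-learn-style `fit`/`predict` interface, and a hypothetical `MajorityClassifier` baseline standing in for BayesNet, NaiveBayes, Multilayer Perceptron or J48.

```python
import time
from collections import Counter
from typing import Callable, Tuple


class MajorityClassifier:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


def evaluate(make_model: Callable[[], object], train, test) -> Tuple[float, float]:
    """Return (build time in seconds, accuracy in percent) for one classifier."""
    X_train, y_train = train
    X_test, y_test = test
    model = make_model()
    start = time.perf_counter()
    model.fit(X_train, y_train)  # parameter 1: time taken to build the model
    build_seconds = time.perf_counter() - start
    predictions = model.predict(X_test)  # parameter 2: share of correctly classified instances
    correct = sum(p == t for p, t in zip(predictions, y_test))
    return build_seconds, 100.0 * correct / len(y_test)


train_set = ([[0], [1], [2]], ["a", "a", "b"])
test_set = ([[5], [6], [7], [8]], ["a", "b", "a", "a"])
seconds, accuracy = evaluate(MajorityClassifier, train_set, test_set)
print(f"build time {seconds:.6f}s, accuracy {accuracy:.1f}%")
```

Comparing several techniques then amounts to calling `evaluate` once per model factory on the same data.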

Big Data classification: techniques and tools

Applications of Big Data in Healthcare ◽

10.1016/b978-0-12-820203-6.00002-3 ◽

2021 ◽

pp. 1-43

Author(s):

Pijush Kanti Dutta Pramanik ◽

Saurabh Pal ◽

Moutan Mukhopadhyay ◽

Simar Preet Singh

Keyword(s):

Big Data ◽

Data Classification ◽

Classification Techniques ◽

Big Data Classification

MINING WEB USAGE GRAPHS USING EXAMPLE SEARCH SPACE

International Journal of Computational Intelligence and Applications ◽

10.1142/s146902680200049x ◽

2002 ◽

Vol 02 (02) ◽

pp. 209-220 ◽

Cited By ~ 1

Author(s):

V. UMA MAHESWARI ◽

A. SIROMONEY ◽

K. M. MEHATA

Keyword(s):

Web Mining ◽

Learning Algorithm ◽

Search Space ◽

Web Data ◽

Web Usage ◽

Navigation Patterns ◽

Example Space

Web mining refers to the process of discovering potentially useful and previously unknown information or knowledge from Web data. A graph-based framework is used to classify Web users according to their navigation patterns. GOLEM is a learning algorithm that uses the example space to restrict the solution search space. In this paper, the algorithm is modified for the graph-based framework. GOLEM is well suited to this application, where the solution search space is very large. An experimental illustration is presented.
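GOLEM restricts its search by building clauses from relative least general generalizations (rlgg) of pairs of examples rather than enumerating the hypothesis space. The sketch below shows only plain least general generalization (anti-unification) of two ground terms, without the background knowledge that rlgg adds, and it does not reproduce this paper's graph-based modification. The term encoding and the `visit` predicate describing a navigation path are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Var:
    """A logic variable introduced by generalization."""
    name: str


# A term is a constant (str), a Var, or a compound term written as a
# tuple (functor, arg1, arg2, ...).
Term = Union[str, Var, tuple]


def lgg(a: Term, b: Term, table: Optional[Dict[Tuple[Term, Term], Var]] = None) -> Term:
    """Least general generalization of two terms (Plotkin-style anti-unification)."""
    if table is None:
        table = {}
    if a == b:
        return a
    # Same functor and arity: generalize argument by argument.
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) and a[0] == b[0]:
        return (a[0],) + tuple(lgg(x, y, table) for x, y in zip(a[1:], b[1:]))
    # Otherwise replace the mismatching pair by a variable; the same pair
    # always maps to the same variable, which keeps the result least general.
    if (a, b) not in table:
        table[(a, b)] = Var(f"X{len(table)}")
    return table[(a, b)]


# Two hypothetical navigation paths that differ only in the middle page.
print(lgg(("visit", "home", "products", "cart"), ("visit", "home", "news", "cart")))
# ('visit', 'home', Var(name='X0'), 'cart')
```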

Performance Evaluation System for Decision Tree Algorithms

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v11i8.3006 ◽

2013 ◽

Vol 11 (8) ◽

pp. 2879-2886

Author(s):

Deepali Saini ◽

Prof. Anand Rajavat

Keyword(s):

Data Mining ◽

Comparative Analysis ◽

Evaluation System ◽

Learning Algorithm ◽

Data Mining Algorithm ◽

Mining Algorithm ◽

Supervise Learning Algorithm ◽

Build Time ◽

Study Implementation ◽

Tree Algorithms

In the machine learning process, classification can be described as supervised learning. Classification techniques have properties that enable the representation of structures reflecting knowledge of the domain being classified. Industries, education, business and many other domains require knowledge for growth. Some of the common classification algorithms used in data mining and decision support systems are neural networks, logistic regression and decision trees. The most suitable data mining algorithm cannot be chosen offhand. Selecting an appropriate data mining algorithm for a business domain requires a comparative analysis of different algorithms based on several input parameters, such as accuracy, build time and memory usage. For the analysis and comparative study, popular algorithms must be implemented, selected on the basis of a literature survey and of how frequently each algorithm is currently used. The performance of the algorithms is enhanced by applying boosting to the trees, and then evaluated. We selected datasets of numerical and nominal types and applied them to the algorithms. A comparative analysis is performed on the results obtained by the system. Finally, we apply a new dataset in order to generate prediction outcomes.
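The abstract does not say which boosting variant was applied. Assuming discrete AdaBoost, a minimal standard-library sketch looks like this; it boosts one-level trees (decision stumps) only to keep the example short, whereas the paper boosts decision trees. The stump search, number of rounds and toy data are illustrative assumptions, and labels must be +1 or -1.

```python
import math
from typing import List, Tuple

# A stump is (feature index, threshold, polarity); it predicts +1 when
# polarity * (x[feature] - threshold) > 0, otherwise -1.
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: List[float]) -> int:
    f, t, p = stump
    return 1 if p * (x[f] - t) > 0 else -1


def best_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: below the minimum, and midpoints between values.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for p in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict((f, t, p), xi) != yi)
                if err < best_err:
                    best, best_err = (f, t, p), err
    return best, best_err


def adaboost(X, y, rounds: int = 10):
    """Discrete AdaBoost: returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified examples, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


X = [[1.0], [2.0], [3.0], [4.0]]
y = [-1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])  # [-1, -1, 1, 1]
```

Accuracy, build time and memory usage of such a boosted model can then be compared with the unboosted trees, as the abstract describes.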

Rough set based ensemble learning algorithm for agricultural data classification

Filomat ◽

10.2298/fil1805917s ◽

2018 ◽

Vol 32 (5) ◽

pp. 1917-1930 ◽

Cited By ~ 1

Author(s):

Lei Shi ◽

Qiguo Duan ◽

Juanjuan Zhang ◽

Lei Xi ◽

Hongbo Qiao ◽

...

Keyword(s):

Ensemble Learning ◽

Rough Set ◽

Rough Set Theory ◽

Learning Algorithm ◽

Uncertain Data ◽

Data Classification ◽

Research Area ◽

Experimental Comparison ◽

Classification Problems ◽

Ensemble Learning Algorithm

Agricultural data classification is attracting increasing attention in the research area of intelligent agriculture. As an important class of machine learning methods, ensemble learning uses multiple base classifiers to deal with classification problems. Rough set theory is a powerful mathematical approach for processing vague and uncertain data. In this paper, a rough set based ensemble learning algorithm is proposed to classify agricultural data effectively and efficiently. An experimental comparison of different algorithms is conducted on four agricultural datasets. The experimental results indicate that the proposed algorithm clearly improves performance.
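The core rough set operation behind such an algorithm is approximating a class by indiscernibility blocks: objects with identical attribute values cannot be told apart, so a class is bounded by a lower approximation (blocks certainly inside it) and an upper approximation (blocks possibly inside it). The sketch below shows only these standard definitions, not the proposed ensemble algorithm; the table, attribute names and "high yield" target are a hypothetical toy example.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

# Decision table: object id -> {attribute name: value}.
Table = Dict[Hashable, Dict[str, Hashable]]


def indiscernibility_classes(rows: Table, attrs: Sequence[str]) -> List[Set[Hashable]]:
    """Group objects that have identical values on every attribute in attrs."""
    groups: Dict[tuple, Set[Hashable]] = defaultdict(set)
    for obj, values in rows.items():
        groups[tuple(values[a] for a in attrs)].add(obj)
    return list(groups.values())


def approximations(rows: Table, attrs: Sequence[str], target: Set[Hashable]) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Lower and upper approximation of target with respect to attrs."""
    lower: Set[Hashable] = set()
    upper: Set[Hashable] = set()
    for block in indiscernibility_classes(rows, attrs):
        if block <= target:  # block lies entirely inside the target: certainly in it
            lower |= block
        if block & target:  # block overlaps the target: possibly in it
            upper |= block
    return lower, upper


# Hypothetical toy agricultural table: fields described by soil and rainfall.
fields = {
    "f1": {"soil": "loam", "rain": "high"},
    "f2": {"soil": "loam", "rain": "high"},
    "f3": {"soil": "clay", "rain": "low"},
    "f4": {"soil": "sand", "rain": "low"},
}
high_yield = {"f1", "f3"}
lower, upper = approximations(fields, ["soil", "rain"], high_yield)
print(sorted(lower), sorted(upper))  # ['f3'] ['f1', 'f2', 'f3']
```

Fields f1 and f2 are indiscernible but only f1 is high-yield, so both fall in the boundary region (upper minus lower); this is how rough sets represent the uncertainty mentioned in the abstract.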

A Machine Learning Algorithm TsF K-NN Based on Automated Data Classification for Securing Mobile Cloud Computing Model

2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS) ◽

10.1109/ccoms.2019.8821756 ◽

2019 ◽

Author(s):

Anunaya Inani ◽

Chakradhar Verma ◽

Suvrat Jain

Keyword(s):

Machine Learning ◽

Cloud Computing ◽

Mobile Cloud Computing ◽

Learning Algorithm ◽

Data Classification ◽

Mobile Cloud ◽

Machine Learning Algorithm ◽

Computing Model

Sensor data classification using machine learning algorithm

Journal of Statistics and Management Systems ◽

10.1080/09720510.2020.1736319 ◽

2020 ◽

Vol 23 (2) ◽

pp. 363-371

Author(s):

Lina Rose ◽

X Anitha Mary

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Data Classification ◽

Sensor Data ◽

Machine Learning Algorithm

One Method to Reduce Data Classification Using Weighting Technique in SVM +

Modern Applied Science ◽

10.5539/mas.v10n9p245 ◽

2016 ◽

Vol 10 (9) ◽

pp. 245

Author(s):

Arash Ghorban Niya Delavar ◽

Zahra Jafari

Keyword(s):

Processing Speed ◽

Learning Algorithm ◽

Data Classification ◽

Threshold Detector ◽

Real Time Processing ◽

Data Volume ◽

Optimum Response ◽

Weighting Technique ◽

Analyze Data

SVM is a learning algorithm used to analyze data and recognize patterns. However, an important issue remains: replicated data and real-time processing have not been handled correctly. For this reason, this paper proposes DCSVM+, a method that reduces data classification using a weighting technique in SVM+. With respect to the SVM+ parameters, the proposed method achieves the optimum response time. By observing the data volume and density parameters, we were able to classify the interval size so that, in the case study investigated, this classification reduces the running time of the SVM+ algorithm. By providing an objective function for the proposed method, we were also able to reduce replicated data in SVM+ by integrating parameters and data classification. Finally, we provide a threshold detector (TD) for DCSVM+ which, with respect to the competency function, reduces processing time and increases data processing speed. Overall, the proposed algorithm, which applies a weighting technique to the SVM+ function, is optimized in terms of efficiency.
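The abstract does not spell out how DCSVM+ weights replicated data. As a generic illustration only (not the DCSVM+ method), the sketch below collapses duplicate training samples into unique samples weighted by their count and checks that a weighted hinge loss, the loss used by soft-margin SVMs, is unchanged, so a learner can process fewer rows for the same objective. The function names, weight vector and data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], int]  # (feature vector, label in {-1, +1})


def collapse_duplicates(samples: Sequence[Sample]) -> List[Tuple[Sample, int]]:
    """Replace repeated samples with one copy plus a weight equal to its count."""
    return list(Counter(samples).items())


def hinge(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)


def hinge_loss(w, b, samples: Sequence[Sample]) -> float:
    return sum(hinge(w, b, x, y) for x, y in samples)


def weighted_hinge_loss(w, b, weighted: Sequence[Tuple[Sample, int]]) -> float:
    # Each unique sample contributes its loss once, scaled by its count.
    return sum(count * hinge(w, b, x, y) for (x, y), count in weighted)


samples = [((1.0, 2.0), 1), ((1.0, 2.0), 1), ((0.0, -1.0), -1), ((1.0, 2.0), 1)]
weighted = collapse_duplicates(samples)
print(len(samples), "->", len(weighted))  # 4 -> 2
w, b = (0.5, -0.25), 0.1
print(hinge_loss(w, b, samples), weighted_hinge_loss(w, b, weighted))
```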
