Efficient web data classification techniques using semi-supervise learning algorithm

Author(s):  
Anand Singh Rajawat ◽  
Upendra Dwivedi ◽  
Dinesh Ch. Jain ◽  
Akhilesh R. Upadhyay
2021 ◽  
Vol 14 (11) ◽  
pp. 2445-2458
Author(s):  
Valerio Cetorelli ◽  
Paolo Atzeni ◽  
Valter Crescenzi ◽  
Franco Milicchio

We introduce landmark grammars, a new family of context-free grammars aimed at describing the HTML source code of pages published by large, templated websites, and therefore at effectively tackling Web data extraction problems. Indeed, they address the inherent ambiguity of HTML, one of the main challenges of Web data extraction, which, despite over twenty years of research, has been largely neglected by the approaches presented in the literature. We then formalize the Smallest Extraction Problem (SEP), an optimization problem for finding the grammar of a family that best describes a set of pages while simultaneously extracting their data. Finally, we present an unsupervised learning algorithm to induce a landmark grammar from a set of pages sharing a common HTML template, and we present an automatic Web data extraction system. Experiments on established benchmarks show that the approach can contribute substantially to improving the state of the art.
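Templated pages share fixed markup while their data varies. That intuition can be shown with a much simpler technique than landmark grammars: align the token sequences of two pages with a longest common subsequence (LCS), treat the shared tokens as template landmarks, and read the differing runs between them as data fields. This is a minimal sketch for intuition only. It is not the SEP formulation or the authors' induction algorithm, and the names `tokenize`, `lcs_pairs` and `extract_fields` are hypothetical.

```python
import re


def tokenize(html):
    """Split HTML into tag tokens and text tokens, dropping blank text."""
    return [t.strip() for t in re.findall(r"<[^>]+>|[^<]+", html) if t.strip()]


def lcs_pairs(a, b):
    """Index pairs (i, j) of one longest common subsequence of token lists a and b."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]  # dp[i][j] = LCS length of a[i:], b[j:]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def extract_fields(page_a, page_b):
    """Shared tokens act as template landmarks; the differing runs between
    consecutive landmarks are returned as (value_in_a, value_in_b) fields."""
    a, b = tokenize(page_a), tokenize(page_b)
    fields, prev_i, prev_j = [], -1, -1
    for i, j in lcs_pairs(a, b) + [(len(a), len(b))]:  # sentinel closes the last gap
        gap_a, gap_b = a[prev_i + 1:i], b[prev_j + 1:j]
        if gap_a or gap_b:
            fields.append((" ".join(gap_a), " ".join(gap_b)))
        prev_i, prev_j = i, j
    return fields


page1 = "<html><h1>Dune</h1><p>Price: <b>9.99</b></p></html>"
page2 = "<html><h1>Emma</h1><p>Price: <b>7.50</b></p></html>"
print(extract_fields(page1, page2))  # [('Dune', 'Emma'), ('9.99', '7.50')]
```

Plain LCS alignment breaks down when a data value coincides with template text or when optional fields shift the markup. In those cases it can choose the wrong landmarks, which is the kind of HTML ambiguity the abstract says landmark grammars are designed to address.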


Author(s):  
Abdulkadir Özdemir ◽  
Uğur Yavuz ◽  
Fares Abdulhafidh Dael

Nowadays, data mining has become one of the technologies that play a major role in business intelligence. However, to use data mining results, the user must go through several processes, such as data classification. Data classification is the processing of data and its organization into specific categories so that it can be used most effectively and efficiently. In data mining, no single technique is applicable to all datasets. This paper shows the different results obtained by applying different techniques to the same data, and evaluates the performance of different classification techniques on different datasets. In this study, four data classification techniques were chosen: BayesNet, NaiveBayes, Multilayer Perceptron and J48. Their performance was measured on two parameters: the time taken to build the model on the dataset, and the percentage of instances correctly classified (accuracy). The experiments were carried out using the Weka 3.8 software. The results demonstrate that the Multilayer Perceptron classifier achieved the best overall accuracy in classifying the instances, while NaiveBayes gave the worst accuracy on each dataset.
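This study uses two measures: model build time and the percentage of correctly classified instances. Both can be illustrated with a small evaluation harness. This is a minimal sketch, not Weka 3.8, and it implements none of the four classifiers compared in the paper. Instead, it times `fit` and scores held-out accuracy for two stand-ins, a categorical naive Bayes with add-one smoothing and a majority-class baseline, on hypothetical toy data. All class and function names are illustrative.

```python
import math
import time
from collections import Counter, defaultdict


class MajorityClass:
    """Baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.label


class CategoricalNaiveBayes:
    """Naive Bayes for nominal attributes with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        self.counts = defaultdict(Counter)  # (feature, class) -> value frequencies
        self.values = defaultdict(set)      # feature -> distinct values seen
        for row, label in zip(X, y):
            for f, v in enumerate(row):
                self.counts[(f, label)][v] += 1
                self.values[f].add(v)
        return self

    def predict(self, x):
        best, best_score = None, float("-inf")
        for c, nc in self.class_counts.items():
            score = math.log(nc / self.n)  # log prior
            for f, v in enumerate(x):
                k = len(self.values[f])
                score += math.log((self.counts[(f, c)][v] + 1) / (nc + k))
            if score > best_score:
                best, best_score = c, score
        return best


def evaluate(model, X_train, y_train, X_test, y_test):
    """Return (build time in seconds, accuracy in percent) for one classifier."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    build_time = time.perf_counter() - start
    correct = sum(model.predict(x) == t for x, t in zip(X_test, y_test))
    return build_time, 100.0 * correct / len(y_test)


# Hypothetical toy data (outlook, windy) -> play; not from the paper.
X_train = [("sunny", "no"), ("sunny", "yes"), ("overcast", "no"),
           ("overcast", "yes"), ("rain", "no"), ("rain", "yes")]
y_train = ["yes", "yes", "yes", "yes", "no", "no"]
X_test = [("sunny", "no"), ("rain", "yes"), ("overcast", "yes"), ("rain", "no")]
y_test = ["yes", "no", "yes", "no"]

for name, model in [("NaiveBayes", CategoricalNaiveBayes()), ("Majority", MajorityClass())]:
    secs, acc = evaluate(model, X_train, y_train, X_test, y_test)
    print(f"{name}: build {secs * 1000:.3f} ms, accuracy {acc:.1f}%")
```

Timing `fit` separately from prediction mirrors the paper's split between build time and accuracy. On real datasets, both numbers are more meaningful when averaged over repeated runs or cross-validation folds, because a single timing of a tiny model is dominated by noise.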


Author(s):  
Pijush Kanti Dutta Pramanik ◽  
Saurabh Pal ◽  
Moutan Mukhopadhyay ◽  
Simar Preet Singh

Author(s):  
V. Uma Maheswari ◽  
A. Siromoney ◽  
K. M. Mehata

Web mining refers to the process of discovering potentially useful and previously unknown information or knowledge from Web data. A graph-based framework is used to classify Web users based on their navigation patterns. GOLEM is a learning algorithm that uses the example space to restrict the solution search space. In this paper, this algorithm is modified for the graph-based framework. GOLEM is well suited to this application, where the solution search space is very large. An experimental illustration is presented.
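GOLEM works bottom-up. It generalizes pairs of positive examples using relative least general generalization (rlgg), so the examples themselves bound the space of candidate clauses. As a simplified sketch, the code below computes only Plotkin's plain least general generalization of two atoms. It uses no background knowledge and none of GOLEM's clause reduction, and the term encoding and the navigation facts are hypothetical.

```python
def lgg(s, t, table=None):
    """Plotkin's least general generalization (anti-unification) of two terms.

    Constants are strings; compound terms are (functor, (arg, ...)) tuples.
    Variables are returned as strings "X0", "X1", ... (assumes no constant uses
    those names).
    """
    if table is None:
        table = {}
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s[1]) == len(t[1])):
        return (s[0], tuple(lgg(a, b, table) for a, b in zip(s[1], t[1])))
    # Mismatched pair: generalize to a variable, reusing it if the same pair recurs.
    if (s, t) not in table:
        table[(s, t)] = f"X{len(table)}"
    return table[(s, t)]


# Hypothetical navigation facts: visit(User, FirstPage, SecondPage).
e1 = ("visit", ("u1", "home", "products"))
e2 = ("visit", ("u2", "home", "cart"))
print(lgg(e1, e2))  # ('visit', ('X0', 'home', 'X1'))
```

Reusing one variable for a recurring pair of subterms is what keeps the result least general. For example, `p(a, a)` and `p(b, b)` generalize to `p(X0, X0)` rather than `p(X0, X1)`.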


2013 ◽  
Vol 11 (8) ◽  
pp. 2879-2886
Author(s):  
Deepali Saini ◽  
Prof. Anand Rajavat

In machine learning, classification is carried out by supervised learning algorithms. Classification techniques have properties that enable the representation of structures reflecting knowledge of the domain being classified. Industry, education, business and many other domains require knowledge for growth. Common classification algorithms used in data mining and decision support systems include neural networks, logistic regression and decision trees. The most suitable data mining algorithm cannot be chosen offhand. Selecting an appropriate algorithm for the business domain requires a comparative analysis of different algorithms based on several input parameters, such as accuracy, build time and memory usage. For this comparative study, popular algorithms were implemented, chosen on the basis of a literature survey and how frequently they are currently used. The performance of the algorithms is enhanced by applying boosting to the trees and then evaluated. We selected datasets with numerical and nominal attributes and applied the algorithms to them. A comparative analysis is performed on the results obtained by the system. Finally, we apply a new dataset to generate prediction outcomes.
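The abstract does not say which boosting method or tree learner was used. Assuming discrete AdaBoost over depth-one decision trees (stumps), the sketch below shows how boosting reweights misclassified samples so that later trees focus on them. The names `fit_stump`, `adaboost` and `predict` are hypothetical, and labels are assumed to be +1/-1.

```python
import math


def fit_stump(X, y, w):
    """Best depth-one tree (feature, threshold, polarity) under sample weights w."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] >= thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def stump_predict(stump, x):
    _, f, thr, pol = stump
    return pol if x[f] >= thr else -pol


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost with stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        if stump[0] >= 0.5:          # no better than chance: stop
            break
        err = max(stump[0], 1e-10)   # guard against dividing by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] == 0:            # training set fitted exactly
            break
        # Up-weight misclassified samples, down-weight the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Positives lie outside [3, 4], so no single stump can separate them.
X = [(1,), (2,), (3,), (4,), (5,), (6,)]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])  # [1, 1, -1, -1, 1, 1]
```

No single stump fits this data, but three boosted stumps do. After each round, the weights shift toward the points the previous stump got wrong.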


Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1917-1930
Author(s):  
Lei Shi ◽  
Qiguo Duan ◽  
Juanjuan Zhang ◽  
Lei Xi ◽  
Hongbo Qiao ◽  
...  

Agricultural data classification is attracting increasing attention in intelligent agriculture research. As an important kind of machine learning method, ensemble learning uses multiple base classifiers to deal with classification problems. Rough set theory is a powerful mathematical approach for processing vague and uncertain data. In this paper, a rough set based ensemble learning algorithm is proposed to classify agricultural data effectively and efficiently. An experimental comparison of different algorithms is conducted on four agricultural datasets. The experimental results indicate that the proposed algorithm clearly improves performance.
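Rough set theory handles uncertainty by approximating a concept from below and from above using indiscernibility classes, that is, groups of objects whose attribute values are identical and therefore cannot be told apart. The sketch computes the standard lower and upper approximations on a hypothetical table of fields. It does not reproduce the paper's ensemble algorithm, whose details the abstract does not give.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Partition objects into blocks that share identical values on `attributes`."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attributes)].add(obj)
    return list(blocks.values())


def approximations(table, attributes, target):
    """Lower and upper rough-set approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:   # every member is in the target: certainly in
            lower |= block
        if block & target:    # at least one member is in the target: possibly in
            upper |= block
    return lower, upper


# Hypothetical field records; not one of the paper's datasets.
fields = {
    "f1": {"soil": "loam", "rain": "high"},
    "f2": {"soil": "loam", "rain": "high"},
    "f3": {"soil": "clay", "rain": "low"},
    "f4": {"soil": "clay", "rain": "high"},
    "f5": {"soil": "sand", "rain": "low"},
}
high_yield = {"f1", "f3", "f4"}
lower, upper = approximations(fields, ["soil", "rain"], high_yield)
print(sorted(lower), sorted(upper))  # ['f3', 'f4'] ['f1', 'f2', 'f3', 'f4']
```

Fields f1 and f2 have identical attributes but different outcomes. They therefore fall in the boundary region (the upper approximation minus the lower), which is exactly the uncertain part rough sets make explicit.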


2016 ◽  
Vol 10 (9) ◽  
pp. 245
Author(s):  
Arash Ghorban Niya Delavar ◽  
Zahra Jafari

SVM is a learning algorithm used to analyze data and recognize patterns. However, an important issue remains: replicated data and its real-time processing have not been handled correctly. For this reason, this paper proposes a method, DCSVM+, that reduces the data to be classified by using a weighting technique in SVM+. With respect to the SVM+ parameters, the proposed method achieves the optimal response time. By observing the data volume and density parameters, we were able to classify the interval size so that, in the case study investigated, this classification reduces the running time of the SVM+ algorithm. In addition, by formulating an objective function for the proposed method, we were able to reduce replicated data in SVM+ by integrating parameters and data classification. Finally, we provide a threshold detector (TD) for DCSVM+ that, with respect to the competency function, reduces processing time and increases data processing speed. Overall, the proposed algorithm, which applies a weighting technique to the SVM+ function, is optimized for efficiency.
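The abstract links replicated data to weighting. Under an assumption about what is meant, this can be illustrated with a standard property of soft-margin SVMs. Identical training rows contribute identical hinge-loss terms, so collapsing them into one row weighted by its multiplicity leaves the objective unchanged while shrinking the data. This sketch does not implement SVM+, DCSVM+ or the threshold detector, and the functions, data and candidate `w`, `b` are hypothetical.

```python
from collections import Counter


def deduplicate(X, y):
    """Collapse identical (x, y) rows into unique rows plus multiplicity weights."""
    counts = Counter((tuple(xi), yi) for xi, yi in zip(X, y))
    keys = list(counts)  # insertion order = first occurrence
    return [list(k[0]) for k in keys], [k[1] for k in keys], [float(counts[k]) for k in keys]


def weighted_hinge_objective(w, b, X, y, weights, C=1.0):
    """Soft-margin linear SVM objective with per-sample weights:
    0.5 * ||w||^2 + C * sum_i weight_i * max(0, 1 - y_i * (w . x_i + b))."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = sum(
        s * max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi, s in zip(X, y, weights)
    )
    return reg + C * loss


# Hypothetical data with replicated rows.
X = [[1, 2], [1, 2], [1, 2], [0, 1], [2, 0], [2, 0]]
y = [1, 1, 1, -1, -1, -1]
Xu, yu, su = deduplicate(X, y)
w, b = [0.5, -0.25], 0.1  # an arbitrary candidate solution
full = weighted_hinge_objective(w, b, X, y, [1.0] * len(X))
reduced = weighted_hinge_objective(w, b, Xu, yu, su)
print(f"{len(X)} -> {len(Xu)} rows; objective {full:.4f} vs {reduced:.4f}")
```

Because the two objectives agree for every `w` and `b`, a solver that accepts per-sample weights reaches the same solution on the smaller weighted set.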

