PaLM: Pipelined Architecture to Label Legacy Multispectral Data using Unsupervised Learning Algorithm

Author(s):  
Anitha Modi ◽  
Priyanka Sharma ◽  
Kavita Tewani
2021 ◽  
Vol 14 (11) ◽  
pp. 2445-2458

Author(s):  
Valerio Cetorelli ◽  
Paolo Atzeni ◽  
Valter Crescenzi ◽  
Franco Milicchio

We introduce landmark grammars, a new family of context-free grammars aimed at describing the HTML source code of pages published by large, templated websites, and therefore at effectively tackling Web data extraction problems. Indeed, they address the inherent ambiguity of HTML, one of the main challenges of Web data extraction which, despite over twenty years of research, has been largely neglected by the approaches presented in the literature. We then formalize the Smallest Extraction Problem (SEP), an optimization problem for finding the grammar of a family that best describes a set of pages and contextually extracts their data. Finally, we present an unsupervised learning algorithm to induce a landmark grammar from a set of pages sharing a common HTML template, and we present an automatic Web data extraction system. Experiments on consolidated benchmarks show that the approach can substantially improve on the state of the art.
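As a loose illustration of the template-induction idea (this is not the authors' landmark-grammar algorithm), substrings shared by two pages generated from the same template can act as "landmarks", with the varying spans between them extracted as data. A minimal character-level sketch using Python's difflib, on two hypothetical pages:

```python
from difflib import SequenceMatcher

def extract_fields(page_a: str, page_b: str):
    """Toy template induction: substrings shared by both pages play the
    role of 'landmarks'; the varying spans between them are the data."""
    sm = SequenceMatcher(None, page_a, page_b, autojunk=False)
    fields_a, fields_b = [], []
    prev_a = prev_b = 0
    for a, b, size in sm.get_matching_blocks():
        if page_a[prev_a:a]:          # unmatched span in page A -> data
            fields_a.append(page_a[prev_a:a])
        if page_b[prev_b:b]:          # unmatched span in page B -> data
            fields_b.append(page_b[prev_b:b])
        prev_a, prev_b = a + size, b + size
    return fields_a, fields_b

# Two hypothetical pages sharing one template (illustrative data):
p1 = "<html><b>Title:</b> Alpha <i>Price:</i> 10</html>"
p2 = "<html><b>Title:</b> Beta <i>Price:</i> 25</html>"
fields_a, fields_b = extract_fields(p1, p2)
```

Character-level alignment is crude (shared letters inside two different values can be mistaken for template text, splitting a field); the grammar-based formulation in the abstract is precisely about handling such ambiguity in a principled way.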


2001 ◽  
Vol 27 (3) ◽  
pp. 351-372 ◽  
Author(s):  
Anand Venkataraman

A statistical model for segmentation and word discovery in continuous speech is presented, along with an incremental unsupervised learning algorithm that infers word boundaries based on this model. Results of empirical tests show that the algorithm is competitive with other models that have been used for similar tasks.
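The incremental idea can be sketched as follows: keep a running lexicon of word counts, segment each new utterance by dynamic programming over smoothed word probabilities, and fold the resulting words back into the lexicon before the next utterance. This is a simplified unigram sketch, not the paper's full model; the smoothing constant `alpha` and the word-length cap are arbitrary illustrative choices:

```python
import math
from collections import Counter

def segment(utterance, counts, total, alpha=0.01, max_len=10):
    """Best segmentation by dynamic programming: maximize the sum of
    log word probabilities under add-alpha smoothing."""
    n = len(utterance)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    denom = total + alpha * max(len(counts), 1) * max_len
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            w = utterance[i:j]
            score = best[i] + math.log((counts[w] + alpha) / denom)
            if score > best[j]:
                best[j], back[j] = score, i
    words, j = [], n            # recover the segmentation via backpointers
    while j > 0:
        words.append(utterance[back[j]:j])
        j = back[j]
    return words[::-1]

def learn(utterances):
    """Incrementally segment utterances, adding each discovered word
    to the lexicon before the next utterance is processed."""
    counts, total, segmented = Counter(), 0, []
    for u in utterances:
        words = segment(u, counts, total)
        for w in words:
            counts[w] += 1
            total += 1
        segmented.append(words)
    return segmented
```

After seeing "the" and "dog" as isolated utterances, the sketch segments "thedog" into the two known words, since two high-count words outscore a single unseen string.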


2018 ◽  
Vol 299 ◽  
pp. 45-54 ◽  
Author(s):  
Hadeel K. Aljobouri ◽  
Hussain A. Jaber ◽  
Orhan M. Koçak ◽  
Oktay Algin ◽  
Ilyas Çankaya

2020 ◽  
Vol 19 (01) ◽  
pp. 283-316 ◽  
Author(s):  
Luis Morales ◽  
José Aguilar ◽  
Danilo Chávez ◽  
Claudia Isaza

This paper proposes a new approach to improve the performance of the Learning Algorithm for Multivariable Data Analysis (LAMDA). The algorithm can be used for both supervised and unsupervised learning, and is based on computing the Global Adequacy Degree (GAD) of an individual to a class from the contributions of all of its descriptors. LAMDA can create new classes after the training stage: if an individual is not sufficiently similar to the preexisting classes, it is evaluated against a threshold called the Non-Informative Class (NIC), which is the novelty of the algorithm. However, LAMDA can produce poor classifications, either because the NIC is constant for all classes or because the GAD calculation is unreliable. In this work, its efficiency is improved by two strategies: first, by computing an adaptable NIC for each class, which prevents correctly classified individuals from creating new classes; and second, by computing the Higher Adequacy Degree (HAD), which makes the algorithm more robust. LAMDA-HAD is validated on different benchmarks and compared with LAMDA and other classifiers through a statistical analysis to determine the cases in which our algorithm performs better.
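One common formulation of the GAD computation described above uses a fuzzy-binomial marginal adequacy per descriptor, aggregated by a linear mix of min and max; the NIC then corresponds to a class whose descriptor means are all 0.5. A minimal sketch of baseline LAMDA classification (the `alpha` exigency level and the rejection rule are illustrative assumptions; the paper's adaptive NIC and HAD refinements are not included):

```python
import numpy as np

def mad(x, rho):
    """Marginal adequacy of each descriptor (fuzzy-binomial form):
    rho is the class's learned mean for that descriptor, x is in [0, 1]."""
    return rho ** x * (1.0 - rho) ** (1.0 - x)

def gad(x, rho, alpha=0.7):
    """Global adequacy: linear mix of a t-norm (min) and a t-conorm (max)
    over the marginal adequacies; alpha is an illustrative exigency level."""
    m = mad(np.asarray(x, dtype=float), np.asarray(rho, dtype=float))
    return alpha * m.min() + (1.0 - alpha) * m.max()

def classify(x, class_rhos, alpha=0.7):
    """Assign x to the class with the highest GAD; the Non-Informative
    Class (all descriptor means at 0.5) acts as the rejection threshold.
    Returning None signals that a new class should be created."""
    nic = gad(x, [0.5] * len(x), alpha)
    scores = [gad(x, rho, alpha) for rho in class_rhos]
    best = int(np.argmax(scores))
    return best if scores[best] > nic else None
```

Because the NIC's marginal adequacy is 0.5 regardless of the input, it yields the same constant threshold for every class, which is exactly the limitation the paper's per-class adaptable NIC is designed to remove.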

