knowledge discovery in databases
Recently Published Documents


TOTAL DOCUMENTS: 379 (FIVE YEARS 77)
H-INDEX: 22 (FIVE YEARS 2)

Author(s):  
Claudimar Pereira Da Veiga ◽  
Tamires Almeida Sfeir ◽  
Maria Teresinha Arns Steiner ◽  
Cassius Tadeu Scarpin ◽  
Kellen Endler

Author(s):  
Suriya Jambunathan ◽  
Suguna Ramadass ◽  
Palanivel kumaran

In the ubiquitously connected world of IT infrastructure, the Intrusion Detection System (IDS) plays a vital role. An IDS is considered a critical component of security infrastructure; it is implemented through hardware or software and can detect malicious activity in a networked environment. To detect or prevent network attacks, a Network Intrusion Detection (NID) system may be equipped with machine learning algorithms to achieve better accuracy and faster detection. Dimensionality reduction algorithms offer an efficient way to analyze different attacks: they improve feature selection from huge datasets and thereby speed up learning. Speed is crucial to the success of network intrusion detection systems, which must react in time to defend against attacks. In this paper, two open-source datasets, the Knowledge Discovery in Databases (KDD CUP) dataset and the 10% KDD CUP dataset, are used for experimentation. The datasets are passed through dimensionality reduction algorithms, namely Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and Kernel PCA with different kernels, and then classified with Logistic Regression. K-fold cross-validation is then applied with the aim of further improving the accuracy obtained. Finally, the accuracy results obtained with and without K-fold are compared. The empirical study on KDD CUP data confirms the effectiveness of the proposed scheme. We found that combining dimensionality reduction algorithms such as PCA, LDA and Kernel PCA with a classification algorithm gives the best results. The study should help researchers working on critical areas such as intrusion detection in network traffic, and the results should be useful for future research on the KDD CUP dataset. On the 10% KDD CUP dataset, PCA attained 98% accuracy without K-fold and 99% with K-fold; LDA likewise attained 98% without K-fold and 99% with K-fold.
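The K-fold procedure mentioned in this abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses invented two-feature toy data rather than KDD CUP, and a simple nearest-centroid classifier as a stand-in for the paper's PCA/LDA plus Logistic Regression pipeline. All function names are hypothetical.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Stand-in classifier (assumption): one mean vector per class."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the class whose centroid is closest in squared distance."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: dist(centroids[label]))

def cross_val_accuracy(X, y, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat for each fold."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return scores

def make_toy_data(n_per_class=50, seed=1):
    """Invented data: two Gaussian clusters labelled 'normal' and 'attack'."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in (("normal", 0.0), ("attack", 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y

X, y = make_toy_data()
scores = cross_val_accuracy(X, y, k=5)
mean_accuracy = sum(scores) / len(scores)
print(f"fold accuracies: {scores}")
print(f"mean accuracy: {mean_accuracy:.3f}")
```

Every sample is used for testing exactly once, so the mean of the fold accuracies is a steadier estimate than the score from a single train/test split.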


2021 ◽  
Author(s):  
Flavio de Assis Vilela ◽  
Ricardo Rodrigues Ciferri

ETL (Extract, Transform, and Load) is an essential process for data extraction in knowledge discovery in databases and in data warehousing environments. The ETL process gathers data available from operational sources, processes it, and stores it in an integrated data repository. The ETL process can also run in a real-time data warehousing environment and store the data in a data warehouse. This paper presents a new method named Data Extraction Magnet (DEM), which performs the extraction phase of the ETL process in a real-time data warehousing environment and is based on non-intrusive, tag and parallelism concepts. DEM was validated in a dairy farming domain using synthetic data. The results showed a large performance gain over the traditional trigger technique and showed that real-time requirements were met.
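The abstract does not describe DEM's internals, so the following is not DEM. It is a generic sketch of parallel, incremental extraction that reads from sources without modifying them. Here "tag" is assumed to mean a per-source high-water mark (the last sequence number already extracted). The source names, row layout and function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical operational sources: each row is (sequence_number, payload).
SOURCES = {
    "milking": [(1, {"cow": "A", "liters": 12.5}), (2, {"cow": "B", "liters": 9.0})],
    "feeding": [(1, {"cow": "A", "kg": 4.2})],
}

def extract_new_rows(source_name, last_seen):
    """Read rows newer than the stored tag; the source itself is never modified."""
    rows = [row for row in SOURCES[source_name] if row[0] > last_seen]
    new_tag = max((seq for seq, _ in rows), default=last_seen)
    return source_name, rows, new_tag

def run_extraction(tags):
    """Extract from all sources in parallel; return new rows and updated tags."""
    extracted, updated = [], dict(tags)
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [
            pool.submit(extract_new_rows, name, tags.get(name, 0))
            for name in SOURCES
        ]
        for future in futures:
            name, rows, new_tag = future.result()
            extracted.extend((name, seq, payload) for seq, payload in rows)
            updated[name] = new_tag
    return extracted, updated

first_batch, tags = run_extraction({})          # picks up every existing row
second_batch, tags = run_extraction(tags)       # nothing new since the last run
SOURCES["feeding"].append((2, {"cow": "B", "kg": 3.8}))
third_batch, tags = run_extraction(tags)        # only the newly appended row
print(first_batch, second_batch, third_batch, tags)
```

A trigger-based design would instead run code inside the source database on every write; polling against a stored tag keeps the extraction logic outside the operational system.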


CEDAMAZ ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 124-132
Author(s):  
Yulissa Torres-Quezada

Road traffic accidents are currently a public health problem at the national and regional level: they cause loss of life and are increasing daily worldwide. It is therefore essential to conduct a study that identifies the factors behind their occurrence. This research applies data mining to determine the most influential factors in the occurrence of traffic accidents in Ecuador in 2020. The work followed five phases of the Knowledge Discovery in Databases (KDD) methodology: information search, data acquisition, database cleaning, application of data mining techniques, and interpretation and presentation of results. These phases were used to discover hidden patterns in the dataset, which was collected by the Agencia Nacional de Tránsito (ANT) and comprises 418 variables and 16,972 records of traffic accident events in Ecuador. Seven data mining techniques were applied: CHAID, Exhaustive CHAID, CRT, Multilayer Perceptron, Radial Basis Function, Naive Bayes and BayesNet. The Exhaustive CHAID algorithm obtained the best results; it was used to identify the most important patterns in the data and to evaluate possible associations among the recorded variables. Finally, the human factor was determined to be the most influential, with a probability of occurrence of 69.64%.
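CHAID (Chi-squared Automatic Interaction Detection) selects split variables using chi-square tests of independence between each predictor and the outcome. The sketch below shows only that core statistic on invented toy records, not the ANT dataset. It is a simplification: actual CHAID also merges categories and compares adjusted p-values rather than raw statistics, and the raw comparison here is meaningful only because both tables have the same dimensions.

```python
from collections import Counter

def chi_square(pairs):
    """Pearson chi-square statistic for (predictor_value, outcome) pairs."""
    n = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(value for value, _ in pairs)
    col_totals = Counter(outcome for _, outcome in pairs)
    statistic = 0.0
    for value, row_total in row_totals.items():
        for outcome, col_total in col_totals.items():
            expected = row_total * col_total / n
            statistic += (observed[(value, outcome)] - expected) ** 2 / expected
    return statistic

# Invented toy records: (cause, weather, severity).
records = (
    [("human", "dry", "severe")] * 30 + [("human", "wet", "severe")] * 30
    + [("human", "dry", "minor")] * 10 + [("human", "wet", "minor")] * 10
    + [("mechanical", "dry", "severe")] * 5 + [("mechanical", "wet", "severe")] * 5
    + [("mechanical", "dry", "minor")] * 15 + [("mechanical", "wet", "minor")] * 15
)

# Score each candidate predictor against the outcome (severity).
scores = {
    "cause": chi_square([(cause, severity) for cause, _, severity in records]),
    "weather": chi_square([(weather, severity) for _, weather, severity in records]),
}
best_predictor = max(scores, key=scores.get)
print(scores, best_predictor)
```

In this toy data, weather is constructed to be independent of severity, so its statistic is zero and the split falls on cause.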


Author(s):  
Fredy Humberto Troncoso-Espinosa ◽  
Karen Castro-Albornoz

A molded door skin is a high-density wood board used as the main component in door manufacturing. To be marketed, it must meet demanding quality standards, the most important being the one that measures the force required to detach the skin from the structure of a door. Quality tests are performed every two hours and their results become available after approximately five hours. If the results show that the skins fall outside the required quality standard, economic losses are incurred because of this waiting time. This research proposes using data mining with machine learning techniques to predict this quality measure continuously and reduce the economic losses associated with waiting for the results. For the data mining application, a database was built from the historical record of the production process variables and the quality tests. The methodology used is Knowledge Discovery in Databases (KDD). Applying this methodology made it possible to identify the main variables that affect the quality of the door skins and to train four machine learning algorithms to predict their quality. The results show that the algorithm that best predicts quality is Neural Net, and they make it possible to demonstrate that implementing the Neural Net algorithm will reduce the economic losses associated with waiting for the quality test results.


2021 ◽  
Vol 14 (Supl. 2) ◽  
pp. 1-20
Author(s):  
Luani Rosa de Oliveira Piva ◽  
Carlos Roberto Sanquetta ◽  
Jaime Wojciechowski ◽  
Ana Paula Dalla Corte

The Brazilian Amazon forest holds one of the largest stocks of living biomass and carbon on the planet. Given its importance as a carbon sink, biomass quantification studies in this tropical forest are essential for a better understanding of climate change issues. However, data from biomass and carbon inventories in Brazil are currently scarce or scattered across several databases, often with restricted access. Open data initiatives for forest inventories in the Brazilian Amazon are therefore needed. This study takes a novel approach to forest inventory data from the RADAMBRASIL Project, made available on the BDiA platform, using the bootstrap resampling method to estimate biomass and carbon in the Amazon biome. The KDD (Knowledge Discovery in Databases) process was used to extract information from the raw database. Three equations for Dense Ombrophilous Forest and two for Open Ombrophilous Forest were used, for large trees (DBH ≥ 30 cm). The results indicated that biomass estimates from the equation of Chambers et al. (2001) had the lowest standard error and bias for Dense Ombrophilous Forest data. For Open Ombrophilous Forest data, the best results came from estimates based on the equation of Nogueira et al. (2008) [2]. It can be concluded that these two equations are the most suitable for estimating biomass and carbon for the data analyzed. In addition, the analysis, limited to large trees, indicated that these trees are representative of the total composition of biomass and carbon stocks in the Amazon biome (~163 Mg ha⁻¹ for Dense Forest, compared with literature values of 250-350 Mg ha⁻¹).
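The bootstrap resampling idea used in this study can be illustrated briefly. The sketch below assumes invented plot-level biomass values, not RADAMBRASIL data, and uses the sample mean as the estimator. It resamples the data with replacement many times, then reports the standard error and bias of the estimator from the spread of the replicates. The function name and parameters are hypothetical.

```python
import random
import statistics

def bootstrap(sample, estimator, n_resamples=2000, seed=0):
    """Estimate standard error and bias of `estimator` by resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = [
        estimator([rng.choice(sample) for _ in range(n)])
        for _ in range(n_resamples)
    ]
    point = estimator(sample)
    std_error = statistics.stdev(replicates)          # spread of the replicates
    bias = statistics.fmean(replicates) - point       # mean replicate minus point estimate
    return point, std_error, bias

# Invented plot-level biomass values in Mg/ha (illustration only).
plot_biomass = [142.0, 171.5, 158.3, 190.2, 149.8, 165.4, 177.9, 153.1, 160.6, 168.7]

estimate, std_error, bias = bootstrap(plot_biomass, statistics.fmean)
print(f"estimate={estimate:.2f} Mg/ha, std_error={std_error:.2f}, bias={bias:.3f}")
```

Passing a different estimator, for example a function that applies an allometric equation before averaging, would yield the standard error and bias for that equation, which is the kind of comparison the abstract reports.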


2021 ◽  
Author(s):  
Ana C. Lorena ◽  
Filipe A. N. Verri ◽  
Tiago A. Almeida

This editorial describes the Brazilian Knowledge Discovery in Databases Competition (KDD-BR 2021) and summarizes the contributions of the three best solutions from its fifth edition. The 2021 competition involved solving instances of the Traveling Salesman Problem, of different sizes, using an edge prediction approach.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sumeer Gul ◽  
Shohar Bano ◽  
Taseen Shah

Purpose: Data mining, with its varied technologies such as numerical mining, textual mining, multimedia mining, web mining, sentiment analysis and big data mining, is an emerging field that manifests itself in different techniques such as information mining; big data mining; big data mining and Internet of Things (IoT); and educational data mining. This paper aims to discuss how these technologies and techniques are used to derive information and, eventually, knowledge from data.
Design/methodology/approach: An extensive review of the literature on data mining and its allied techniques was carried out to ascertain the emerging procedures and techniques in the domain. Clarivate Analytics' Web of Science and Sciverse Scopus were explored to discover the extent of the literature published on data mining and its varied facets. Literature was searched against keywords such as data mining; information mining; big data; big data and IoT; and educational data mining. Works citing the literature on data mining were also explored to visualize the broad gamut of emerging techniques in this growing field.
Findings: The study validates that knowledge discovery in databases has established data mining as an emerging field; the data present in these databases pave the way for data mining techniques and analytics. The paper provides a unique view of the usage of data and the logical patterns derived from it, and of how new procedures, algorithms and mining techniques are continuously upgraded for multipurpose use in improving human life and experiences.
Practical implications: The paper highlights different aspects of data mining, its different technological approaches, and how these emerging data technologies are used to derive logical insights from data and make data more meaningful.
Originality/value: The paper highlights the current trends and facets of data mining.


Alpesh Vaghela et al., International Journal of Advanced Trends in Computer Science and Engineering, 10(5), September-October 2021, pp. 2930-2935

Academics and industry researchers alike find privacy preservation for big data an intriguing field of study. Data collection, storage, and processing are the three stages of the big data life cycle, and different privacy and security solutions are used at each stage. Many health-care stakeholders are working together to develop a new pattern for safeguarding people from an unknown disease while also promoting economic prosperity. Big data processing and analytics methods will be employed to discover new economic growth patterns. Because the current method of data anonymization leads to data breaches, researchers need a new approach to big data mining, or knowledge discovery in databases (KDD), in which multiple parties share their data to identify new patterns. This study introduces a novel approach to privacy-preserving data mining (PPDM) based on Blockchain and the InterPlanetary File System (IPFS). The authors propose leveraging Blockchain and IPFS to create the ChainPPDM approach for preserving big data privacy. Data saved on the blockchain is immutable, transparent, and secure, and the blockchain allows for decentralized storage. IPFS is a distributed file system that stores data in a decentralized manner.
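The abstract gives no implementation details for ChainPPDM, so the following is not the authors' design. It is a minimal sketch of the two general ideas named above, under stated assumptions: data is kept off-chain and addressed by its hash (a plain SHA-256 hex digest stands in for an IPFS content identifier, which in reality uses a different encoding), while a hash-linked chain records who contributed which content. The class and function names are hypothetical.

```python
import hashlib
import json

def content_id(data: bytes) -> str:
    """Content address: SHA-256 hex digest (a stand-in for an IPFS CID)."""
    return hashlib.sha256(data).hexdigest()

def block_hash(block: dict) -> str:
    """Hash a block's canonical JSON form."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

class Ledger:
    """Toy hash-linked chain with a separate content-addressed store."""

    def __init__(self):
        self.store = {}   # off-chain store: content id -> bytes
        self.chain = []   # on-chain records holding only content ids

    def add(self, owner, data):
        cid = content_id(data)
        self.store[cid] = data
        previous = block_hash(self.chain[-1]) if self.chain else "0" * 64
        self.chain.append(
            {"index": len(self.chain), "owner": owner, "cid": cid, "prev": previous}
        )
        return cid

    def is_valid(self):
        """Check every link to the previous block and every stored payload."""
        for i, block in enumerate(self.chain):
            expected_prev = block_hash(self.chain[i - 1]) if i else "0" * 64
            if block["prev"] != expected_prev:
                return False
            if content_id(self.store[block["cid"]]) != block["cid"]:
                return False
        return True

ledger = Ledger()
ledger.add("hospital_a", b"anonymized records, batch 1")
ledger.add("hospital_b", b"anonymized records, batch 2")
valid_before = ledger.is_valid()
ledger.chain[0]["owner"] = "intruder"   # tamper with an earlier block
valid_after = ledger.is_valid()
print(valid_before, valid_after)
```

Editing an earlier block changes its hash, which no longer matches the `prev` field of the next block. This is the sense in which chained records are tamper-evident.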


2021 ◽  
Vol 39 (11) ◽  
pp. 1331-1340
Author(s):  
Janaína Lopes Dias ◽  
Michele Kremer Sott ◽  
Caroline Cipolatto Ferrão ◽  
João Carlos Furtado ◽  
Jorge André Ribas Moraes

The processes related to solid waste management (SWM) are being revised as new technologies emerge and are applied in the area to achieve greater environmental, social and economic sustainability for society. In this study, two robust review protocols (Population, Intervention, Comparison, Outcome, and Context (PICOC) and Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA)) were used to systematically analyze 62 documents extracted from the Web of Science database. The aim was to identify the main techniques and tools for Knowledge Discovery in Databases (KDD) and Data Mining (DM) as applied to SWM and to explore the technological potential to optimize the stages of collecting and transporting waste. It was also possible to analyze the main challenges and opportunities of KDD and DM for SWM. The results show that the most used tools for SWM are MATLAB (29.7%) and GIS (13.5%), whereas the most used techniques are Artificial Neural Networks (35.8%), Linear Regression (16.0%) and Support Vector Machine (12.3%). In addition, 15.3% of the studies were conducted with data from China, 11.1% with data from India, and 9.7% analyzed and compared data from several other countries. The research also showed that the main challenges in the field relate to the collection and treatment of data, whereas the opportunities appear to be linked mainly to the impact on the pillars of sustainable development. This study thus portrays important issues associated with the use of KDD and DM for optimal SWM and has the potential to assist and direct researchers and field professionals in future studies.

