Lavoisier: A DSL for increasing the level of abstraction of data selection and formatting in data mining

2020 ◽  
Vol 60 ◽  
pp. 100987 ◽  
Author(s):  
Alfonso de la Vega ◽  
Diego García-Saiz ◽  
Marta Zorrilla ◽  
Pablo Sánchez
2020 ◽  
Vol 5 (2) ◽  
pp. 130-137
Author(s):  
Teguh Iman Hermanto ◽  
Yusuf Muhyidin

According to data recorded in 2018, 43 regional government agencies (organisasi perangkat daerah) in Purwakarta Regency have been allocated internet bandwidth. Each of these agencies has a different level of bandwidth need, but the current allocations and levels of need have not yet been grouped. This study aims to determine the levels of bandwidth need in Purwakarta by applying data mining to the existing data with the DBSCAN algorithm, which forms clusters divided by level of need. The analysis follows the SEMMA method (Sample, Explore, Modify, Model, Assess), whose stages here comprise Data Selection, Pre-processing / Cleaning, Transformation, Data Mining, and Assess / Evaluation. The analysis used minPts = 5 and epsilon = 3. Two clusters were formed: cluster 1 contains 15 agencies with low bandwidth needs and cluster 2 contains 21 agencies with moderate bandwidth needs, while 7 agencies with excessively high bandwidth needs were labeled as noise.
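
The clustering step described in this abstract can be illustrated with a minimal, pure-Python sketch of DBSCAN. It is not the authors' implementation: the one-dimensional bandwidth values below are made up for illustration, and only the parameter values minPts = 5 and epsilon = 3 come from the abstract. Points that belong to no dense region keep the noise label, which mirrors how the study reports its seven high-demand agencies.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical bandwidth values (not from the study): a low group,
# a moderate group, and one extreme value.
points = [(10,), (11,), (12,), (13,), (14,), (15,),
          (50,), (51,), (52,), (53,), (54,),
          (200,)]
labels = dbscan(points, eps=3, min_pts=5)
print(labels)
```

In this sketch a point counts itself among its neighbors, so `min_pts=5` means a core point needs four other points within distance `eps`.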


Author(s):  
Huan Liu

The amount of data has grown increasingly large in recent years as the capacity of digital data storage worldwide has significantly increased. As the size of data grows, the demand for data reduction increases for effective data mining. Instance selection is one of the effective means of data reduction. This article introduces the basic concepts of instance selection, its context, necessity, and functionality. It briefly reviews the state-of-the-art methods for instance selection. Selection is a necessity in the world surrounding us. It stems from the sheer fact of limited resources, and data mining is no exception. Many factors give rise to data selection: data is not collected purely for data mining or for one particular application; there are missing data, redundant data, and errors introduced during collection and storage; and data can be too overwhelming to handle. Instance selection is one effective approach to data selection. It is a process of choosing a subset of data to achieve the original purpose of a data mining application. The ideal outcome of instance selection is a model-independent, minimum sample of data that can accomplish tasks with little or no performance deterioration.
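
As one concrete illustration of choosing a subset that still serves the original task, the sketch below implements a condensed nearest-neighbor selection in the spirit of Hart's classic method. This is an assumption made for illustration, not a method the abstract names, and unlike the ideal outcome described above it is tied to one model, the 1-nearest-neighbor classifier. The data and labels are hypothetical.

```python
from math import dist


def nearest_label(store, x):
    """Label of the stored instance closest to x (1-nearest-neighbor rule)."""
    return min(store, key=lambda item: dist(item[0], x))[1]


def condensed_nn(data):
    """Select a subset of (point, label) pairs that still classifies every
    instance in data as the full set would under the 1-NN rule."""
    store = [data[0]]
    changed = True
    while changed:
        changed = False
        for point, label in data:
            # Skip instances already kept; this also guarantees termination
            # when identical points carry conflicting labels.
            if (point, label) in store:
                continue
            if nearest_label(store, point) != label:
                store.append((point, label))
                changed = True
    return store


# Hypothetical two-class data: six instances reduce to two representatives.
data = [((0,), "a"), ((1,), "a"), ((2,), "a"),
        ((10,), "b"), ((11,), "b"), ((12,), "b")]
subset = condensed_nn(data)
print(len(data), "->", len(subset))
```

The selected subset depends on the order in which instances are visited, so a different ordering of `data` can yield a different, equally valid subset.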


2021 ◽  
Vol 15 (5) ◽  
pp. 114-120
Author(s):  
A. M. Lila ◽  
I. Yu. Torshin ◽  
A. N. Gromov ◽  
V. A. Semenov ◽  
O. A. Gromova

The pharmacoinformatics approach to assessing and modeling drugs relies on modern data mining methods. These methods include: 1) analysis of big data (selection of texts of scientific publications, search for new biomarkers); 2) computer analysis of texts (automatic classification of texts by content, identification of pseudoscientific texts); 3) analysis of metric maps (visualization and analysis of complex patterns, including clustering); and 4) chemoinformatic analysis, including assessment of the effects of drugs on the human transcriptome, proteome, and microbiome. The article provides examples of applying these pharmacoinformatics methods to chondroprotectors containing standardized forms of chondroitin sulfate and glucosamine sulfate.


2019 ◽  
Vol 5 (2) ◽  
pp. 139
Author(s):  
Usman Ependi ◽  
Ade Putra

Many methods can be used to predict inventory needs. One is to process sales data with data mining using the Apriori algorithm, which draws on consumers' purchases and the associations between the products bought together. With the Apriori algorithm, the company, in this case Regional Part Depo Auto 2000 Palembang, can stock the spare parts that consumers need, particularly in South Sumatra, without resorting to an indent (back-order) process. This is due to the large number of spare parts that PT. Depo Toyota must keep available to serve consumers in South Sumatra. The data mining stages follow Knowledge Discovery in Database (KDD), consisting of data cleaning and integration, data selection and integration, data mining, and evaluation and presentation. This process yielded 646 spare-part association patterns from a total of 338 spare parts.
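
The association-mining step can be sketched with a small, pure-Python version of Apriori. This is an illustrative sketch, not the authors' code: the transactions, part names, and the support and confidence thresholds are hypothetical, since the abstract does not report them.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        level = {}
        for candidate in current:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(itemset, size):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical spare-part purchases (not taken from the study).
transactions = [
    {"oil filter", "engine oil"},
    {"oil filter", "engine oil", "spark plug"},
    {"oil filter", "engine oil"},
    {"brake pad", "spark plug"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
for lhs, rhs, support, confidence in association_rules(frequent, 0.8):
    print(sorted(lhs), "->", sorted(rhs), support, confidence)
```

A rule such as "oil filter implies engine oil" is the kind of pattern a parts depot could use to decide which items to stock together.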


2018 ◽  
Vol 48 (1) ◽  
pp. 30-35
Author(s):  
Simona Ramanauskaitė ◽  
Kiril Griazev

Data mining from web pages is increasingly being adopted in business. Analyzing the current situation, we observe that solutions for mining structured data from web pages exist. However, no scientific dataset of unstructured data exists that would allow new data selection methods to be created and tested. This limits research on and development of unstructured web data mining; we therefore propose a method for estimating the similarity of HTML code blocks. The method combines data and structure comparison and expresses the similarity of two HTML code blocks quantitatively.
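
The abstract does not give the similarity formula, so the following is only a sketch of how a combined structure-and-data score for two HTML blocks might look. The choice of measures (a sequence-matching ratio over tag names, Jaccard overlap over text words), the equal default weighting, and the names `BlockProfile`, `profile`, and `block_similarity` are all assumptions made for illustration, not the authors' method.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class BlockProfile(HTMLParser):
    """Collects the sequence of opening tags and the set of text words."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.words = set()

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.words.update(data.lower().split())


def profile(html):
    parser = BlockProfile()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.words


def block_similarity(html_a, html_b, structure_weight=0.5):
    """Weighted mix of structure similarity and data similarity, in [0, 1]."""
    tags_a, words_a = profile(html_a)
    tags_b, words_b = profile(html_b)
    # Structure: how closely the two tag sequences match.
    structure = SequenceMatcher(None, tags_a, tags_b).ratio()
    # Data: Jaccard overlap of the words found in the text nodes.
    union = words_a | words_b
    data = len(words_a & words_b) / len(union) if union else 1.0
    return structure_weight * structure + (1 - structure_weight) * data


# Hypothetical product blocks: same layout, partly different text.
a = "<div><h2>Blue shirt</h2><span>19 EUR</span></div>"
b = "<div><h2>Red shirt</h2><span>25 EUR</span></div>"
print(block_similarity(a, b))
```

Raising `structure_weight` favors blocks that share a layout even when their text differs, which is one way to group repeated templates such as product cards.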


2020 ◽  
Author(s):  
Mohammed J. Zaki ◽  
Wagner Meira, Jr

2010 ◽  
Vol 24 (2) ◽  
pp. 112-119 ◽  
Author(s):  
F. Riganello ◽  
A. Candelieri ◽  
M. Quintieri ◽  
G. Dolce

The purpose of the study was to identify significant changes in heart rate variability (HRV; an emerging descriptor of emotional conditions) concomitant to complex auditory stimuli with emotional value (music). In healthy controls, traumatic brain injured (TBI) patients, and subjects in the vegetative state (VS), the heartbeat was continuously recorded while the subjects were passively listening to each of four music samples of different authorship. The heart rate (parametric and nonparametric) frequency spectra were computed and the spectral descriptors were processed by data-mining procedures. Data mining identified nu_lf (normalized parameter unit of the spectrum low frequency range) as the significant descriptor by which the HRV responses to music of healthy controls, TBI patients, and VS subjects could be clustered in classes matching those defined by the subjective reports of the controls and TBI patients. These findings promote the potential for HRV to reflect complex emotional stimuli and suggest that residual emotional reactions continue to occur in VS. HRV descriptors and data mining appear applicable in brain function research in the absence of consciousness.


2014 ◽  
Vol 45 (5) ◽  
pp. 408-420 ◽  
Author(s):  
Michela Menegatti ◽  
Monica Rubini

Two studies examined whether individuals vary the level of abstraction of messages composed to achieve the relational goals of initiating, maintaining, and ending a romantic relationship when the goal of communication was self-disclosure or persuading one's partner. Study 1 showed that abstract language was preferred to disclose thoughts and feelings about initiating a romantic relationship or to persuade the partner to consolidate a long-term one. Study 2 revealed that participants used abstract terms to persuade the partner to continue a problematic relationship and to disclose their thoughts on ending it. These results show that language abstraction is a flexible means to handle individuals' goals and influence the course of romantic relationships.

