Ensuring the Robustness and Reliability of Data-Driven Knowledge Discovery Models in Production and Manufacturing

2021 ◽  
Vol 4 ◽  
Author(s):  
Shailesh Tripathi ◽  
David Muhr ◽  
Manuel Brunner ◽  
Herbert Jodlbauer ◽  
Matthias Dehmer ◽  
...  

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely accepted framework in production and manufacturing. This data-driven knowledge discovery framework partitions the often complex data mining process into orderly phases to ensure a practical implementation of data analytics and machine learning models. However, the practical application of robust, industry-specific data-driven knowledge discovery models faces multiple issues related to data and model development. These issues need to be carefully addressed through a flexible, customized, and industry-specific knowledge discovery framework. For this reason, extensions of CRISP-DM are needed. In this paper, we provide a detailed review of CRISP-DM and summarize extensions of this model into a novel framework we call the Generalized Cross-Industry Standard Process for Data Science (GCRISP-DS). This framework is designed to allow dynamic interactions between different phases to adequately address data- and model-related issues and thereby achieve robustness. Furthermore, it also emphasizes the need for a detailed business understanding and its interdependencies with the developed models and data quality in order to fulfill higher business objectives. Overall, such a customizable GCRISP-DS framework supports model improvement and reusability by minimizing robustness issues.

2020 ◽  
Vol 10 (1) ◽  
pp. 12
Author(s):  
Ekka Pujo Ariesanto Akhmad

The bank's marketing department has collected data from bank customers by marketing or promoting credit cards by telephone (telemarketing). The evaluations of credit card telemarketing carried out by the bank so far have been neither fruitful nor effective. One suitable way to evaluate bank credit card telemarketing reports is to use data mining techniques. The aim of using data mining is to identify the tendencies and patterns of customers who are likely to subscribe to the credit card offered by the bank. The research method uses the Cross Industry Standard Process for Data Mining (CRISP-DM) with a Genetic Algorithm for Feature Selection (GAFS) and Naive Bayes (NB). The results show that the bank credit card telemarketing dataset has 15 attributes, consisting of 14 regular attributes and 1 special attribute. The bank telemarketing dataset contains high-dimensional data, so the GAFS method was applied. After applying GAFS, 7 optimal attributes were obtained, consisting of 6 regular attributes and 1 special attribute. The six regular attributes are job (pekerjaan), balance, housing (rumah), loan (pinjaman), duration (durasi), and poutcome. The special attribute is target. The results show that the NB algorithm achieves an accuracy of 86.71%. Combining GAFS and NB raises the accuracy to 90.27% for predicting bank customers who take up the credit card.
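As an illustration of how feature selection and Naive Bayes interact, the following is a minimal sketch, not the authors' implementation. It fits a categorical Naive Bayes classifier with Laplace smoothing on a chosen subset of attributes and reports hold-out accuracy for that subset, which is the kind of fitness value a genetic algorithm could maximise when searching over attribute masks. The function names (`train_nb`, `predict_nb`, `mask_accuracy`) and the toy rows are hypothetical, and the genetic search itself is omitted.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, features):
    """Fit a categorical Naive Bayes model using only the given feature indices."""
    features = list(features)
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature, label) -> counts of each value
    values_seen = defaultdict(set)       # feature -> distinct values in training data
    for row, label in zip(rows, labels):
        for f in features:
            value_counts[(f, label)][row[f]] += 1
            values_seen[f].add(row[f])
    return class_counts, value_counts, values_seen, features


def predict_nb(model, row):
    """Return the label with the highest log-posterior (Laplace-smoothed)."""
    class_counts, value_counts, values_seen, features = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for f in features:
            count = value_counts[(f, label)][row[f]]
            score += math.log((count + 1) / (n + len(values_seen[f])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def mask_accuracy(mask, train_rows, train_labels, test_rows, test_labels):
    """Hold-out accuracy of Naive Bayes restricted to features where mask is truthy."""
    features = [i for i, keep in enumerate(mask) if keep]
    model = train_nb(train_rows, train_labels, features)
    hits = sum(predict_nb(model, r) == y for r, y in zip(test_rows, test_labels))
    return hits / len(test_labels)


# Hypothetical toy data: (job, housing, other) -> subscribed?
train_rows = [("admin", "yes", "a"), ("admin", "no", "b"),
              ("student", "yes", "a"), ("student", "no", "b")]
train_labels = ["no", "no", "yes", "yes"]
test_rows = [("admin", "yes", "b"), ("student", "no", "a")]
test_labels = ["no", "yes"]

# Evaluate a mask that keeps only the first attribute.
print(mask_accuracy((1, 0, 0), train_rows, train_labels, test_rows, test_labels))
```

A genetic algorithm would treat each mask as a chromosome and use `mask_accuracy` (ideally cross-validated rather than a single hold-out split) as the fitness function.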


2017 ◽  
Vol 19 (3) ◽  
pp. 388
Author(s):  
Ricardo Timaran-Pereira ◽  
Andrés Calderón-Romero ◽  
Arsenio Hidalgo-Troya

Introduction: The Pan American Health Organization (PAHO), since 1993, and the World Health Organization (WHO), in 1996, have recognized violence as a public health problem. This is corroborated by the Report on Violence and Health, in which Latin America showed a homicide rate of 18 per 100,000 people and is considered one of the most violent regions in the world. Objective: To detect crime patterns using data mining techniques at the Crime Observatory of the municipality of Pasto (Colombia). Materials and methods: The Cross Industry Standard Process for Data Mining (CRISP-DM) was applied, one of the methodologies used for developing data mining projects in academic and industrial settings. The data source was the Crime Observatory of the municipality of Pasto, which stores cleaned and transformed historical figures on injuries from external causes (fatal and non-fatal) recorded over 11 years. Results: A classification model based on decision trees was built, which made it possible to discover patterns in deaths from external causes. Homicides occurred mostly in Comuna 5 of Pasto, on weekends, in the early morning hours, in the second half of the year, and on public roads; the victims were adult men in miscellaneous occupations, the homicides were caused by fights, and they were committed with firearms. Conclusion: The knowledge generated will help government and security agencies make effective decisions on implementing crime prevention and citizen security plans.


2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Wahyu Nurjaya WK ◽  
Yusrina Adani

Bank BRI Syariah has many attractive products to offer prospective and existing customers, both long-term and short-term, that provide many benefits for the customers themselves. One of these products is the time deposit (Deposito berjangka), an investment product in which money is deposited and can only be withdrawn after a set period agreed between the bank and the customer. With good telemarketing by the bank, prospective and existing customers are expected to learn about this product. Telemarketing is one way of promoting a bank's products and services. A bank telemarketer must be able to target customers, identifying which ones have the potential to increase deposits by examining the customer data stored in the database. Because the customer database is very large, it is not feasible to find predictive patterns of prospective or existing customers interested in the deposit program by conventional means. For this reason, very large data can be handled with data mining, an iterative and interactive process for finding new patterns or models that are sound, useful, and understandable in a very large database. Data mining involves searching for desired trends and patterns in large databases to support future decision making. Using data mining is expected to optimize the telemarketer's prediction process on customer data, so that deposits can be offered to the right prospective or existing customers. The data mining classification technique used is the Naïve Bayes algorithm. Naïve Bayes works very effectively when tested on large datasets to identify past patterns and to find a function that serves as the pattern for assessing future data.
To achieve the expected results, the CRISP-DM (Cross Industry Standard Process for Data Mining) method is well suited as a solution, proceeding through business understanding, data understanding, data preparation, modeling, evaluation, and deployment. With this approach the predictions will be more accurate, so that telemarketing for Bank BRI Syariah's deposit product will reach the right targets.


Author(s):  
M. A. Burhanuddin ◽  
Ronizam Ismail ◽  
Nurul Izzaimah ◽  
Ali Abdul-Jabbar Mohammed ◽  
Norzaimah Zainol

Recently, mobile service providers have been growing rapidly in Malaysia. In this paper, we propose an analytical method to find the best telecommunication provider by visualizing the performance of telecommunication service providers in Malaysia, e.g. TM Berhad, Celcom, Maxis, and U-Mobile. This paper uses a data mining technique to evaluate the performance of telecommunication service providers using their customers' feedback from Twitter. It demonstrates how the system can process big data and then present it as a simple graph or visualization. In addition, we build a computerized tool and recommend a data analytics model based on the collected results. From preparing the data for pre-processing through to conducting the analysis, this project focuses on the data science process itself, using the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology as a reference. The analysis was developed using the R language and RStudio packages. The results show that Telco 4 performs best, as it received the highest positive scores from the tweet data. In contrast, Telco 3 should improve its performance, as it received less positive feedback from its customers in the tweet data. This project provides insights into how the telecommunication industry can analyze tweet data from its customers. The Malaysian telecommunication industry will benefit by improving customer satisfaction and business growth. In addition, it will make telecommunication users aware of up-to-date reviews from other users.
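The abstract does not say how the positive scores were computed, and the original analysis was done in R. Purely as an illustration, the sketch below shows one common approach in Python: a lexicon-based score per tweet, aggregated into the share of positive tweets per provider. The word lists, the function names (`tweet_score`, `share_positive`), and the sample tweets are all hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative lexicons (assumed; a real study would use a published lexicon).
POSITIVE = {"fast", "good", "great", "reliable"}
NEGATIVE = {"slow", "bad", "down", "poor"}


def tweet_score(text):
    """Count of positive words minus count of negative words in one tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def share_positive(tweets):
    """Map each provider to the fraction of its tweets with a score above zero.

    `tweets` is an iterable of (provider, text) pairs.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for provider, text in tweets:
        totals[provider] += 1
        if tweet_score(text) > 0:
            positives[provider] += 1
    return {p: positives[p] / totals[p] for p in totals}


sample = [
    ("Telco A", "Great coverage and fast data"),
    ("Telco A", "Network down again, so slow"),
    ("Telco B", "bad service"),
]
print(share_positive(sample))
```

The resulting per-provider shares can then be plotted as a simple bar chart to compare providers.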


2018 ◽  
Vol 3 (3) ◽  
Author(s):  
Itallo Henrique de Santana Santos ◽  
Alexandre Magno Andrade Maciel

The Secretaria da Controladoria Geral do Estado (SCGE) analyzes, on a monthly basis, the expenses generated by the various agencies of the State of Pernambuco in order to guarantee payment of the most sensitive ones. More than 1 billion reais in expenses remained pending in fiscal year 2016, demonstrating the importance of prioritizing payments. In this context, the article presents the development process of a Decision Support System (DSS) for the Secretaria da Controladoria Geral do Estado de Pernambuco. The proposed system can classify public expenses using a decision tree, supporting the analysis work of the managers responsible for prioritizing payments. The article details the characterization of the problem, the theoretical foundation used in the work, the application of the CRISP-DM (Cross Industry Standard Process for Data Mining) methodology, and the system. Using a decision tree to classify expenses resulted in an accuracy of 99%, showing that this type of model satisfactorily solved the problem at hand.


Author(s):  
Mihaela van der Schaar ◽  
Harry Hemingway

Machine learning offers an alternative to the methods for prognosis research in large and complex datasets and for delivering dynamic models of prognosis. Machine learning foregrounds the capacity to learn from large and complex data about the pathways, predictors, and trajectories of health outcomes in individuals. This reflects wider societal drives for data-driven modelling embedded and automated within powerful computers to analyse large amounts of data. Machine learning derives algorithms that can learn from data and can allow the data full freedom, for example, to follow a pragmatic approach in developing a prognostic model. Rather than choosing factors for model development in advance, machine learning allows the data to reveal which features are important for which predictions. This chapter introduces key machine learning concepts relevant to each of the four prognosis research types, explains where it may enhance prognosis research, and highlights challenges.


2020 ◽  
Vol 10 (22) ◽  
pp. 8281
Author(s):  
Luís B. Elvas ◽  
Carolina F. Marreiros ◽  
João M. Dinis ◽  
Maria C. Pereira ◽  
Ana L. Martins ◽  
...  

Buildings in Lisbon are often the victim of several types of events (such as accidents, fires, collapses, etc.). This study aims to apply a data-driven approach towards knowledge extraction from past incident data, nowadays available in the context of a Smart City. We apply a Cross Industry Standard Process for Data Mining (CRISP-DM) approach to perform incident management of the city of Lisbon. From this data-driven process, a descriptive and predictive analysis of an events dataset provided by the Lisbon Municipality was possible, together with other data obtained from the public domain, such as the temperature and humidity on the day of the events. The dataset provided contains events from 2011 to 2018 for the municipality of Lisbon. This data mining approach over past data identified patterns that provide useful knowledge for city incident managers. Additionally, the forecasts can be used for better city planning, and data correlations of variables can provide information about the most important variables towards those incidents. This approach is fundamental in the context of smart cities, where sensors and data can be used to improve citizens’ quality of life. Smart Cities allow the collecting of data from different systems, and for the case of disruptive events, these data allow us to understand them and their cascading effects better.


Author(s):  
ZHENGXIN CHEN

The knowledge economy requires data mining to be more goal-oriented so that more tangible results can be produced. This requirement implies that the semantics of the data should be incorporated into the mining process. Data mining is ready to deal with this challenge because recent developments in the field have shown an increasing interest in mining complex data (as exemplified by graph mining, text mining, etc.). By incorporating the relationships among the data along with the data itself (rather than focusing on the data alone), complex data injects semantics into the mining process, thus enhancing its potential to contribute to the knowledge economy. Since the relationships between the data reveal certain behavioral aspects underlying the plain data, this shift from mining simple data to mining complex data signals a fundamental change to a new stage in the research and practice of knowledge discovery, which can be termed behavior mining. Behavior mining also has the potential to unify some other recent activities in data mining. We discuss important aspects of behavior mining and its implications for the future of data mining.

