[Dedicated to Prof. T. Okada and Prof. T. Nishioka: data science in chemistry] Artificial Intelligence, Knowledge Discovery and Data Mining: Thirty Years of Experience in Cheminformatics

2017 ◽  
Vol 18 (0) ◽  
pp. 3-14
Author(s):  
Takashi Okada
2021 ◽  
Vol 22 (2) ◽  
pp. 6-7
Author(s):  
Michael Zeller

Michael Zeller, Ph.D., is the recipient of the 2020 ACM SIGKDD Service Award, the highest service award in the field of knowledge discovery and data mining. The award is conferred annually on one individual or group in recognition of outstanding professional services and contributions to the field. Dr. Zeller was honored for his years of service and many accomplishments as the secretary and treasurer for ACM SIGKDD, the organizing body of the annual KDD conference. He is also head of AI strategy and solutions at Temasek, a global investment company seeking to make a difference always with tomorrow in mind. He sat down with SIGKDD Explorations to discuss how he first got involved in the KDD conference in 1999, what he learned from the first-ever virtual conference, his work at Temasek, and what excites him about the future of machine learning, data science and artificial intelligence.


2021 ◽  
Vol 23 (2) ◽  
pp. 1-2
Author(s):  
Shipeng Yu

Shipeng Yu, Ph.D., is the recipient of the 2021 ACM SIGKDD Service Award, the highest service award in the field of knowledge discovery and data mining. The award is conferred annually on one individual or group in recognition of outstanding professional services and contributions to the field. Dr. Yu was honored for his years of service and many accomplishments as general chair of KDD 2017 and, currently, as sponsorship director for SIGKDD. He is Director of AI Engineering and Head of the Growth AI team at LinkedIn, the world's largest professional network. He sat down with SIGKDD Explorations to discuss how he first got involved in the KDD conference in 2006, the benefits and drawbacks of virtual conferences, his work at LinkedIn, and KDD's place in the field of machine learning, data science and artificial intelligence.


2021 ◽  
Vol 4 ◽  
Author(s):  
Shailesh Tripathi ◽  
David Muhr ◽  
Manuel Brunner ◽  
Herbert Jodlbauer ◽  
Matthias Dehmer ◽  
...  

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely accepted framework in production and manufacturing. This data-driven knowledge discovery framework provides an orderly partition of the often complex data mining processes to ensure a practical implementation of data analytics and machine learning models. However, the practical application of robust industry-specific data-driven knowledge discovery models faces multiple issues related to data and model development. These issues need to be carefully addressed by allowing a flexible, customized and industry-specific knowledge discovery framework. For this reason, extensions of CRISP-DM are needed. In this paper, we provide a detailed review of CRISP-DM and summarize extensions of this model into a novel framework we call Generalized Cross-Industry Standard Process for Data Science (GCRISP-DS). This framework is designed to allow dynamic interactions between different phases to adequately address data- and model-related issues for achieving robustness. Furthermore, it also emphasizes the need for a detailed business understanding and the interdependencies with the developed models and data quality for fulfilling higher business objectives. Overall, such a customizable GCRISP-DS framework supports model improvement and reusability by minimizing robustness issues.


2020 ◽  
Vol 24 (106) ◽  
pp. 79-87
Author(s):  
Fredy Humberto Troncoso Espinosa ◽  
Javiera Valentina Ruiz Tapia

Customer churn is a significant problem for service companies and can cause them substantial economic losses. Identifying the factors that lead a customer to stop using a service is a complex task; however, a churn probability can be estimated for each customer from their behavior. This study applies data mining to predict customer churn at a natural gas distribution company using two machine learning techniques: neural networks and support vector machines. The results show that these techniques make it possible to identify the customers most likely to churn, so that timely and targeted retention actions can be taken for them, minimizing the costs associated with errors in identifying these customers. Keywords: customer churn, data mining, machine learning, natural gas distribution.
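The cost argument in this abstract can be illustrated with a small sketch. It is not the authors' method: it assumes a model has already produced a churn probability per customer, and it picks the flagging threshold that minimizes the total misclassification cost. The function names, the sample probabilities and the two cost values (`cost_missed_churner`, `cost_wasted_offer`) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def misclassification_cost(
    probs: Sequence[float],
    churned: Sequence[bool],
    threshold: float,
    cost_missed_churner: float,
    cost_wasted_offer: float,
) -> float:
    """Total cost when customers with probability >= threshold are flagged."""
    total = 0.0
    for p, did_churn in zip(probs, churned):
        flagged = p >= threshold
        if did_churn and not flagged:
            total += cost_missed_churner  # false negative: churner not targeted
        elif flagged and not did_churn:
            total += cost_wasted_offer  # false positive: retention offer wasted
    return total


def best_threshold(
    probs: Sequence[float],
    churned: Sequence[bool],
    cost_missed_churner: float,
    cost_wasted_offer: float,
) -> Tuple[float, float]:
    """Return (threshold, cost) with the lowest total cost on the given data."""
    # Candidate thresholds: every observed probability, plus one above 1.0
    # that corresponds to flagging nobody.
    candidates: List[float] = sorted(set(probs)) + [1.1]
    scored = [
        (t, misclassification_cost(probs, churned, t,
                                   cost_missed_churner, cost_wasted_offer))
        for t in candidates
    ]
    return min(scored, key=lambda pair: pair[1])


# Hypothetical model outputs and observed outcomes for six customers.
example_probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
example_churned = [True, True, False, True, False, False]

threshold, cost = best_threshold(
    example_probs, example_churned,
    cost_missed_churner=10.0,  # assumed: losing a customer is expensive
    cost_wasted_offer=1.0,     # assumed: an unnecessary offer is cheap
)
print(threshold, cost)
```

When a missed churner is assumed to cost far more than a wasted offer, the selected threshold tends to be low, so more customers are flagged for retention actions. In practice the threshold would be chosen on held-out data rather than on the data used to fit the model.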


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 695
Author(s):  
Małgorzata Pac ◽  
Irina Mikutskaya ◽  
Jan Mulawka

Artificial intelligence is one of the fastest-developing areas of science and covers a remarkably wide range of problems. It has found practical application in many areas of human activity, including medicine. One of the directions of cooperation between computer science and medicine is to assist in diagnosing and proposing treatment methods with the use of IT tools. This study is the result of collaboration with the Children’s Memorial Health Institute in Warsaw, which made available a database containing information about patients suffering from Bruton’s disease. This is a rare disorder that is difficult to detect in the first months of life. It is estimated that one in 70,000 to 90,000 children will develop Bruton’s disease, but even these few cases need detailed attention from doctors. Data mining was performed on the data contained in the database. During this process, knowledge was discovered and presented in a form understandable to the user, namely decision trees. The best models obtained were used to implement expert systems. Based on the data entered by the user, the system performs an expert assessment and determines the severity of the course of the disease or the severity of the mutation. The expert systems were developed in the CLIPS language, and the resulting software comprises six expert systems. In the next step, experimental verification was performed, which confirmed the correctness of the developed systems.


2020 ◽  
Author(s):  
Xia Jing

Background: The unified medical language system (UMLS) has been a critical tool in biomedical and health informatics, and the year 2020 marks the 30th anniversary of UMLS. Despite its longevity, there is no systematic review on UMLS, in general. Thus, this systematic review was conducted to provide an overview of UMLS and its usage in English-language publications in the last 30 years.
Objectives: The objective is twofold: to provide a comprehensive and systematic picture of the themes, their subtopics, and the publications under each category and to document systematic evidence of UMLS and how it has been used in English-language publications in the last 30 years.
Methods: PubMed, ACM Digital Library, and Nursing & Allied Health Database were used to search for literature. The primary literature search strategy was as follows: UMLS was used as a MeSH term or a keyword or appeared in the title or abstract. Only English-language publications were considered.
Results: A total of 943 publications were included in the final analysis. After analysis and categorization of publications, UMLS was found to be used in the following emerging themes: natural language processing (NLP) (230 publications), information retrieval (125 publications), terminology study (90 publications), ontology and modeling (80 publications), medical subdomains (76 publications), other language studies (53 publications), artificial intelligence tools and applications (46 publications), patient care (35 publications), data mining and knowledge discovery (25 publications), medical education (20 publications), degree-related theses (13 publications), and digital library (5 publications) as well as UMLS itself (150 publications).
Conclusions: UMLS has been used and published successfully in patient care, medical education, digital libraries, and software development, as originally planned, as well as in degree-related theses, building artificial intelligence tools, data mining and knowledge discovery and more foundational work in methodology and middle layers that may lead to advanced products. NLP, UMLS itself, and information retrieval are the three themes with the most publications. The review provides systematic evidence of UMLS in English-language peer-reviewed publications in the last 30 years.


2018 ◽  
Vol 48 (5) ◽  
pp. 673-684 ◽  
Author(s):  
Matthew L. Jones

In the last two decades, a highly instrumentalist form of statistical and machine learning has achieved an extraordinary success as the computational heart of the phenomenon glossed as “predictive analytics,” “data mining,” or “data science.” This instrumentalist culture of prediction emerged from subfields within applied statistics, artificial intelligence, and database management. This essay looks at representative developments within computational statistics and pattern recognition from the 1950s onward, in the United States and beyond, central to the explosion of algorithms, techniques, and epistemic values that ultimately came together in the data sciences of today. This essay is part of a special issue entitled Histories of Data and the Database edited by Soraya de Chadarevian and Theodore M. Porter.

