Tools, Technologies, and Methodologies to Support Data Science

Author(s):  
Ricardo A. Barrera-Cámara ◽  
Ana Canepa-Saenz ◽  
Jorge A. Ruiz-Vanoye ◽  
Alejandro Fuentes-Penna ◽  
Miguel Ángel Ruiz-Jaimes ◽  
...  

Various devices such as smartphones, computers, tablets, biomedical equipment, sports equipment, and information systems generate large amounts of data and useful information in transactional information systems. However, this information may not be perceived or analyzed adequately for decision-making. There are technologies, tools, algorithms, and models that support analysis, visualization, learning, and prediction. Data science involves techniques and methods for abstracting knowledge from diverse sources. It combines fields such as statistics, machine learning, data mining, visualization, and predictive analysis. This chapter aims to be a guide to the statistical and computational tools applicable in data science.

Nowadays, Data Mining is used everywhere to extract information from data and, in turn, to acquire knowledge for decision making. Data Mining analyzes patterns, which are used to extract information and knowledge for making decisions. Many open-source and licensed tools such as Weka, RapidMiner, KNIME, and Orange are available for Data Mining and predictive analysis. This paper discusses the different tools available for Data Mining and Machine Learning, followed by a description of each tool and its pros and cons. The article details the algorithms these tools provide for classification, regression, characterization, discretization, clustering, visualization, and feature selection. It aims to support efficient decision making and suggests which tool suits a given requirement.


2021 ◽  
Vol 22 (2) ◽  
pp. 6-7
Author(s):  
Michael Zeller

Michael Zeller, Ph.D. is the recipient of the 2020 ACM SIGKDD Service Award, which is the highest service award in the field of knowledge discovery and data mining. The award is conferred annually on one individual or group in recognition of outstanding professional services and contributions to the field of knowledge discovery and data mining. Dr. Zeller was honored for his years of service and many accomplishments as the secretary and treasurer for ACM SIGKDD, the organizing body of the annual KDD conference. Zeller is also head of AI strategy and solutions at Temasek, a global investment company seeking to make a difference always with tomorrow in mind. He sat down with SIGKDD Explorations to discuss how he first got involved in the KDD conference in 1999, what he learned from the first-ever virtual conference, his work at Temasek, and what excites him about the future of machine learning, data science, and artificial intelligence.


2021 ◽  
Vol 23 (2) ◽  
pp. 1-2
Author(s):  
Shipeng Yu

Shipeng Yu, Ph.D. is the recipient of the 2021 ACM SIGKDD Service Award, which is the highest service award in the field of knowledge discovery and data mining. The award is conferred annually on one individual or group in recognition of outstanding professional services and contributions to the field of knowledge discovery and data mining. Dr. Yu was honored for his years of service and many accomplishments as general chair of KDD 2017 and currently as sponsorship director for SIGKDD. Dr. Yu is Director of AI Engineering and Head of the Growth AI team at LinkedIn, the world's largest professional network. He sat down with SIGKDD Explorations to discuss how he first got involved in the KDD conference in 2006, the benefits and drawbacks of virtual conferences, his work at LinkedIn, and KDD's place in the field of machine learning, data science, and artificial intelligence.


Web Services ◽  
2019 ◽  
pp. 105-126
Author(s):  
N. Nawin Sona

This chapter aims to give an overview of the wide range of Big Data approaches and technologies today. The data features of Volume, Velocity, and Variety are examined against new database technologies. It explores the complexity of data types, methodologies of storage, access and computation, current and emerging trends of data analysis, and methods of extracting value from data. It aims to address the need for clarity regarding the future of RDBMS and the newer systems. It also highlights methods, such as Machine Learning, Data Mining, and Predictive Analytics, by which Actionable Insights can be built into public sector domains.


Author(s):  
Sabitha Rajagopal

Data Science employs techniques and theories to create data products. A data product is a data application that acquires its value from the data itself and creates more data as a result; it is not just an application with data. Data science involves the methodical study of digital data employing techniques of observation, development, analysis, testing, and validation. It tackles real-time challenges by adopting a holistic approach. It 'creates' knowledge about large and dynamic bases, 'develops' methods to manage data, and 'optimizes' processes to improve its performance. The goal includes vital investigation and innovation in conjunction with functional exploration intended to inform decision-making for individuals, businesses, and governments. This paper discusses the emergence of Data Science and its subsequent developments in the fields of Data Mining and Data Warehousing. The research focuses on the need, challenges, impact, ethics, and progress of Data Science. Finally, insights into the subsequent phases of research and development in Data Science are provided.


2020 ◽  
Vol 19 (1) ◽  
pp. 43-65
Author(s):  
Jane Mitchell ◽  
Simon Mitchell ◽  
Cliff Mitchell

Advances in mathematical and computational technologies have brought unique and ground-breaking benefits to diverse fields throughout society (engineering, medicine, economics, etc.). Within legal systems, however, the potential applications of data science and innovative mathematical tools have yet to be embraced with the same ambition. The complex decision-making that is needed for reaching just verdicts is often seen as out of reach for such approaches and, in the case of criminal trials, this inhibits exploration into whether machine learning could have a positive impact. Here, through assigning numerical scores to prosecution and defence evidence, and employing an approach based on dimensionality reduction, we showed that evidence strands presented at historical murder trials could be used to train effective machine-learning algorithms (or models). We tested the evidence quantification approach with the trained model and showed that, through machine learning, criminal cases could be clearly classified (probability >99.9%) as belonging to either a guilty or a not-guilty category. The classification was found to be as expected for all test cases. All guilty test cases that were not wrongful convictions were correctly assigned to the guilty category by our model and, crucially, test cases that were wrongful convictions were correctly assigned to the not-guilty category. This work demonstrated the potential for machine learning to benefit criminal trial decision-making, and should motivate further testing and development of the model and datasets for assisting the judicial process.
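
The general idea of scoring evidence strands, reducing the scores to a lower dimension, and then assigning a category can be illustrated with a minimal sketch. This is not the authors' model or data: the scores in CASES are invented, the reduction is a plain one-component PCA computed by power iteration, and the nearest-centroid rule is an assumed stand-in for whatever classifier the paper used. It also yields a label only, not the probability reported in the abstract.

```python
from statistics import mean

# Hypothetical evidence scores: one row per case, one column per evidence strand.
# Higher values favour the prosecution. All numbers are invented for illustration.
CASES = [
    ([8, 7, 9, 6], "guilty"),
    ([7, 8, 8, 7], "guilty"),
    ([9, 6, 7, 8], "guilty"),
    ([2, 3, 1, 2], "not guilty"),
    ([3, 2, 2, 1], "not guilty"),
    ([1, 2, 3, 3], "not guilty"),
]


def first_component(centered, iterations=100):
    """Leading principal direction of centered rows, found by power iteration."""
    dim = len(centered[0])
    vec = [1.0 / dim ** 0.5] * dim
    for _ in range(iterations):
        # Apply X^T X to vec without building the matrix: X^T (X vec).
        scores = [sum(x * v for x, v in zip(row, vec)) for row in centered]
        new = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(dim)]
        norm = sum(x * x for x in new) ** 0.5
        vec = [x / norm for x in new]
    return vec


def fit(cases):
    """Reduce each case to one number and store the mean value per label."""
    rows = [scores for scores, _ in cases]
    means = [mean(col) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    direction = first_component(centered)
    projected = [sum(x * d for x, d in zip(row, direction)) for row in centered]
    centroids = {}
    for label in {label for _, label in cases}:
        centroids[label] = mean(
            p for p, (_, case_label) in zip(projected, cases) if case_label == label
        )
    return means, direction, centroids


def classify(scores, model):
    """Project a new case onto the learned direction and pick the nearest centroid."""
    means, direction, centroids = model
    value = sum((x - m) * d for x, m, d in zip(scores, means, direction))
    return min(centroids, key=lambda label: abs(centroids[label] - value))


model = fit(CASES)
print(classify([8, 8, 7, 7], model))
print(classify([2, 1, 2, 3], model))
```

With real trial data, the scoring scheme, the number of retained components, and the classifier would all need validation against held-out cases before any conclusion could be drawn.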


2011 ◽  
Author(s):  
Bruce Ratner ◽  
Stephen Day ◽  
Christopher Davies

2016 ◽  
Vol 21 (3) ◽  
pp. 525-547
Author(s):  
Scott Tonidandel ◽  
Eden B. King ◽  
Jose M. Cortina

Advances in data science, such as data mining, data visualization, and machine learning, are extremely well-suited to address numerous questions in the organizational sciences given the explosion of available data. Despite these opportunities, few scholars in our field have discussed the specific ways in which the lens of our science should be brought to bear on the topic of big data and big data's reciprocal impact on our science. The purpose of this paper is to provide an overview of the big data phenomenon and its potential for impacting organizational science in both positive and negative ways. We identify the biggest opportunities afforded by big data along with the biggest obstacles, and we discuss specifically how we think our methods will be most impacted by the data analytics movement. We also provide a list of resources to help interested readers incorporate big data methods into their existing research. Our hope is that we stimulate interest in big data, motivate future research using big data sources, and encourage the application of associated data science techniques more broadly in the organizational sciences.

