Machine Learning for Data Mining, Data Science and Data Analytics

Advances in data science, such as data mining, data visualization, and machine learning, are extremely well-suited to address numerous questions in the organizational sciences given the explosion of available data. Despite these opportunities, few scholars in our field have discussed the specific ways in which the lens of our science should be brought to bear on the topic of big data and big data's reciprocal impact on our science. The purpose of this paper is to provide an overview of the big data phenomenon and its potential for impacting organizational science in both positive and negative ways. We identify the biggest opportunities afforded by big data along with the biggest obstacles, and we discuss specifically how we think our methods will be most impacted by the data analytics movement. We also provide a list of resources to help interested readers incorporate big data methods into their existing research. Our hope is that we stimulate interest in big data, motivate future research using big data sources, and encourage the application of associated data science techniques more broadly in the organizational sciences.

Scientific production on hospitals in the context of data science: a study based on the Web of Science

Encontros Bibli Revista Eletrônica de Biblioteconomia e Ciência da Informação ◽

10.5007/1518-2924.2021.78824 ◽

2021 ◽

Vol 26 (Especial) ◽

pp. 1-16

Author(s):

Natanael Vitor Sobral ◽

Gillian Leandro de Queiroga Lima ◽

Ana Sara Pereira de Melo Sobral

Keyword(s):

Machine Learning ◽

Data Mining ◽

Big Data ◽

Text Mining ◽

Data Warehouse ◽

Data Analytics ◽

Data Science ◽

Web Of Science ◽

Health Technology ◽

Health Records

Objective: to carry out a bibliometric analysis of the applications of data science in hospital organizations. Method: through a search of the Web of Science database, the occurrence of terms related to data science, such as “big data”, “data analytics”, “business intelligence”, “data mining”, “data warehouse”, “text mining” and “data science”, was examined in connection with hospitals. Data analysis was based on the technique of social network analysis. The period considered was 2015 to 2019. Results: “machine learning” and “electronic health records” emerge as relevant topics. The most significant interactions reflect the orientation of medical informatics toward topics related to decision making, hospital information systems and intensive care units. As for scientific fields, the expected predominance of the health area and of domains belonging to or bordering on technology is observed. In addition, the wide variety of areas found points to the multidisciplinary nature of the subject, including a significant contribution from Information Science (IS). Regarding the geography of knowledge, a reasonable degree of decentralization is observed, with representative output in North America, Europe and Asia. As for publication venues, Studies in Health Technology and Informatics, which comprises a series of publications, stands out. The two most representative journals on the list belong, respectively, to the Springer Nature and Elsevier groups, major players in the scientific publishing market. Conclusions: finally, the multidisciplinarity surrounding the subject studied and the relevance of technology for the progress of hospital organizations are evident.
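
The abstract above applies social network analysis to terms retrieved from Web of Science but does not describe the procedure in detail. As a rough illustration only, one common way to build such a network is to count how often keywords co-occur in the same record and then rank keywords by weighted degree. The records, function names and the choice of weighted degree below are assumptions made for this sketch, not the authors' data or method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists for three records; not data from the study.
RECORDS = [
    ["machine learning", "electronic health records", "big data"],
    ["machine learning", "data mining"],
    ["electronic health records", "machine learning"],
]

def cooccurrence_edges(records):
    """Count how many records contain each unordered pair of keywords."""
    edges = Counter()
    for keywords in records:
        # Deduplicate and sort so each pair is stored in one canonical order.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges

def weighted_degree(edges):
    """Sum the edge weights attached to each keyword."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree

edges = cooccurrence_edges(RECORDS)
degree = weighted_degree(edges)
print(degree.most_common())
```

In this toy input, "machine learning" has the highest weighted degree because it appears in every record, which mirrors the kind of prominence the abstract reports for that term.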

Business Intelligence Through Big Data Analytics, Data Mining and Machine Learning

Data Management, Analytics and Innovation - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-13-9364-8_17 ◽

2019 ◽

pp. 217-230

Author(s):

Wael M. S. Yafooz ◽

Zainab Binti Abu Bakar ◽

S. K. Ahammad Fahad ◽

Ahamed. M Mithun

Keyword(s):

Machine Learning ◽

Data Mining ◽

Big Data ◽

Business Intelligence ◽

Data Analytics ◽

Big Data Analytics

A systematic literature review of data science, data analytics and machine learning applied to healthcare engineering systems

Management Decision ◽

10.1108/md-01-2020-0035 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Roberto Salazar-Reyna ◽

Fernando Gonzalez-Aleu ◽

Edgar M.A. Granda-Gutierrez ◽

Jenny Diaz-Ramirez ◽

Jose Arturo Garza-Reyes ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Big Data ◽

Literature Review ◽

Systematic Literature Review ◽

Data Analytics ◽

Research Area ◽

Engineering Systems ◽

Content Type ◽

Healthcare Engineering

Purpose: The objective of this paper is to assess and synthesize the published literature related to the application of data analytics, big data, data mining and machine learning to healthcare engineering systems. Design/methodology/approach: A systematic literature review (SLR) was conducted to obtain the most relevant papers related to the research study from three different platforms: EBSCOhost, ProQuest and Scopus. The literature was assessed and synthesized, conducting analysis associated with the publications, authors and content. Findings: From the SLR, 576 publications were identified and analyzed. The research area seems to show the characteristics of a growing field with new research areas evolving and applications being explored. In addition, the main authors and collaboration groups publishing in this research area were identified through a social network analysis. This could lead new and current authors to identify researchers with common interests in the field. Research limitations/implications: The use of the SLR methodology does not guarantee that all relevant publications related to the research are covered and analyzed. However, the authors' previous knowledge and the nature of the publications were used to select different platforms. Originality/value: To the best of the authors' knowledge, this paper represents the most comprehensive literature-based study on the fields of data analytics, big data, data mining and machine learning applied to healthcare engineering systems.

How Data Mining and Machine Learning Evolved from Relational Data Base to Data Science

Studies in Big Data - A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years ◽

10.1007/978-3-319-61893-7_17 ◽

2017 ◽

pp. 287-306 ◽

Cited By ~ 3

Author(s):

G. Amato ◽

L. Candela ◽

D. Castelli ◽

A. Esuli ◽

F. Falchi ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Data Base ◽

Data Science ◽

Relational Data ◽

Relational Data Base

Towards Spatial Data Science: Bridging the Gap between GIS, Cartography and Data Science

Abstracts of the ICA ◽

10.5194/ica-abs-1-403-2019 ◽

2019 ◽

Vol 1 ◽

pp. 1-2

Author(s):

Jan Wilkening

Keyword(s):

Machine Learning ◽

Data Mining ◽

Open Source ◽

Real Time ◽

Spatial Data ◽

Data Science ◽

Spatial Concepts ◽

Front End ◽

Gis Tools ◽

University Curricula

Abstract. Data is regarded as the oil of the 21st century, and the concept of data science has received increasing attention in recent years. These trends are mainly caused by the rise of big data: data that is big in terms of volume, variety and velocity. Consequently, data scientists are required to make sense of these large datasets. Companies have problems acquiring talented people to solve data science problems. This is not surprising, as employers often expect skillsets that can hardly be found in one person: not only does a data scientist need a solid background in machine learning, statistics and various programming languages, but often also in IT systems architecture, databases and complex mathematics. Above all, she should have strong non-technical domain expertise in her field (see Figure 1).

As it is widely accepted that 80% of data has a spatial component, developments in data science could provide exciting new opportunities for GIS and cartography: cartographers are experts in spatial data visualization, and often also very skilled in statistics, data pre-processing and analysis in general. The cartographers’ skill levels often depend on the degree to which cartography programs at universities focus on the “front end” (visualisation) of spatial data and leave the “back end” (modelling, gathering, processing, analysis) to GIScientists. In many university curricula, these front-end and back-end distinctions between cartographers and GIScientists are not clearly defined, and the boundaries are somewhat blurred.

In order to become good data scientists, cartographers and GIScientists need to acquire certain additional skills that are often beyond their university curricula. These skills include programming, machine learning and data mining. These are important technologies for extracting knowledge from big spatial data sets, and thereby the logical advancement of “traditional” geoprocessing, which focuses on “traditional” (small, structured, static) datasets such as shapefiles or feature classes.

To bridge the gap between spatial sciences (such as GIS and cartography) and data science, we need an integrated framework of “spatial data science” (Figure 2).

Spatial sciences focus on causality and theory-based approaches to explain why things are happening in space. In contrast, the scope of data science is to find similar patterns in big datasets with techniques of machine learning and data mining, often without considering spatial concepts (such as topology, spatial indexing, spatial autocorrelation, the modifiable areal unit problem, map projections and coordinate systems, uncertainty in measurement etc.).

Spatial data science could become the core competency of GIScientists and cartographers who are willing to integrate methods from the data science knowledge stack. Moreover, data scientists could enhance their work by integrating important spatial concepts and tools from GIS and cartography into data science workflows. A non-exhaustive knowledge stack for spatial data scientists, including typical tasks and tools, is given in Table 1.

There are many interesting ongoing projects at the interface of spatial and data science. Examples from the ArcGIS platform include:

- Integration of Python GIS APIs with Machine Learning libraries, such as scikit-learn or TensorFlow, in Jupyter Notebooks
- Combination of R (advanced statistics and visualization) and GIS (basic geoprocessing, mapping) in ModelBuilder and other automation frameworks
- Enterprise GIS solutions for distributed geoprocessing operations on big, real-time vector and raster datasets
- Dashboards for visualizing real-time sensor data and integrating it with other data sources
- Applications for interactive data exploration
- GIS tools for Machine Learning tasks for prediction, clustering and classification of spatial data
- GIS Integration for Hadoop

While the discussion about proprietary (ArcGIS) vs. open-source (QGIS) software is beyond the scope of this article, it has to be stated that (a) many ArcGIS projects are actually open-source and (b) using a complete GIS platform instead of several open-source pieces has several advantages, particularly in efficiency, maintenance and support (see Wilkening et al. (2019) for a more detailed consideration). At any rate, cartography and GIS tools are the essential technology blocks for solving the (80% spatial) data science problems of the future.

Big Data Analysis for Trend Recognition Using Machine Learning Techniques

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327910666200304141238 ◽

2020 ◽

Vol 10 (4) ◽

pp. 540-550

Author(s):

Cerene Mariam Abraham ◽

Mannathazhathu Sudheep Elayidom ◽

Thankappan Santhanakrishnan

Keyword(s):

Machine Learning ◽

Data Mining ◽

Big Data ◽

Data Analysis ◽

Data Analytics ◽

Research Work ◽

Big Data Analytics ◽

Big Data Analysis ◽

Machine Learning Techniques ◽

Derivative Market

Background: Machine learning is one of the most popular research areas today. It relates closely to the field of data mining, which extracts information and trends from large datasets. Aims: The objective of this paper is to (a) illustrate big data analytics for the Indian derivative market and (b) identify trends in the data. Methods: Based on input from experts in the equity domain, the data are verified statistically using data mining techniques. Specifically, ten years of daily derivative data are used for training and testing purposes. The methods adopted for this research work include model generation using ARIMA and the Hadoop framework, which comprises mapping and reducing steps for big data analysis. Results: The results of this work are the observation of a trend that indicates the rise and fall of price in derivatives, the generation of a time-series similarity graph and the plotting of the frequency of temporal data. Conclusion: Big data analytics is an underexplored topic in the Indian derivative market, and the results from this paper can be used by investors to earn both short-term and long-term benefits.
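
The abstract above mentions mapping and reducing steps and the observation of rises and falls in price, without giving implementation details. The following is a minimal in-memory sketch of the map/reduce pattern in plain Python, assuming a hypothetical list of daily prices; it does not use Hadoop itself and is not the authors' pipeline. The map step emits a (direction, 1) pair for each pair of consecutive days, and the reduce step groups the pairs by key and sums the counts.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical daily closing prices (date, price); not data from the paper.
PRICES = [
    ("2020-01-01", 100.0),
    ("2020-01-02", 102.0),
    ("2020-01-03", 101.0),
    ("2020-01-06", 105.0),
    ("2020-01-07", 105.0),
]

def map_phase(prices):
    """Emit (direction, 1) for each pair of consecutive trading days."""
    for (_, previous), (_, current) in zip(prices, prices[1:]):
        if current > previous:
            yield ("rise", 1)
        elif current < previous:
            yield ("fall", 1)
        else:
            yield ("flat", 1)

def reduce_phase(pairs):
    """Sort and group pairs by key (the shuffle step), then sum per key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: sum(count for _, count in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }

trend_counts = reduce_phase(map_phase(PRICES))
print(trend_counts)
```

In a real Hadoop job the map and reduce functions would run on separate nodes over partitions of the data; here both run in one process purely to show the shape of the computation.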

Data Science in Economics and Business

Advances in Business Information Systems and Analytics - Handbook of Research on Applied Data Science and Artificial Intelligence in Business and Industry ◽

10.4018/978-1-7998-6985-6.ch026 ◽

2021 ◽

pp. 544-568

Author(s):

Mara Madaleno ◽

João Lourenço Marques ◽

Muhammad Tufail

Keyword(s):

Machine Learning ◽

Data Mining ◽

Causal Inference ◽

Bibliometric Analysis ◽

Data Science ◽

State Of The Art ◽

Research Articles

Economics and business offer fertile ground for data science, given that econometricians and data scientists form overlapping sets, although the extent of that intersection remains unknown. In econometrics, a field that traditionally seeks causal inference and interpretable results, data mining is something of a monstrous word. On closer inspection of what data science usually involves, the boundaries between more traditional econometrics, and even statistics, and the fashionable field of machine learning become blurred. In economics and business, we find examples and applications of both simple and advanced data science techniques. This chapter aims to present state-of-the-art data science applications in economics and business. The review and bibliometric analysis are limited to research articles indexed in Elsevier Scopus. The results allowed the authors to conclude that, despite the amount of existing research, much remains to be explored in joining the two fields of knowledge: data science, and economics and business. The analysis also allowed the authors to critically identify possible avenues for further research.
