Development of a learning pilot for the remote teaching of Smart Maintenance using open source tools

2021
Author(s):  
Maira Callupe ◽  
Luca Fumagalli ◽  
Domenico Daniele Nucera

Technology has created a vast array of educational tools readily available to educators, but it has also created a shift in the skills and competences demanded of new graduates. As data science and machine learning become commonplace across all industries, computer programming is emerging as one of the fundamental skills engineers will require to navigate the current and future workplace. It is, thus, the responsibility of educational institutions to rise to this challenge and provide students with training that facilitates the development of these skills. The purpose of this paper is to explore the potential of open source tools to introduce students to the more practical side of Smart Maintenance. In a learning pilot based mainly on computational notebooks, students without a programming background are walked through the relevant techniques and algorithms in an experiential format. The pilot highlights the superiority of Colab notebooks for the remote teaching of subjects that deal with data science and programming. The resulting insights will be used to develop subsequent iterations during the current year.
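As a flavor of the experiential format described above, the following is a minimal sketch of the kind of notebook cell such a pilot might walk students through: an anomaly-detection exercise on synthetic condition-monitoring data using scikit-learn. The data and model choice are illustrative assumptions, not material from the pilot itself.

```python
# Illustrative Colab-style cell: flagging anomalous machine readings.
# Synthetic data and model choice are assumptions, not the pilot's material.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Simulate vibration readings from a healthy machine plus a few faulty spikes.
healthy = rng.normal(loc=0.5, scale=0.05, size=(200, 1))
faulty = rng.normal(loc=1.2, scale=0.10, size=(10, 1))
readings = np.vstack([healthy, faulty])

# An Isolation Forest labels likely anomalies as -1.
model = IsolationForest(contamination=0.05, random_state=42)
labels = model.fit_predict(readings)
print(f"Flagged {(labels == -1).sum()} of {len(readings)} readings as anomalous")
```

In a Colab notebook, students can run and modify such a cell directly in the browser, which is the remote-teaching advantage the pilot reports.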

Author(s):  
Brian Granger ◽  
Fernando Pérez

Project Jupyter is an open-source project for interactive computing widely used in data science, machine learning, and scientific computing. We argue that even though Jupyter helps users perform complex, technical work, Jupyter itself solves problems that are fundamentally human in nature. Namely, Jupyter helps humans to think and tell stories with code and data. We illustrate this by describing three dimensions of Jupyter: interactive computing, computational narratives, and the idea that Jupyter is more than software. We then describe the impact of these dimensions on a community of practice in Earth and climate science.


Author(s):  
Greg Lawrance ◽  
Raphael Parra Hernandez ◽  
Khalegh Mamakani ◽  
Suraiya Khan ◽  
Brent Hills ◽  
...  

Introduction: Ligo is an open source application that provides a framework for managing and executing administrative data linking projects. Ligo provides an easy-to-use web interface that lets analysts select among data linking methods, including deterministic, probabilistic and machine learning approaches, and use these in a documented, repeatable, tested, step-by-step process. Objectives and Approach: The linking application has two primary functions: identifying common entities within a dataset (de-duplication) and identifying common entities between datasets (linking). The application is being built from the ground up in a partnership between the Province of British Columbia's Data Innovation (DI) Program and Population Data BC, with input from data scientists. The simple web interface allows analysts to streamline the processing of multiple datasets in a straightforward and reproducible manner. Results: Built in Python and implemented as a desktop-capable and cloud-deployable containerized application, Ligo includes many of the latest data-linking comparison algorithms, with a plugin architecture that supports the simple addition of new formulae. Currently, deterministic approaches to linking have been implemented and probabilistic methods are in alpha testing. A fully functional alpha, including deterministic and probabilistic methods, is expected to be ready in September, with a machine learning extension expected soon after. Conclusion/Implications: Ligo has been designed with enterprise users in mind. The application is intended to make the processes of data de-duplication and linking simple, fast and reproducible. By making the application open source, we encourage feedback and collaboration from across the population research and data science community.
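The abstract does not show Ligo's internals, but a minimal sketch of the deterministic linkage step it describes might look as follows; the pandas-based matching, column names, and normalization rules are illustrative assumptions, not Ligo code.

```python
# Sketch of deterministic record linkage of the kind Ligo automates.
# Column names and normalization rules are illustrative assumptions.
import pandas as pd

left = pd.DataFrame({
    "first_name": ["Ana", "Ben"],
    "last_name": ["Silva", "Ng"],
    "dob": ["1990-01-02", "1985-07-30"],
})
right = pd.DataFrame({
    "first_name": ["ANA ", "Cara"],
    "last_name": ["silva", "Lee"],
    "dob": ["1990-01-02", "1979-03-14"],
})

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    # Deterministic linkage hinges on consistent preprocessing:
    # strip whitespace and lowercase the name fields before comparing.
    out = df.copy()
    for col in ("first_name", "last_name"):
        out[col] = out[col].str.strip().str.lower()
    return out

# An exact match on all three fields identifies common entities across datasets.
links = normalize(left).merge(normalize(right), on=["first_name", "last_name", "dob"])
print(links)
```

A probabilistic method would replace the exact merge with per-field similarity scores and a match threshold, which is the step the abstract reports as being in alpha testing.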


2019, Vol. 1, pp. 1-2
Author(s):  
Jan Wilkening

Abstract. Data is regarded as the oil of the 21st century, and the concept of data science has received increasing attention in recent years. These trends are mainly caused by the rise of big data: data that is big in terms of volume, variety and velocity. Consequently, data scientists are required to make sense of these large datasets, and companies have problems finding talented people to solve their data science problems. This is not surprising, as employers often expect skillsets that can hardly be found in one person: not only does a data scientist need a solid background in machine learning, statistics and various programming languages, but often also in IT systems architecture, databases and complex mathematics. Above all, she should have strong non-technical domain expertise in her field (see Figure 1).

As it is widely accepted that 80% of data has a spatial component, developments in data science could provide exciting new opportunities for GIS and cartography: cartographers are experts in spatial data visualization, and often also very skilled in statistics, data pre-processing and analysis in general. Cartographers' skill levels often depend on the degree to which cartography programs at universities focus on the "front end" (visualization) of spatial data and leave the "back end" (modelling, gathering, processing, analysis) to GIScientists. In many university curricula, these front-end and back-end distinctions between cartographers and GIScientists are not clearly defined, and the boundaries are somewhat blurred.

In order to become good data scientists, cartographers and GIScientists need to acquire certain additional skills that are often beyond their university curricula. These skills include programming, machine learning and data mining. These are important technologies for extracting knowledge from big spatial datasets, and thereby the logical advancement of "traditional" geoprocessing, which focuses on "traditional" (small, structured, static) datasets such as shapefiles or feature classes.

To bridge the gap between spatial sciences (such as GIS and cartography) and data science, we need an integrated framework of "spatial data science" (Figure 2).

Spatial sciences focus on causality, using theory-based approaches to explain why things are happening in space. In contrast, the scope of data science is to find similar patterns in big datasets with techniques of machine learning and data mining, often without considering spatial concepts (such as topology, spatial indexing, spatial autocorrelation, the modifiable areal unit problem, map projections and coordinate systems, uncertainty in measurement, etc.).

Spatial data science could become the core competency of GIScientists and cartographers who are willing to integrate methods from the data science knowledge stack. Moreover, data scientists could enhance their work by integrating important spatial concepts and tools from GIS and cartography into data science workflows. A non-exhaustive knowledge stack for spatial data scientists, including typical tasks and tools, is given in Table 1.

There are many interesting ongoing projects at the interface of spatial and data science. Examples from the ArcGIS platform include:

- Integration of Python GIS APIs with machine learning libraries, such as scikit-learn or TensorFlow, in Jupyter Notebooks (sketched below)
- Combination of R (advanced statistics and visualization) and GIS (basic geoprocessing, mapping) in ModelBuilder and other automation frameworks
- Enterprise GIS solutions for distributed geoprocessing operations on big, real-time vector and raster datasets
- Dashboards for visualizing real-time sensor data and integrating it with other data sources
- Applications for interactive data exploration
- GIS tools for machine learning tasks for prediction, clustering and classification of spatial data
- GIS integration for Hadoop

While the discussion about proprietary (ArcGIS) vs. open-source (QGIS) software is beyond the scope of this article, it has to be stated that (a) many ArcGIS projects are actually open-source and (b) using a complete GIS platform instead of several open-source pieces has several advantages, particularly in efficiency, maintenance and support (see Wilkening et al. (2019) for a more detailed consideration). At any rate, cartography and GIS tools are the essential technology blocks for solving the (80% spatial) data science problems of the future.
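To illustrate the first bullet in the list above, here is a minimal notebook-style sketch of clustering spatial point data with scikit-learn; the coordinates are synthetic, and a real workflow would read them from a GIS layer via a Python GIS API rather than generate them.

```python
# Illustrative notebook cell: density-based clustering of point events.
# Coordinates are synthetic; a real workflow would load them from a GIS layer.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Two synthetic hotspots of point events plus uniform background noise (lon, lat).
hotspot_a = rng.normal([13.40, 52.52], 0.01, size=(50, 2))
hotspot_b = rng.normal([13.45, 52.48], 0.01, size=(50, 2))
noise = rng.uniform([13.30, 52.40], [13.55, 52.60], size=(20, 2))
points = np.vstack([hotspot_a, hotspot_b, noise])

# DBSCAN groups dense clusters and labels sparse points as noise (-1).
# eps is in degrees here for simplicity; a projected coordinate system
# would be preferable in practice.
labels = DBSCAN(eps=0.01, min_samples=5).fit_predict(points)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"Found {n_clusters} clusters and {(labels == -1).sum()} noise points")
```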


2021
Author(s):  
Luc Thomès ◽  
Rebekka Burkholz ◽  
Daniel Bojar

Abstract: Glycans are biological sequences that occur in every domain of life; they comprise monosaccharides chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences in standard workflows. Here, we present glycowork, an open-source Python package designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.
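Glycowork's own API is documented at the repository linked above; as a stand-in illustration of the motif-annotation idea described here, the following sketch counts motif occurrences with naive substring matching. The motif names, glycan strings, and matching logic are illustrative assumptions and not glycowork code, which parses the branched sequence structure properly.

```python
# Stand-in illustration of motif annotation; NOT glycowork code. The naive
# substring check below ignores branching, which real tools handle properly.
glycans = [
    "Gal(b1-4)GlcNAc(b1-2)Man(a1-3)Man(b1-4)GlcNAc",
    "Neu5Ac(a2-3)Gal(b1-4)GlcNAc",
    "Fuc(a1-2)Gal(b1-4)GlcNAc",
]

# Hypothetical motifs written in the same bracket notation as the glycans.
motifs = {"LacNAc": "Gal(b1-4)GlcNAc", "Sialylation": "Neu5Ac"}

# Count how many glycans contain each motif; such counts feed the heatmaps
# and statistical enrichment analyses described in the abstract.
for name, pattern in motifs.items():
    hits = sum(pattern in g for g in glycans)
    print(f"{name}: {hits}/{len(glycans)} glycans")
```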


2017
Author(s):  
Dasapta Erwin Irawan

This abstract was presented at the PAAI conference 2016, 16-17 Nov 2016. It consists of two parts: Part 1, Introduction to Open Science (zip file), and Part 2, Multivariate statistics in hydrogeology.

ABSTRACT Geology is one of the oldest sciences in the world. Originating from natural science, it has grown from the observation of sea shells to the sophisticated interpretation of the earth's interior. Recent developments require the geological approach to be more quantitative, given the need for prediction and simulation. Geology has shifted from "the present is the key to the past" towards "the present is the key to the past as the base of prediction of the future". Hydrogeology is one of the promising branches of geology that relies heavily on quantitative analysis, and multivariate statistics is one of the most frequently used resources in this field. We performed a literature search and web scraping to analyze the current situation and future trends of multivariate statistics applications in geological synthesis. We used several sets of keywords, but this set gave the most satisfying results: "(all in title) multivariate statistics (and) groundwater", on the Google Scholar, Crossref, and ScienceOpen databases. The final result was 164 papers. We used VosViewer and Zotero for text mining operations. Based on the analysis, we can draw some results. Cluster analysis and principal component analysis are still the most frequently used methods in hydrogeology. Both are mostly used to extract hydrochemical and isotope data to analyze the hydrogeological nature of groundwater flow. More machine learning methods have been introduced to hydrogeological science in the last five years. Random forest and decision tree techniques are used extensively to learn from the physical and chemical properties of groundwater. Open source tools have also displaced major proprietary statistical and programming packages such as SAS and Matlab; Python and R are the two best-known open source languages in this field. We also note an increase in papers discussing the intersection of hydrogeology and the public health sector. Such methods are therefore also being used to analyze open demographic data like the DHS (demographic health survey) and FLS (Family Life Survey). A strong community of programmers drives the exponential development of both languages, via platforms like Github. This has become the future of hydrogeology.
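As a concrete illustration of the workflow the survey found most common, the following is a minimal sketch of principal component analysis followed by cluster analysis on hydrochemical data; the ion concentrations are synthetic stand-ins for real groundwater measurements.

```python
# Sketch of the PCA + cluster analysis workflow common in hydrogeology.
# The ion concentrations are synthetic stand-ins for real measurements.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Synthetic samples; columns could be Ca, Mg, Na, Cl, HCO3 concentrations (mg/L).
shallow = rng.normal([80, 20, 15, 25, 200], 5, size=(30, 5))
deep = rng.normal([40, 60, 90, 110, 350], 5, size=(30, 5))
samples = np.vstack([shallow, deep])

# Standardize, project onto two principal components, then cluster the scores,
# a typical route to separating hydrochemical facies or groundwater groups.
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(samples))
groups = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(scores)
print("Samples per group:", np.bincount(groups))
```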


2021
Author(s):  
Ivan Triana ◽  
LUIS PINO ◽  
Dennise Rubio

The bio- and infotech revolutions, including advances in data management, are global trends with a relevant impact on healthcare. Concepts such as Big Data, Data Science and Machine Learning are now topics of interest within the medical literature. All of them are encompassed in what has recently been named digital epidemiology. The purpose of this article is to propose our definition of digital epidemiology with the inclusion of a further aspect, innovation, which we term the Digital Epidemiology of Innovation (DEI), and to show the importance of this new branch of epidemiology for the management and control of diseases. To this end, we describe the characteristics of the topic and its current uses within medical practice, discuss applications for the future, and conclude with the applicability of DEI.


Author(s):  
Daniel Hannon ◽  
Esa Rantanen ◽  
Ben Sawyer ◽  
Ashley Hughes ◽  
Katherine Darveau ◽  
...  

The continued advances in artificial intelligence and automation through machine learning applications, under the heading of data science, give reason for pause within the educator community as we consider how to position future human factors engineers to contribute meaningfully to these projects. Do the lessons we learned, and now teach, regarding automation based on previous generations of technology still apply? What level of data science and machine learning expertise is needed for a human factors engineer to have a relevant role in the design of future automation? How do we integrate these topics into a field that often has not emphasized quantitative skills? This panel discussion brings together human factors engineers and educators at different stages of their careers to consider how curricula are being adapted to include data science and machine learning, and what the future of human factors education may look like in the coming years.


2021
Author(s):  
John Mitchell ◽  
David Guile

The nature of work is changing rapidly, driven by the digital technologies that underpin Industry 5.0. It has been argued worldwide that engineering education must adapt to these changes, which have the potential to rewrite the core curriculum across engineering as a broader range of skills compete with traditional engineering knowledge. Although it is clear that skills such as data science, machine learning and AI will become fundamental skills of the future, it is less clear how these should be integrated into existing engineering education curricula to ensure the relevance of graduates. This chapter looks at the nature of future fusion skills and the range of strategies that might be adopted to integrate these into the existing engineering education curriculum.


2021, Vol. 8
Author(s):  
João V. Cordeiro

Digital technologies and data science promise to revolutionize healthcare by transforming the way health and disease are analyzed and managed in the future. Digital health applications in healthcare include telemedicine, electronic health records, wearable, implantable, injectable and ingestible digital medical devices, and mobile health apps, as well as the application of artificial intelligence and machine learning algorithms to medical and public health prognosis and decision-making. As is often the case with technological advancement, progress in digital health raises compelling ethical, legal, and social implications (ELSI). This article aims to succinctly map the relevant ELSI of the digital health field. The issues of patient autonomy; assessment, value attribution, and validation of health innovation; equity and trustworthiness in healthcare; professional roles and skills; and data protection and security are highlighted against the backdrop of the risks of dehumanization of care, the limitations of machine learning-based decision-making and, ultimately, the future contours of human interaction in medicine and public health. The running theme of this article is the underlying tension between the promises of digital health and its many challenges, heightened by the contrasting pace of scientific progress and the timed responses provided by law and ethics. Digital applications can prove to be valuable allies for human skills in medicine and public health. Similarly, ethics and the law can be interpreted and perceived not merely as obstacles, but also as promoters of fairness, inclusiveness, creativity and innovation in health.


2020
Author(s):  
Advait Deshpande

This working paper provides an overview of the potential influence of machine learning and data science on economics as a field. The findings presented are drawn from highly cited research identified through Google Scholar searches. For each of the articles reviewed, the paper covers what is likely to change, and what is likely to remain unchanged, in economics due to the emergence and increasing influence of machine learning and data science methods.

