Making Data and Workflows Findable for Machines

2020 ◽  
Vol 2 (1-2) ◽  
pp. 40-46 ◽  
Author(s):  
Tobias Weigel ◽  
Ulrich Schwardmann ◽  
Jens Klump ◽  
Sofiane Bendoukha ◽  
Robert Quick

Research data currently face a huge increase of data objects with an increasing variety of types (data types, formats) and variety of workflows by which objects need to be managed across their lifecycle by data infrastructures. Researchers desire to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services in terms of not only making data and workflows findable, accessible, interoperable and reusable, but also doing so in a way that leverages machine support for better efficiency. One primary need to be addressed is that of findability, and achieving better findability has benefits for other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations and machine learning techniques.

Author(s):  
M.J. Schulze ◽  
F. Thiemann ◽  
M. Sester

In the context of geo-data infrastructures, users may want to combine data from different sources and expect the result to be consistent. If the datasets are maintained separately, different capturing methods and intervals lead to inconsistencies in geometry and semantics, even if the same reality has been modelled. Our project aims to harmonize such datasets automatically and to allow the semantics to be updated efficiently. The application domain in our project is cadastral and topographic datasets. To resolve geometric conflicts between topographic and cadastral data, a local nearest-neighbour method was used to identify perpendicular distances between a node in the topographic dataset and an edge in the cadastral dataset. The perpendicular distances are reduced iteratively in a constrained least squares adjustment (LSA) process that moves the coordinates of node and edge towards each other. The adjustment result has to be checked for conflicts caused by the movement of the coordinates in the LSA.

The correct choice of matching partners has a major influence on the result of the LSA. If wrong matching partners are linked, a wrong adaptation is derived. We therefore present an improved matching method that takes distance, orientation and semantic similarity of the neighbouring objects into account. Using machine learning techniques, we obtain corresponding land-use classes, from which a measure of semantic distance is derived. It is combined with the orientation difference to generate a matching probability for the two matching candidates. Examples show the benefit of the proposed similarity measure.
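The node-to-edge distance mentioned in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names `project_node_onto_edge` and `nearest_edge` are hypothetical, coordinates are assumed to be planar `(x, y)` tuples, and the projection is clamped to the segment so that a node lying beyond an edge's end is measured to the nearest endpoint rather than to the infinite line.

```python
import math


def project_node_onto_edge(node, edge_start, edge_end):
    """Project a node onto an edge (line segment).

    Returns (distance, foot_point, t), where t in [0, 1] locates the
    foot point along the edge. Clamping t is an assumption of this
    sketch; the abstract does not say how end cases are handled.
    """
    (px, py), (ax, ay), (bx, by) = node, edge_start, edge_end
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate edge: both endpoints coincide.
        return math.hypot(px - ax, py - ay), (ax, ay), 0.0
    # Parameter of the orthogonal projection onto the infinite line.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # stay on the segment
    fx, fy = ax + t * dx, ay + t * dy
    return math.hypot(px - fx, py - fy), (fx, fy), t


def nearest_edge(node, edges):
    """Return (distance, edge) for the edge closest to the node."""
    candidates = (
        (project_node_onto_edge(node, start, end)[0], (start, end))
        for start, end in edges
    )
    return min(candidates, key=lambda item: item[0])
```

A brute-force search over all edges is used here for clarity; a spatial index would be the usual choice for datasets of realistic size.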


Author(s):  
Mufti Mahmud ◽  
M. Shamim Kaiser ◽  
T. Martin McGinnity ◽  
Amir Hussain

Recent technological advancements in data acquisition tools have allowed life scientists to acquire multimodal data from different biological application domains. Categorized in three broad types (i.e. images, signals, and sequences), these data are huge in amount and complex in nature. Mining such an enormous amount of data for pattern recognition is a big challenge and requires sophisticated data-intensive machine learning techniques. Artificial neural network-based learning systems are well known for their pattern recognition capabilities, and lately their deep architectures, known as deep learning (DL), have been successfully applied to solve many complex pattern recognition problems. To investigate how DL, especially its different architectures, has contributed to and been utilized in the mining of biological data pertaining to those three types, a meta-analysis has been performed and the resulting resources have been critically analysed. Focusing on the use of DL to analyse patterns in data from diverse biological domains, this work investigates the applications of different DL architectures to these data. This is followed by an exploration of available open access data sources pertaining to the three data types, along with popular open-source DL tools applicable to these data. Comparative investigations of these tools from qualitative, quantitative, and benchmarking perspectives are also provided. Finally, some open research challenges in using DL to mine biological data are outlined and a number of possible future perspectives are put forward.


2017 ◽  
Author(s):  
ZhiMin Xiao ◽  
Steve Higgins

Data analysis usually aims to identify a particular signal, such as an intervention effect. Conventional analyses often assume a specific data generation process, which suggests a theoretical model that best fits the data. Machine learning techniques do not make such an assumption. In fact, they encourage multiple models to compete on the same data. Applying logistic regression and machine learning algorithms to real and simulated datasets with different features of noise and signal, we demonstrate that no single model dominates others under all circumstances. By showing when different models shine or struggle, we argue it is both possible and important to conduct comparative analyses.
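The idea of letting several models compete on the same data can be sketched as follows. This is only an illustration under stated assumptions, not the study's procedure: the simulated two-cluster data, the hand-written logistic regression and k-nearest-neighbour classifiers, the hold-out split, and all function names (`simulate`, `fit_logistic`, `fit_knn`, `compare_models`) are hypothetical choices made for this sketch.

```python
import math
import random


def simulate(n, noise, seed):
    """Two Gaussian clusters centred at (-1, -1) and (1, 1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label == 1 else -1.0
        point = (rng.gauss(centre, noise), rng.gauss(centre, noise))
        rows.append((point, label))
    return rows


def fit_logistic(train, epochs=50, lr=0.1):
    """Logistic regression fitted by stochastic gradient descent."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in train:
            # Clamp the linear score to keep exp() well within range.
            z = max(-30.0, min(30.0, w1 * x1 + w2 * x2 + b))
            error = 1.0 / (1.0 + math.exp(-z)) - y
            w1 -= lr * error * x1
            w2 -= lr * error * x2
            b -= lr * error
    return lambda x: 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0


def fit_knn(train, k=5):
    """k-nearest-neighbour classifier with majority voting."""
    def predict(x):
        nearest = sorted(
            train,
            key=lambda row: math.hypot(row[0][0] - x[0], row[0][1] - x[1]),
        )[:k]
        votes = sum(label for _, label in nearest)
        return 1 if 2 * votes > k else 0
    return predict


def accuracy(predict, test):
    return sum(predict(x) == y for x, y in test) / len(test)


def compare_models(noise, n=400, seed=0):
    """Fit every model on the same training rows, score on the same test rows."""
    rows = simulate(n, noise, seed)
    split = (3 * n) // 4
    train, test = rows[:split], rows[split:]
    models = {"logistic": fit_logistic(train), "knn": fit_knn(train)}
    return {name: accuracy(model, test) for name, model in models.items()}


if __name__ == "__main__":
    for noise in (0.5, 2.0):
        print(noise, compare_models(noise))
```

Varying `noise` changes the signal-to-noise ratio of the simulated data, which is the kind of condition under which the relative ranking of models may shift.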


2021 ◽  
Vol 04 (01) ◽  
Author(s):  
Mahmood Umar ◽  

Nowadays, social media platforms, blogs, and e-commerce sites are commonly used to express opinions on politics, movies, products, and education, and these opinions are used for election forecasting, boosting business, and improving teaching and learning. As a result, data generation has become easier, producing big data that requires appropriate techniques and tools to analyse easily, accurately and in a timely manner. This makes sentiment analysis a very demanding research area. This study investigates on what basis (sentiment classification level) or in what area of application (data source) supervised machine learning approaches, particularly the Support Vector Machine (SVM), Naïve Bayes, and Maximum Entropy algorithms, and the other technique, the lexicon-based approach, give the best results in sentiment analysis. The review of the literature reveals contradictory findings on whether SVM produces the best results when analysing student sentiment at the document level. This study also finds that sentiment analysis differs from system to system based on polarity (the types of classes to predict: positive or negative, subjective or objective), the level of classification (sentence, phrase, or document level) and the language that is processed. This research produces a taxonomy which serves as a guide for the choice of techniques in sentiment analysis. The taxonomy explores the sentiment classification levels and data preprocessing stages. It also shows that sentiment analysis techniques are organised into three (3) groups: machine learning, lexicon-based, and hybrid (combined). The machine learning techniques are sub-grouped into two (2): supervised and unsupervised. The supervised techniques are organised into two (2): classification and regression. Unsupervised machine learning techniques include clustering and association; the clustering techniques include k-means. The decision tree, a classification technique under supervised machine learning, includes the random forest (Akinkunmi, 2019), while the rule-based classifiers comprise the confidence criterion and the support criterion. The commonly used tools are Weka, the Python compiler, and the R programming tool.
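To make the distinction between classification levels concrete, here is a minimal sketch of a lexicon-based scorer applied at the sentence level and at the document level. It is an illustration only and does not come from the study: the word lists, the one-token negation rule, and the function names (`sentence_score`, `document_score`, `classify`) are all assumptions of this sketch.

```python
import re

# Tiny hand-made lexicon, assumed purely for illustration.
POSITIVE = {"good", "great", "excellent", "helpful", "clear"}
NEGATIVE = {"bad", "poor", "boring", "confusing", "unclear"}
NEGATORS = {"not", "never", "no"}


def sentence_score(sentence):
    """Sum word polarities; a negator flips only the next token."""
    score = 0
    negate = False
    for token in sentence.lower().split():
        word = token.strip(".,!?;:")
        if word in NEGATORS:
            negate = True
            continue
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        value = (word in POSITIVE) - (word in NEGATIVE)
        score += -value if negate else value
        negate = False
    return score


def document_score(text):
    """Document-level score: the sum of its sentence-level scores."""
    sentences = re.split(r"[.!?]+", text)
    return sum(sentence_score(s) for s in sentences)


def classify(score):
    """Map a numeric score to a polarity class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `classify(sentence_score(...))` labels a single sentence, while `classify(document_score(...))` labels a whole review; the two can disagree when a document mixes positive and negative sentences.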


2020 ◽  
Vol 12 (22) ◽  
pp. 9320 ◽  
Author(s):  
Ana De Las Heras ◽  
Amalia Luque-Sendra ◽  
Francisco Zamora-Polo

The unprecedented urban growth of recent years requires improved urban planning and management to make urban spaces more inclusive, safe, resilient and sustainable. Additionally, humanity faces the COVID pandemic, which especially complicates the management of Smart Cities. A possible solution to address these two problems (environmental and health) in Smart Cities may be the use of Machine Learning techniques. One of the objectives of our work is to thoroughly analyze the link between the concepts of Smart Cities, Machine Learning techniques and their applicability. In this work, an exhaustive study of the relationship between Smart Cities and the applicability of Machine Learning (ML) techniques is carried out with the aim of optimizing sustainability. For this, the ML models, analyzed from the point of view of the models, techniques and applications, are studied. The areas and dimensions of sustainability addressed are analyzed, and the Sustainable Development Goals (SDGs) are discussed. The main objective is to propose a model (EARLY) that allows us to tackle these problems in the future. An inclusive perspective on applicability, sustainability scopes and dimensions, SDGs, tools, data types and Machine Learning techniques is provided. Finally, a case study applied to an Andalusian city is presented.


Processes ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 407
Author(s):  
Ivan Kristianto Singgih

In a semiconductor fab, wafer lots are processed in complex sequences with re-entrants and parallel machines. It is necessary to ensure smooth wafer lot flows by detecting potential disturbances in a real-time fashion to satisfy the wafer lots’ demands. This study aims to identify production factors that significantly affect the system’s throughput level and find the best prediction model. The contributions of this study are as follows: (1) this is the first study that applies machine learning techniques to identify important real-time factors that influence throughput in a semiconductor fab; (2) this study develops a test bed in the Anylogic software environment, based on the Intel minifab layout; and (3) this study proposes a data collection scheme for the production control mechanism. As a result, four models (adaptive boosting, gradient boosting, random forest, decision tree) with the best accuracies are selected, and a scheme to reduce the input data types considered in the models is also proposed. After the reduction, the accuracy of each selected model was more than 97.82%. It was found that data related to the machines’ total idle times, processing steps, and machine E have notable influences on the throughput prediction.


2006 ◽  
Author(s):  
Christopher Schreiner ◽  
Kari Torkkola ◽  
Mike Gardner ◽  
Keshu Zhang

2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate the analysis and prediction of time-dependent data. We focus on four different stocks selected from the Yahoo Finance historical database. To build models and predict the future stock price, we consider three different machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN) and Support Vector Regression (SVR). By treating close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in the machine learning methods, it can be shown that the prediction accuracy is improved.

