An Overview of Data Veracity Issues in Ship Performance and Navigation Monitoring

Author(s):  
Lokukaluge P. Perera ◽  
Brage Mo

This study presents an overview of data veracity issues in ship performance and navigation monitoring, based on data sets collected from a selected vessel. Data veracity relates to the quality of ship performance and navigation parameters obtained by onboard IoT (internet of things) systems. Industrial IoT can introduce various anomalies into measured ship performance and navigation parameters, which can degrade the outcome of the respective data analysis. Therefore, identifying and isolating such data anomalies plays an important role in ship performance and navigation monitoring. In general, these data anomalies can be divided into sensor and data acquisition (DAQ) faults and abnormal system events. Since a considerable amount of domain knowledge is required to detect and classify such anomalies, this study proposes dedicated data anomaly detection layers, divided into preliminary and advanced levels. The main contribution of this study is the outcome of a preliminary anomaly detection layer applied to ship performance and navigation data sets of a selected vessel, presented together with the respective data handling challenges.
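The abstract does not give implementation details for the preliminary layer; as a minimal sketch of what such a layer might look like, the code below applies simple range and rate-of-change checks to measured parameters. The column names and thresholds are hypothetical stand-ins, not values from the study.

```python
import pandas as pd

# Hypothetical validity ranges and maximum per-sample rates of change;
# real limits would come from sensor specifications and domain knowledge.
RANGE_LIMITS = {"stw_knots": (0.0, 30.0), "shaft_power_kw": (0.0, 15000.0)}
RATE_LIMITS = {"stw_knots": 2.0, "shaft_power_kw": 1000.0}

def preliminary_anomaly_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Flag samples that violate static range or rate-of-change limits."""
    flags = pd.DataFrame(index=df.index)
    for col, (lo, hi) in RANGE_LIMITS.items():
        flags[f"{col}_out_of_range"] = ~df[col].between(lo, hi)
    for col, max_step in RATE_LIMITS.items():
        flags[f"{col}_rate_fault"] = df[col].diff().abs() > max_step
    flags["any_anomaly"] = flags.any(axis=1)
    return flags

# Toy example: one implausible speed sample and one implausible power sample.
data = pd.DataFrame({"stw_knots": [12.1, 12.3, 45.0, 12.2],
                     "shaft_power_kw": [8000, 8100, 8050, 20000]})
print(preliminary_anomaly_flags(data))
```

An advanced layer would then apply model-based or multivariate checks on the samples that pass these preliminary tests.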

Author(s):  
Lokukaluge P. Perera ◽  
Brage Mo

The ocean internet of things (IoT, onboard and onshore) collects big data sets of ship performance and navigation information under various data handling processes. These processes extract vessel performance and navigation information that is used for ship energy efficiency and emission control applications. However, the quality of ship performance and navigation data plays an important role in such applications: sensor faults may introduce erroneous data regions that degrade the outcome. This study proposes visual analytics, in which hidden data patterns, clusters, correlations and other useful information are extracted visually from the respective data set, to identify such erroneous data regions. Domain knowledge (i.e. ship performance and navigation conditions) is also used to interpret these erroneous data regions and to identify the sensors responsible for them. Finally, a ship performance and navigation data set of a selected vessel is analyzed under the proposed visual analytics to identify erroneous data regions for three selected sensor fault situations (i.e. wind, log speed and draft sensors). On the basis of these results, the approach can be categorized as a sensor-specific fault detection methodology.
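The abstract describes the visual analytics only at a high level; one minimal illustration of the idea, using hypothetical column names and thresholds, is to plot related parameters against each other and highlight regions that contradict the expected physical relationship (here, a log speed of zero at high shaft power, hinting at a log speed sensor fault).

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data set; in practice this would be the vessel's IoT feed.
df = pd.DataFrame({
    "stw_knots": [10, 11, 12, 0.0, 13, 12.5],            # log speed (speed through water)
    "shaft_power_kw": [6000, 6900, 7800, 7700, 8600, 8200],
})

# Zero log speed at high shaft power is physically implausible and marks a
# suspect data region (log speed is one of the fault cases in the abstract).
suspect = (df["stw_knots"] < 1.0) & (df["shaft_power_kw"] > 3000)
colors = ["tab:red" if s else "tab:blue" for s in suspect]

plt.scatter(df["stw_knots"], df["shaft_power_kw"], c=colors)
plt.xlabel("Speed through water [knots]")
plt.ylabel("Shaft power [kW]")
plt.title("Speed-power scatter with suspect (red) data region")
plt.show()
```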


2010 ◽  
pp. 1797-1803
Author(s):  
Lisa Friedland

In traditional data analysis, data points lie in a Cartesian space, and an analyst asks certain questions: (1) What distribution can I fit to the data? (2) Which points are outliers? (3) Are there distinct clusters or substructure? Today, data mining treats richer and richer types of data. Social networks encode information about people and their communities; relational data sets incorporate multiple types of entities and links; and temporal information describes the dynamics of these systems. With such semantically complex data sets, a greater variety of patterns can be described and more varied views of the data can be constructed. This article describes a specific social structure that may be present in such data sources and presents a framework for detecting it. The goal is to identify tribes, or small groups of individuals that intentionally coordinate their behavior—individuals with enough in common that they are unlikely to be acting independently. While this task can only be conceived of in a domain of interacting entities, the solution techniques return to the traditional data analysis questions. In order to find hidden structure (3), we use an anomaly detection approach: develop a model to describe the data (1), then identify outliers (2).
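As a hedged sketch of that anomaly-detection loop (fit a model, flag outliers, read the outliers as hidden structure), the example below uses an invented feature, a pairwise similarity score between individuals, rather than anything from the article itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical feature: a similarity score for each pair of individuals,
# e.g. how much of their observed behavior they share.
scores = rng.normal(loc=0.2, scale=0.05, size=10_000)
scores[:5] = [0.55, 0.60, 0.58, 0.62, 0.57]   # a few coordinated pairs

# (1) Fit a distribution to describe the bulk of the data.
mu, sigma = stats.norm.fit(scores)

# (2) Identify outliers: pairs whose similarity is too high to be explained
#     by independent behavior under the fitted model.
p_values = stats.norm.sf(scores, loc=mu, scale=sigma)
outliers = np.flatnonzero(p_values < 1e-4)

# (3) Such outlier pairs are candidates for hidden structure (tribes);
#     grouping linked pairs would yield the candidate groups themselves.
print(f"{len(outliers)} candidate coordinated pairs")
```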


Electronics ◽  
2020 ◽  
Vol 9 (7) ◽  
pp. 1164
Author(s):  
João Henriques ◽  
Filipe Caldeira ◽  
Tiago Cruz ◽  
Paulo Simões

Computing and networking systems traditionally record their activity in log files, which have been used for multiple purposes, such as troubleshooting, accounting, post-incident analysis of security breaches, capacity planning and anomaly detection. In earlier systems those log files were processed manually by system administrators, or with the support of basic applications for filtering, compiling and pre-processing the logs for specific purposes. However, as the volume of these log files continues to grow (more logs per system, more systems per domain), it is becoming increasingly difficult to process those logs using traditional tools, especially for less straightforward purposes such as anomaly detection. On the other hand, as systems continue to become more complex, the potential of using large datasets built of logs from heterogeneous sources for detecting anomalies without prior domain knowledge becomes higher. Anomaly detection tools for such scenarios face two challenges. First, devising appropriate data analysis solutions for effectively detecting anomalies from large data sources, possibly without prior domain knowledge. Second, adopting data processing platforms able to cope with the large datasets and complex data analysis algorithms required for such purposes. In this paper we address those challenges by proposing an integrated scalable framework that aims at efficiently detecting anomalous events on large amounts of unlabeled data logs. Detection is supported by clustering and classification methods that take advantage of parallel computing environments. We validate our approach using the well-known NASA Hypertext Transfer Protocol (HTTP) logs datasets. Fourteen features were extracted in order to train a k-means model for separating anomalous and normal events in highly coherent clusters. A second model, making use of the XGBoost system implementing a gradient tree boosting algorithm, uses the previous binary clustered data for producing a set of simple interpretable rules. These rules represent the rationale for generalizing its application over a massive number of unseen events in a distributed computing environment. The classified anomaly events produced by our framework can be used, for instance, as candidates for further forensic and compliance auditing analysis in security management.
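As a sketch of the two-stage pipeline described in the abstract (k-means to separate normal and anomalous events, then a gradient-boosted tree model trained on the cluster labels), the code below uses scikit-learn and XGBoost on synthetic feature vectors; the fourteen actual features extracted from the NASA HTTP logs are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(42)

# Synthetic stand-in for the feature matrix (the paper extracts 14 features
# per log event; here the feature values are random).
X = np.vstack([rng.normal(0.0, 1.0, size=(5000, 14)),   # "normal" events
               rng.normal(4.0, 1.5, size=(200, 14))])   # "anomalous" events

# Stage 1: unsupervised separation into two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
# Treat the smaller cluster as the anomalous one.
anomalous_cluster = np.argmin(np.bincount(labels))
y = (labels == anomalous_cluster).astype(int)

# Stage 2: supervised gradient tree boosting trained on the cluster labels;
# the fitted trees can then be distilled into simple rules and applied to
# unseen events at scale.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```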


Author(s):  
Lokukaluge P. Perera ◽  
Brage Mo ◽  
Matthias P. Nowak

Ship performance and navigation data are collected by vessels that are equipped with various supervisory control and data acquisition (SCADA) systems. Such information is collected as large-scale data sets; therefore, various analysis tools and techniques are required to extract useful information from them. The extracted information on ship performance and navigation conditions can be used to implement energy efficiency and emission control applications (i.e. weather routing type applications) on these vessels. Hence, this study proposes data visualizing methods to extract ship performance and navigation information from the respective data sets in relation to weather conditions. The relative wind (i.e. apparent wind) profile (i.e. wind speed and direction) collected by onboard sensors is considered together with absolute weather conditions, which are extracted from external data sources using the position and time information of a selected vessel (i.e. from the recorded ship routes). Hence, the relative wind profile of the vessel is compared with actual weather conditions to visualize the relationships among ship performance and navigation parameters, as the main contribution. It is believed that such relationships can be used to develop appropriate mathematical models to predict ship performance and navigation conditions under various weather conditions.
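Comparing the onboard (apparent) wind with absolute weather data requires converting between the two frames; a standard vector-based conversion, sketched below with hypothetical inputs and the simplifying assumption that course over ground equals heading, adds the ship's velocity to the wind measured relative to the moving vessel.

```python
import numpy as np

def apparent_to_true_wind(aws, awa_deg, sog, heading_deg):
    """Convert apparent wind (speed, angle off the bow) to true wind.

    aws         apparent wind speed [m/s]
    awa_deg     apparent wind angle relative to the bow [deg]
    sog         ship speed over ground [m/s]
    heading_deg ship heading [deg, compass]; course assumed equal to heading
    Returns (true wind speed [m/s], true wind direction [deg, compass, "from"]).
    """
    # Apparent wind as a vector in the earth frame ("towards" direction).
    wind_towards = np.radians(heading_deg + awa_deg + 180.0)
    ship_course = np.radians(heading_deg)

    ax, ay = aws * np.sin(wind_towards), aws * np.cos(wind_towards)
    sx, sy = sog * np.sin(ship_course), sog * np.cos(ship_course)

    # True wind = apparent wind (air velocity relative to ship) + ship velocity.
    tx, ty = ax + sx, ay + sy
    tws = np.hypot(tx, ty)
    twd = (np.degrees(np.arctan2(tx, ty)) + 180.0) % 360.0  # "from" direction
    return tws, twd

# Example: 12 m/s apparent head wind while sailing north at 5 m/s
# gives a 7 m/s true wind from the north.
print(apparent_to_true_wind(aws=12.0, awa_deg=0.0, sog=5.0, heading_deg=0.0))
```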


2014 ◽  
Vol 16 (3) ◽  
pp. 150-169 ◽  
Author(s):  
Kamran Munir ◽  
Saad Liaquat Kiani ◽  
Khawar Hasham ◽  
Richard McClatchey ◽  
Andrew Branson ◽  
...  

Purpose – The purpose of this paper is to provide an integrated analysis base to facilitate computational neuroscience experiments, following a user-led approach to provide access to the integrated neuroscience data and to enable the analyses demanded by the biomedical research community. Design/methodology/approach – The design and development of the N4U analysis base and related information services addresses the existing research and practical challenges by offering an integrated medical data analysis environment with the necessary building blocks for neuroscientists to optimally exploit neuroscience workflows, large image data sets and algorithms to conduct analyses. Findings – The provision of an integrated e-science environment of computational neuroimaging can enhance the prospects, speed and utility of the data analysis process for neurodegenerative diseases. Originality/value – The N4U analysis base enables conducting biomedical data analyses by indexing and interlinking the neuroimaging and clinical study data sets stored on the grid infrastructure, algorithms and scientific workflow definitions along with their associated provenance information.


Author(s):  
Arpit Kumar Sharma ◽  
Arvind Dhaka ◽  
Amita Nandal ◽  
Kumar Swastik ◽  
Sunita Kumari

The meaning of the term “big data” can be inferred from the name itself (i.e., the collection of large structured or unstructured data sets). In addition to their huge quantity, these data sets are so complex that they cannot be analyzed using conventional data handling software and hardware tools. If processed judiciously, big data can prove to be a huge advantage for the industries using it. Due to its usefulness, studies are being conducted to create methods to handle big data. Knowledge extraction from big data is very important; other than this, there is no purpose in accumulating such volumes of data. Cloud computing is a powerful tool which provides a platform for the storage and computation of massive amounts of data.


2020 ◽  
Author(s):  
Martin Wegmann ◽  
Jakob Schwalb-Willmann ◽  
Stefan Dech

This is a book about how ecologists can integrate remote sensing and GIS in their research. It will allow readers to get started with the application of remote sensing and to understand its potential and limitations. Using practical examples, the book covers all necessary steps from planning field campaigns to deriving ecologically relevant information through remote sensing and modelling of species distributions. An Introduction to Spatial Data Analysis introduces spatial data handling using the open source software Quantum GIS (QGIS). In addition, readers will be guided through their first steps in the R programming language. The authors explain the fundamentals of spatial data handling and analysis, empowering the reader to turn data acquired in the field into actual spatial data. Readers will learn to process and analyse spatial data of different types and interpret the data and results. After finishing this book, readers will be able to address questions such as “What is the distance to the border of the protected area?”, “Which points are located close to a road?” and “Which fraction of land cover types exist in my study area?” using different software and techniques. This book is for novice spatial data users and does not assume any prior knowledge of spatial data itself or practical experience working with such data sets. Readers will likely include student and professional ecologists, geographers and any environmental scientists or practitioners who need to collect, visualize and analyse spatial data. The software used comprises the widely applied open source scientific programs QGIS and R. All scripts and data sets used in the book will be provided online at book.ecosens.org. This book covers specific methods, including: what to consider before collecting in situ data; how to work with spatial data collected in situ; the difference between raster and vector data; how to acquire further vector and raster data; how to create relevant environmental information; how to combine and analyse in situ and remote sensing data; how to create useful maps for field work and presentations; how to use QGIS and R for spatial analysis; and how to develop analysis scripts.
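The book itself works in QGIS and R; purely as an illustration of the kind of questions it lists, the sketch below uses Python with geopandas on invented geometries (a protected area, a road and a few observation points) to compute a distance to the protected-area border and to flag points near the road.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon, LineString

# Hypothetical data in a projected CRS (metres): a protected area, a road,
# and a few observation points.
crs = "EPSG:32633"
protected_area = Polygon([(0, 0), (0, 1000), (1000, 1000), (1000, 0)])
road = LineString([(-200, 500), (1500, 500)])
points = gpd.GeoSeries([Point(200, 300), Point(1200, 450), Point(950, 980)], crs=crs)

# "What is the distance to the border of the protected area?"
dist_to_border = points.distance(protected_area.boundary)

# "Which points are located close to a road?" (here: within 100 m)
near_road = points.distance(road) < 100

print(dist_to_border.round(1).tolist())
print(near_road.tolist())
```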


Author(s):  
Lokukaluge P. Perera ◽  
Brage Mo

Modern ships are supported by the internet of things (IoT) to collect ship performance and navigation information, which should be utilized towards the digitalization of the shipping industry. However, such information collection systems are always associated with large-scale data sets, so-called Big Data, where various industrial challenges are encountered during the respective data handling processes. As its main contribution, this study proposes a data handling framework with data-driven models (i.e. digital models) to cope with these shipping industry challenges in situations where conventional mathematical models may fail. The proposed data-driven models are developed in a high-dimensional space, where the respective ship performance and navigation parameters of a selected vessel are separated into several data clusters. Hence, this study identifies the distribution of these data clusters and the structure of each cluster in relation to ship performance and navigation conditions. In this way, the method assigns an appropriate structure to the data set of ship performance and navigation parameters. Domain knowledge (i.e. vessel operational and navigation conditions) is also included to derive a meaningful data structure.
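The abstract does not name the clustering technique; as a hedged sketch of separating high-dimensional ship performance parameters into data clusters and describing each cluster's distribution, a Gaussian mixture model (one of several possible choices) could be used as below, with invented parameter values standing in for two operating regimes.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Hypothetical ship performance/navigation parameters (e.g. speed, power,
# draft, trim); two operating regimes are simulated as two clusters.
X = np.vstack([
    rng.normal([12.0, 8000.0, 9.0, 0.5], [0.5, 400.0, 0.2, 0.1], size=(500, 4)),
    rng.normal([16.0, 14000.0, 10.5, -0.2], [0.6, 600.0, 0.2, 0.1], size=(500, 4)),
])

# Separate the data set into clusters and describe each cluster's distribution.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
labels = gmm.predict(X)

for k in range(gmm.n_components):
    print(f"cluster {k}: weight={gmm.weights_[k]:.2f}, mean={gmm.means_[k].round(1)}")
```

Domain knowledge would then be used, as the abstract notes, to label these clusters with meaningful operational and navigation conditions.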


2021 ◽  
Vol 10 (1) ◽  
pp. 9-24
Author(s):  
Iim Ibrohim ◽  
Agus Salim Mansyur ◽  
Muhibbin Syah ◽  
Uus Ruswandi

This research aims to identify the supporting and inhibiting factors of the educational innovation implemented by SD Muhammadiyah 7, Bandung City. By identifying these factors, it is hoped that the implementation of the innovation program can be more successful. The research method used was descriptive qualitative, with data collected through observation, interviews, and documentation study. Data analysis was performed by reducing the data, displaying it, drawing conclusions, and verifying them. The results show that the supporting and inhibiting factors for educational innovation at SD Muhammadiyah 7, Bandung City in developing the quality of school management are both internal and external.


Author(s):  
Haroon M. Barakat ◽  
Osama Mohareb Khaled ◽  
N. Khalil Rakha

Abstract This paper presents a comparison of the most capable families of distributions for modelling asymmetry. The Kum-normal family, the stable-symmetric normal family and two of the full families were chosen, with the quality of the fit, the flexibility and the number of asymmetry parameters used as comparison criteria. The objective of this study was to generate data with increasing levels of asymmetry and to choose the best fit. The distributions were also compared in modelling two data sets on drinking water pollution in the El-Sharkia governorate in Egypt. Much of this paper is concerned with distribution theory, exploring the properties of some recent families of distributions and, where appropriate, extolling their virtues. A relatively large part of the paper is devoted to practical application.
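The Kum-normal and full families are not available in common statistical libraries; as an illustration of the comparison procedure only, the sketch below generates increasingly asymmetric data and compares a symmetric and an asymmetric candidate by AIC, using scipy's skew-normal as a stand-in for the asymmetric families considered in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood

# Generate data with increasing levels of asymmetry (skewness parameter a).
for a in (0.0, 2.0, 8.0):
    data = stats.skewnorm.rvs(a, loc=0.0, scale=1.0, size=2000, random_state=rng)

    # Candidate 1: symmetric normal fit.
    mu, sigma = stats.norm.fit(data)
    aic_norm = aic(stats.norm.logpdf(data, mu, sigma).sum(), n_params=2)

    # Candidate 2: skew-normal fit (stand-in for an asymmetric family).
    a_hat, loc, scale = stats.skewnorm.fit(data)
    aic_skew = aic(stats.skewnorm.logpdf(data, a_hat, loc, scale).sum(), n_params=3)

    best = "skew-normal" if aic_skew < aic_norm else "normal"
    print(f"skewness level {a}: best fit by AIC -> {best}")
```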

