DN3: An open-source Python library for large-scale raw neurophysiology data assimilation for more flexible and standardized deep learning

2020 ◽  
Author(s):  
Demetres Kostas ◽  
Frank Rudzicz

Abstract
We propose an open-source Python library, called DN3, designed to accelerate deep learning (DL) analysis with encephalographic data. This library focuses on making experimentation rapid and reproducible and facilitates the integration of both public and private datasets. Furthermore, DN3 is designed in the interest of validating DL processes that include, but are not limited to, classification and regression across many datasets to prove capacity for generalization. We explore the effectiveness of this library by presenting a general scheme for person disambiguation called T-Vectors, inspired by speech recognition. These are single vectors created from electroencephalographic (EEG) data sequences of arbitrary, though typically short, length that uniquely identify users relative to others. T-Vectors were trained by classifying nearly 1000 people using sequences as short as 1 second and generalize effectively to users never seen during training. Generalized performance is demonstrated on two commonly used and publicly accessible motor imagery task datasets, which are notorious for intra- and inter-subject signal variability. On these datasets, subjects can be identified with accuracies as high as 97.7% by simply adopting the label of the nearest neighbouring T-Vector, with no dependence on the task performed and little dependence on recording session, even when sessions are separated by days. Visualization of the T-Vectors from both datasets shows no conflation of subjects between datasets and indicates a T-Vector manifold in which subjects cluster well. We conclude, first, that this is a desirable paradigm shift in EEG-based biometrics and, second, that this manifold deserves further investigation. Our proposed library provides a variety of essential tools that facilitated the development of T-Vectors. The T-Vectors codebase serves as a template for future projects using DN3, and we encourage leveraging our provided model for future work.

Author summary
We present a new Python library to train deep learning (DL) models with brain data. This library is tailored, but not limited, to developing neural networks for brain-computer interface (BCI) applications. There is abundant interest in leveraging DL in the wider neuroscience community, but we have found current solutions limiting. Furthermore, both BCI and DL benefit from benchmarking against multiple datasets and sharing parameters. Our library tries to be accessible to DL novices, yet not limiting to experts, while making experiment configurations more easily shareable and flexible for benchmarking. We demonstrate many of the features of our library by developing a deep neural network capable of disambiguating people from arbitrary lengths of electroencephalography data. We identify a variety of future avenues of study for the representations produced by our network, particularly in biometric applications and in addressing the variation in BCI classifier performance. We share our model, library, and its associated guides and documentation with the community at large.
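As a hedged illustration of the nearest-neighbour identification scheme described in the abstract (this is not DN3's API; the embeddings, labels, and dimensions below are synthetic placeholders), subject identification from T-Vector-style embeddings might be sketched as follows:

```python
# A minimal sketch of nearest-neighbour subject identification from embedding
# vectors, assuming T-Vector-style embeddings have already been extracted by a
# trained network. The arrays are synthetic placeholders, not DN3 calls.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
train_vectors = rng.normal(size=(500, 256))      # one embedding per enrolled EEG sequence
train_subjects = rng.integers(0, 50, size=500)   # subject label for each embedding
test_vectors = rng.normal(size=(100, 256))       # embeddings from held-out recordings

# Each test sequence adopts the label of its nearest neighbouring training vector.
knn = KNeighborsClassifier(n_neighbors=1, metric="cosine")
knn.fit(train_vectors, train_subjects)
predicted_subjects = knn.predict(test_vectors)
```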

Author(s):  
T. Shiva Rama Krishna

Malicious software, or malware, continues to pose a major security concern in this digital age as computer users, corporations, and governments witness an exponential growth in malware attacks. Current malware detection solutions adopt static and dynamic analysis of malware signatures and behaviour patterns, which are time consuming and ineffective in identifying unknown malware. Recent malware uses polymorphic, metamorphic, and other evasive techniques to change its behaviour quickly and to generate large numbers of variants. Since new malware samples are predominantly variants of existing ones, machine learning algorithms (MLAs) have recently been employed to conduct effective malware analysis. This requires extensive feature engineering, feature learning, and feature representation. By using advanced MLAs such as deep learning, the feature engineering phase can be avoided entirely. Though some recent research studies exist in this direction, the performance of the algorithms is biased by the training data. There is a need to mitigate this bias and evaluate these methods independently in order to arrive at new, enhanced methods for effective zero-day malware detection. To fill this gap in the literature, this work evaluates classical MLAs and deep learning architectures for malware detection, classification, and categorization with both public and private datasets. The train and test splits of the public and private datasets used in the experimental analysis are disjoint from each other and were collected over different timescales. In addition, we propose a novel image processing technique with optimal parameters for MLAs and deep learning architectures. A comprehensive experimental evaluation of these methods indicates that deep learning architectures outperform classical MLAs. Overall, this work proposes effective visual detection of malware using a scalable, hybrid deep learning framework for real-time deployments. The combination of visualization and deep learning architectures in a hybrid static, dynamic, and image processing-based approach within a big data environment is a new, enhanced method for effective zero-day malware detection.
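The "visual detection" idea generally rests on rendering a binary's bytes as a grayscale image before classification. The following is a minimal sketch of that representation step, under assumed file paths and image width (the paper's exact preprocessing parameters are not given here):

```python
# A hedged sketch of converting a binary into a 2-D grayscale image that a CNN
# could then classify. The path and width are illustrative assumptions.
import numpy as np
from pathlib import Path

def binary_to_image(path: str, width: int = 256) -> np.ndarray:
    """Read raw bytes and fold them into a width-column grayscale image."""
    data = np.frombuffer(Path(path).read_bytes(), dtype=np.uint8)
    rows = len(data) // width
    return data[: rows * width].reshape(rows, width)

# image = binary_to_image("sample.exe")  # the resulting array is fed to a CNN classifier
```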


Author(s):  
J. Mifdal ◽  
N. Longépé ◽  
M. Rußwurm

Abstract. Marine litter is a growing problem that has been attracting attention and raising concerns over recent years. Significant quantities of plastic can be found in the oceans due to the unfiltered discharge of waste into rivers, poor waste management, or lost fishing nets. The floating elements drift on the surface of water bodies and can be aggregated by processes such as river plumes, windrows, oceanic fronts, or currents. In this paper, we focus on detecting large patches of floating objects that can contain plastic as well as other materials with optical Sentinel-2 data. In contrast to previous work that focuses on pixel-wise spectral responses of selected bands, we employ a deep learning predictor that learns the spatial characteristics of floating objects. Along with this work, we provide a hand-labeled Sentinel-2 dataset of floating objects on the sea surface and other water bodies such as lakes, together with pre-trained deep learning models. Our experiments demonstrate that harnessing the spatial patterns learned with a CNN is advantageous over pixel-wise classifications that use hand-crafted features. We further provide an analysis of the categories of floating objects that we captured while labeling the dataset and analyze the feature importance for the CNN predictions. Finally, we outline the limitations of the trained CNN on several systematic failure cases that we would like to address in future work by increasing the diversity of the dataset and tackling the domain shift between regions and satellite acquisitions. The dataset introduced in this work is the first to provide public large-scale data for floating litter detection, and we hope it will give more insights into developing techniques for floating litter detection and classification. Source code and data are available at https://github.com/ESA-PhiLab/floatingobjects.
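To make the contrast with pixel-wise spectral classification concrete, the sketch below shows the general shape of a patch-based CNN prediction: a multi-band Sentinel-2 patch goes in, a per-pixel floating-object mask comes out. This is a toy stand-in, not the repository's actual model, band selection, or data loader:

```python
# A hedged sketch of patch-based CNN segmentation of Sentinel-2 data; the
# architecture, band count, and patch size are illustrative assumptions.
import torch
import torch.nn as nn

class TinySegmenter(nn.Module):
    def __init__(self, in_bands: int = 12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_bands, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),  # single output channel: object vs. water
        )

    def forward(self, x):
        return self.net(x)

model = TinySegmenter()
patch = torch.rand(1, 12, 128, 128)        # batch of one 12-band, 128x128 pixel patch
mask = torch.sigmoid(model(patch)) > 0.5   # per-pixel floating-object prediction
```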


Author(s):  
Yoann Gloaguen ◽  
Jennifer Kirwan ◽  
Dieter Beule

Abstract
Available automated methods for peak detection in untargeted metabolomics suffer from poor precision. We present NeatMS, which uses machine learning to replace peak curation by human experts. We show how to integrate our open-source module into different LC-MS analysis workflows and quantify its performance. NeatMS is designed to be suitable for large-scale studies and improves the robustness of the final peak list.
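To illustrate the underlying idea of machine-learning-based peak curation (this is not the NeatMS API; the features, labels, and model choice below are assumptions for the sketch), each detected peak can be summarised by simple shape descriptors and passed to a trained classifier that keeps or discards it:

```python
# A hedged sketch of ML-based peak curation: a classifier trained on
# expert-labelled peaks filters a new peak list. Not NeatMS code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-peak features: height, width (s), signal-to-noise, asymmetry.
peak_features = np.array([
    [1.2e5, 4.1, 18.0, 0.9],
    [3.0e3, 0.6,  1.4, 3.2],
])
labels = np.array([1, 0])  # 1 = genuine peak, 0 = noise (expert-curated training labels)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(peak_features, labels)
keep = clf.predict(peak_features) == 1  # in practice, applied to unseen peak lists
```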


2012 ◽  
Vol 6 ◽  
pp. BBI.S9728
Author(s):  
Oksana Kohutyuk ◽  
Fadi Towfic ◽  
M. Heather West Greenlee ◽  
Vasant Honavar

Gene and protein networks offer a powerful approach for integration of the disparate yet complementary types of data that result from high-throughput analyses. Although many tools and databases are currently available for accessing such data, they remain underutilized by bench scientists, as they generally lack features for effective analysis and integration of both public and private datasets and do not offer an intuitive interface for scientists with limited computational expertise. We describe BioNetwork Bench, an open-source, user-friendly suite of database and software tools for constructing, querying, and analyzing gene and protein network models. It enables biologists to analyze public as well as private gene expression data; interactively query gene expression datasets; integrate data from multiple networks; and store and selectively share the data and results. Finally, we describe an application of BioNetwork Bench to the assembly and iterative expansion of a gene network that controls the differentiation of retinal progenitor cells into rod photoreceptors. The tool is available from http://bionetworkbench.sourceforge.net/

Background
The emergence of high-throughput technologies has allowed many biological investigators to collect a great deal of information about the behavior of genes and gene products over time or during a particular disease state. Gene and protein networks offer a powerful approach for integration of the disparate yet complementary types of data that result from such high-throughput analyses. There are a growing number of public databases, as well as tools for visualization and analysis of networks. However, such databases and tools have yet to be widely utilized by bench scientists, as they generally lack features for effective analysis and integration of both public and private datasets and do not offer an intuitive interface for use by biological scientists with limited computational expertise.

Results
We describe BioNetwork Bench, an open-source, user-friendly suite of database and software tools for constructing, querying, and analyzing gene and protein network models. BioNetwork Bench currently supports a broad class of gene and protein network models (e.g., weighted and un-weighted undirected graphs, multi-graphs). It enables biologists to analyze public as well as private gene expression, macromolecular interaction, and annotation data; interactively query gene expression datasets; integrate data from multiple networks; query multiple networks for interactions of interest; and store and selectively share the data as well as the results of analyses. BioNetwork Bench is implemented as a plug-in for, and hence is fully interoperable with, Cytoscape, a popular open-source software suite for visualizing macromolecular interaction networks. Finally, we describe an application of BioNetwork Bench to the problem of assembly and iterative expansion of a gene network that controls the differentiation of retinal progenitor cells into rod photoreceptors.

Conclusions
BioNetwork Bench provides a suite of open-source software for the construction, querying, and selective sharing of gene and protein networks. Although initially aimed at a community of biologists interested in retinal development, the tool can easily be adapted to work with other biological systems simply by populating the associated database with the relevant datasets.
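The kind of network query described above (finding the interaction partners of a gene of interest in a weighted, undirected graph) can be illustrated generically; the sketch below uses networkx rather than BioNetwork Bench's own interface, and the gene names and weights are made-up examples:

```python
# A hedged illustration of querying a weighted gene network for the direct
# interaction partners of one gene. Not BioNetwork Bench code; toy data only.
import networkx as nx

network = nx.Graph()
network.add_weighted_edges_from([
    ("NRL", "CRX", 0.9),     # hypothetical regulatory interactions relevant to
    ("CRX", "RHO", 0.8),     # rod photoreceptor differentiation
    ("NRL", "NR2E3", 0.7),
])

# Query: direct interaction partners of NRL, with their edge weights.
partners = {gene: network["NRL"][gene]["weight"] for gene in network.neighbors("NRL")}
print(partners)  # {'CRX': 0.9, 'NR2E3': 0.7}
```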


2018 ◽  
Vol 4 ◽  
Author(s):  
Jérémy Bonvoisin ◽  
Tom Buchert ◽  
Maurice Preidel ◽  
Rainer G. Stark

Open Source Hardware (OSH) is an increasingly viable approach to intellectual property management that extends the principles of Open Source Software (OSS) to the domain of physical products. These principles support the development of products in transparent processes allowing the participation of any interested person. While increasing numbers of products have been released as OSH, little is known about the prevalence of participative development practices in this emerging field. It remains unclear to what extent the transparent and participatory processes known from software have reached hardware product development. To fill this gap, this paper applies repository mining techniques to investigate the transparency and workload distribution of 105 OSH product development projects. The results highlight a certain heterogeneity of practices spanning a continuum between public and private development settings. They reveal different organizational patterns with different levels of centralization and distribution. Nonetheless, they clearly indicate the expansion of the open source development model from software into the realm of physical products and provide the first large-scale empirical evidence of this recent evolution. In doing so, this article gives substance to an emerging phenomenon and helps give it a place in the scientific debate. It delivers categories to delineate practices, techniques to investigate them in further detail, as well as a large dataset of exemplary OSH projects. The discussion of these first results signposts avenues for a stream of research aimed at understanding the stakeholder interactions at work in new product innovation practices, in order to enable institutions and industry to provide appropriate responses.
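As a hedged sketch of the kind of repository-mining measurement discussed above (the paper's exact metrics may differ; the repository path, use of commit counts per author, and the Gini summary are assumptions for illustration), workload distribution in a project can be approximated from its commit history:

```python
# A minimal sketch: count commits per contributor from a local git repository
# and summarise workload concentration with a Gini coefficient.
import subprocess
from collections import Counter
import numpy as np

def commits_per_author(repo_path: str) -> Counter:
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=%ae"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return Counter(log)

def gini(counts: np.ndarray) -> float:
    x = np.sort(counts.astype(float))
    n = len(x)
    return (2 * np.sum(np.arange(1, n + 1) * x) / (n * np.sum(x))) - (n + 1) / n

# counts = np.array(list(commits_per_author("path/to/osh-project").values()))
# print(gini(counts))  # 0 = evenly distributed work, 1 = fully centralized
```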


Author(s):  
Georgi Derluguian

The author develops ideas about the origin of social inequality during the evolution of human societies and reflects on the possibilities of overcoming it. What makes human beings different from other primates is a high level of egalitarianism and altruism, which contributed to the more successful adaptability of human collectives at early stages of the development of society. The transition to agriculture, coupled with substantially increasing population density, was marked by the emergence and institutionalisation of social inequality based on the inequality of tangible assets and symbolic wealth. Then, new institutions of warfare came into existence, aimed at conquering and enslaving neighbours engaged in productive labour. While exercising control over nature, people also established and strengthened their power over other people. Chiefdom as a new type of polity came into being. Elementary forms of power (political, economic and ideological) served as a basis for the formation of early states. The societies in those states were characterised by social inequality and cruelties, including slavery, mass violence and numerous victims. Nowadays, the old elementary forms of power that are inherent in personalistic chiefdom still function alongside modern institutions of public and private bureaucracy. This constitutes the key contradiction of our time: the juxtaposition of individual despotic power and public infrastructural power. However, society is evolving towards an ever more efficient combination of social initiatives with the sustainability and viability of large-scale organisations.


2020 ◽  
Author(s):  
Anusha Ampavathi ◽  
Vijaya Saradhi T

Big data approaches are broadly useful to the healthcare and biomedical sectors for disease prediction. For minor symptoms, it is often difficult to see a doctor at the hospital at any given time, so big data can provide essential information about diseases on the basis of a patient's symptoms. For many medical organizations, disease prediction is important for making the best feasible health care decisions. Conversely, the conventional medical care model relies on structured input, which demands more accurate and consistent prediction. This paper develops multi-disease prediction using an improved deep learning approach. Here, different datasets pertaining to "Diabetes, Hepatitis, lung cancer, liver tumor, heart disease, Parkinson's disease, and Alzheimer's disease" are gathered from the benchmark UCI repository for conducting the experiment. The proposed model involves three phases: (a) data normalization, (b) weighted normalized feature extraction, and (c) prediction. Initially, each dataset is normalized so that all attributes lie within a common range. Then, weighted feature extraction is performed, in which each attribute value is multiplied by a weight function to emphasize large-scale deviations. Here, the weight function is optimized using a combination of two meta-heuristic algorithms, termed the Jaya Algorithm-based Multi-Verse Optimization (JA-MVO) algorithm. The optimally extracted features are fed to hybrid deep learning algorithms, namely a Deep Belief Network (DBN) and a Recurrent Neural Network (RNN). As a modification to the hybrid deep learning architecture, the weights of both the DBN and the RNN are optimized using the same hybrid optimization algorithm. Finally, comparative evaluation of the proposed prediction model against existing models confirms its effectiveness across various performance measures.
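A minimal sketch of the first two phases described above is given below; the toy attribute matrix and the fixed weight vector are placeholders (in the paper the weights would be tuned by the JA-MVO optimizer), so this only illustrates the data flow, not the actual method:

```python
# Hedged sketch of phases (a) and (b): min-max data normalization followed by
# per-attribute weighting. Weights are placeholders for JA-MVO-optimized values.
import numpy as np

X = np.array([[63., 1., 145.],
              [37., 1., 130.],
              [41., 0., 120.]])  # toy attribute matrix (rows = patients)

# Phase (a): min-max normalization so every attribute lies in [0, 1].
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Phase (b): weighted feature extraction -- each attribute is scaled by a weight.
weights = np.array([0.8, 0.3, 0.6])  # placeholder for optimized weights
X_weighted = X_norm * weights        # features then passed to the DBN/RNN predictors
```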


SLEEP ◽  
2020 ◽  
Author(s):  
Luca Menghini ◽  
Nicola Cellini ◽  
Aimee Goldstone ◽  
Fiona C Baker ◽  
Massimiliano de Zambotti

Abstract Sleep-tracking devices, particularly within the consumer sleep technology (CST) space, are increasingly used in both research and clinical settings, providing new opportunities for large-scale data collection in highly ecological conditions. Due to the fast pace of the CST industry combined with the lack of a standardized framework to evaluate the performance of sleep trackers, their accuracy and reliability in measuring sleep remain largely unknown. Here, we provide a step-by-step analytical framework for evaluating the performance of sleep trackers (including standard actigraphy), as compared with gold-standard polysomnography (PSG) or other reference methods. The analytical guidelines are based on recent recommendations for evaluating and using CST from our group and others (de Zambotti and colleagues; Depner and colleagues), and include raw data organization as well as critical analytical procedures, including discrepancy analysis, Bland–Altman plots, and epoch-by-epoch analysis. Analytical steps are accompanied by open-source R functions (available at https://sri-human-sleep.github.io/sleep-trackers-performance/AnalyticalPipeline_v1.0.0.html). In addition, an empirical sample dataset is used to describe and discuss the main outcomes of the proposed pipeline. The guidelines and the accompanying functions are aimed at standardizing the testing of CST performance, not only to increase the replicability of validation studies but also to provide ready-to-use tools to researchers and clinicians. All in all, this work can help to increase the efficiency, interpretation, and quality of validation studies, and to improve the informed adoption of CST in research and clinical settings.
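The guidelines above are accompanied by R functions; purely as a hedged illustration of one listed step, the sketch below computes a Bland–Altman style discrepancy summary in Python (the total-sleep-time values are made-up minutes, not data from the paper):

```python
# A minimal sketch of a discrepancy / Bland-Altman summary for a sleep tracker
# against PSG: mean bias and 95% limits of agreement. Illustrative values only.
import numpy as np

device = np.array([402., 388., 415., 371., 430.])  # CST-estimated total sleep time (minutes)
psg = np.array([395., 392., 405., 380., 426.])     # reference PSG values (minutes)

diff = device - psg
bias = diff.mean()                                  # mean discrepancy (device minus PSG)
loa = (bias - 1.96 * diff.std(ddof=1),
       bias + 1.96 * diff.std(ddof=1))              # 95% limits of agreement
print(f"bias = {bias:.1f} min, limits of agreement = {loa[0]:.1f} to {loa[1]:.1f} min")
```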

