Neural data science: accelerating the experiment-analysis-theory cycle in large-scale neuroscience

2018 ◽  
Vol 50 ◽  
pp. 232-241 ◽  
Author(s):  
L. Paninski ◽  
J. P. Cunningham

Abstract Modern large-scale multineuronal recording methodologies, including multielectrode arrays, calcium imaging, and optogenetic techniques, produce single-neuron resolution data of a magnitude and precision that were the realm of science fiction twenty years ago. The major bottlenecks in systems and circuit neuroscience no longer lie in simply collecting data from large neural populations, but rather in understanding these data: developing novel scientific questions, with corresponding analysis techniques and experimental designs to fully harness these new capabilities and meaningfully interrogate these questions. Advances in methods for signal processing, network analysis, dimensionality reduction, and optimal control, developed in lockstep with advances in experimental neurotechnology, promise major breakthroughs in multiple fundamental neuroscience problems. These trends are clear in a broad array of subfields of modern neuroscience; this review focuses on recent advances in methods for analyzing neural time-series data with single-neuronal precision.
Figure 1. The central role of data science in modern large-scale neuroscience. Topics reviewed herein are indicated in black.
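The review's emphasis on dimensionality reduction of population recordings can be made concrete with a minimal sketch: simulated spike counts driven by a few latent trajectories are projected onto their leading principal components. The latent structure, neuron counts, and the choice of PCA are illustrative assumptions, not methods taken from the review.

```python
# Minimal sketch: dimensionality reduction of simulated population spike counts.
# The latent dynamics, counts, and use of PCA are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_timebins, n_neurons, n_latents = 1000, 120, 3

# Smooth low-dimensional latent trajectories (random walks).
latents = np.cumsum(rng.normal(size=(n_timebins, n_latents)), axis=0)
loadings = rng.normal(scale=0.3, size=(n_latents, n_neurons))

# Poisson spike counts whose firing rates are driven by the latents.
rates = np.exp(0.1 * latents @ loadings)
counts = rng.poisson(rates)

# Project population activity onto its leading principal components.
pca = PCA(n_components=10).fit(counts)
print("variance explained by first 3 PCs:",
      round(pca.explained_variance_ratio_[:3].sum(), 3))
```

If the latent dimensionality is genuinely low, most variance concentrates in the first few components, which is the kind of structure the reviewed methods aim to recover from real recordings.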


2018 ◽  
Author(s):  
Δημήτριος Τσελέντης

The main objective of this doctoral thesis is to develop an integrated methodological approach for the comparative assessment of driving performance, in terms of road safety, at both the trip and the driver level, using data science techniques. The approach rests on the definition of a performance index based on Data Envelopment Analysis (DEA) and related to macroscopic behavioural driving characteristics such as the number of harsh accelerations/decelerations, mobile-phone use time, and time spent exceeding the speed limit. In addition, machine learning models are developed to identify distinct driving profiles based on the temporal evolution of driving performance. The proposed approach is applied to large-scale real-world driving data collected from smartphones, which are analysed with statistical methods to determine the amount of driving data required for the analysis. The results show that the optimised convex hull-DEA algorithm yields results that are equally accurate and faster than classical DEA approaches. The methodology also makes it possible to identify the least efficient trips in a database, as well as the efficient level of a trip's driving characteristics needed to make it more efficient in terms of safety. Further clustering of drivers by their performance over time identifies three driver groups: the average driver, the unstable driver, and the least risky driver. The results show that prior knowledge of a user's crash history appears to affect only the composition of the second cluster, that of the most unstable drivers, which comprises drivers who are less efficient and less stable with respect to safety. Mobile-phone use also does not appear to be a critical factor in determining a driver's safety performance, as only small differences in this driving characteristic were found between drivers of different performance categories. Moreover, it is shown that a different driving-data sampling is required for each (a) road type, (b) driving characteristic, and (c) level of driving aggressiveness, in order to collect enough data to obtain a clear picture of driving behaviour and to perform a DEA-based analysis. The results could be used to provide drivers with personalised feedback on their overall driving performance and its evolution, so as to improve it and reduce crash risk.
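A compact way to see the DEA index the thesis builds on is an input-oriented CCR efficiency score computed per trip. The sketch below treats each trip as a decision-making unit with behavioural "inputs" (harsh events, phone-use time, speeding time) and a single "output" (distance driven); these variables, the data values, and the CCR formulation are assumptions for illustration, not the thesis's exact model.

```python
# Minimal sketch of an input-oriented CCR DEA efficiency score per trip.
# Inputs (harsh events, phone-use minutes, speeding minutes), the single output
# (km driven), and the numbers below are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

X = np.array([[4.0, 2.0, 5.0],           # rows = trips (DMUs), columns = inputs
              [1.0, 0.5, 1.0],
              [6.0, 3.0, 8.0],
              [2.0, 1.0, 2.0]])
y = np.array([20.0, 15.0, 25.0, 18.0])   # output: km driven per trip

def ccr_efficiency(o):
    """min theta s.t. sum_j lam_j*X[j] <= theta*X[o], sum_j lam_j*y[j] >= y[o], lam >= 0."""
    n, m = X.shape                        # n trips, m inputs
    c = np.zeros(n + 1)
    c[0] = 1.0                            # decision vars: [theta, lam_1..lam_n]
    A_ub, b_ub = [], []
    for i in range(m):                    # input constraints
        A_ub.append(np.r_[-X[o, i], X[:, i]])
        b_ub.append(0.0)
    A_ub.append(np.r_[0.0, -y])           # output constraint (flipped to <=)
    b_ub.append(-y[o])
    bounds = [(0, None)] * (n + 1)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds)
    return res.x[0]

for o in range(len(y)):
    print(f"trip {o}: efficiency = {ccr_efficiency(o):.3f}")
```

A score of 1 marks a trip on the efficient frontier; lower scores indicate how much a trip's risky-behaviour inputs would need to shrink, proportionally, to reach the frontier.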


2021 ◽  
Author(s):  
Mohsen Hadianpour ◽  
Ehsan Rezayat ◽  
Mohammad-Reza Dehaqani

Abstract Due to rapid progress in neurophysiological recording technologies, neuroscientists face growing complexities in dealing with unstructured large-scale neural data. In the neuroscience community, these complexities can create serious bottlenecks in storing, sharing, and processing neural datasets. In this article, we developed a distributed high-performance computing (HPC) framework called the 'Big neuronal data framework' (BNDF) to overcome these complexities. BNDF is based on the open-source big data frameworks Hadoop and Spark, providing a flexible and scalable structure. We examined BNDF on three different large-scale electrophysiological recording datasets from nonhuman primate brains. Our results exhibited faster runtimes and scalability owing to the distributed nature of BNDF. We compared BNDF with a widely used platform, MATLAB, under equivalent computational resources. Compared with other similar methods, BNDF provides more than five times faster performance in spike sorting, a common neuroscience application.
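The kind of Spark-based distribution BNDF relies on can be sketched with a toy channel-parallel spike-detection pass; the channel count, simulated signals, and threshold rule are illustrative assumptions, and this is not BNDF's actual implementation.

```python
# Minimal PySpark sketch: process multichannel recordings in parallel by
# detecting threshold crossings per channel (a precursor to spike sorting).
# Channel count, simulated data, and the threshold rule are assumptions.
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bndf-style-sketch").getOrCreate()
sc = spark.sparkContext

fs = 30_000                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
channels = [(ch, rng.normal(size=fs).tolist()) for ch in range(16)]

def detect_spikes(record):
    ch, samples = record
    x = np.asarray(samples)
    threshold = -4.0 * np.median(np.abs(x)) / 0.6745   # robust noise estimate
    crossings = np.flatnonzero((x[1:] < threshold) & (x[:-1] >= threshold))
    return ch, crossings.tolist()

# Each channel is shipped to a worker and processed independently.
spike_times = sc.parallelize(channels).map(detect_spikes).collectAsMap()
print({ch: len(times) for ch, times in spike_times.items()})
spark.stop()
```

Because channels (or trials) are independent at this stage, the work scales out across a cluster, which is the property behind the runtime gains the abstract reports.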


2020 ◽  
Author(s):  
Jenna Marie Reps ◽  
Ross Williams ◽  
Seng Chan You ◽  
Thomas Falconer ◽  
Evan Minty ◽  
...  

Abstract Objective: To demonstrate how the Observational Health Data Sciences and Informatics (OHDSI) collaborative network and standardization can be used to scale up external validation of patient-level prediction models by enabling validation across a large number of heterogeneous observational healthcare datasets. Materials & Methods: Five previously published prognostic models (ATRIA, CHADS2, CHADS2VASC, Q-Stroke, and Framingham) that predict future risk of stroke in patients with atrial fibrillation were replicated using the OHDSI frameworks. A network study was run that enabled the five models to be externally validated across nine observational healthcare datasets spanning three countries and five independent sites. Results: The five existing models were integrated into the OHDSI framework for patient-level prediction and obtained mean c-statistics ranging from 0.57 to 0.63 across the six databases with sufficient data to predict stroke within 1 year of initial atrial fibrillation diagnosis in females with atrial fibrillation. This was comparable with existing validation studies. The validation network study was run across the nine datasets within 60 days once the models were replicated. An R package for the study was published at https://github.com/OHDSI/StudyProtocolSandbox/tree/master/ExistingStrokeRiskExternalValidation. Discussion: This study demonstrates the ability to scale up external validation of patient-level prediction models using a collaboration of researchers and a data standardization that enables models to be readily shared across data sites. External validation is necessary to understand the transportability or reproducibility of a prediction model, but without collaborative approaches it can take three or more years for a model to be validated by even one independent researcher. Conclusion: In this paper we show that it is possible to both scale up and speed up external validation by showing how validation can be done across multiple databases in less than 2 months. We recommend that researchers developing new prediction models use the OHDSI network to externally validate their models.
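The OHDSI study itself distributes R tooling (linked above); purely to illustrate the discrimination metric it reports, the sketch below computes a c-statistic for a fixed risk score on a simulated external cohort. The cohort, outcome model, and CHADS2-style weights are assumptions, not the study's data or models.

```python
# Minimal sketch: external validation of a fixed risk score via the c-statistic
# (area under the ROC curve). Cohort and outcomes are simulated assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 5_000

# Simulated external cohort of binary risk factors.
chf = rng.binomial(1, 0.20, n)
hypertension = rng.binomial(1, 0.50, n)
age_ge_75 = rng.binomial(1, 0.30, n)
diabetes = rng.binomial(1, 0.25, n)
prior_stroke = rng.binomial(1, 0.10, n)

# CHADS2-style integer score (prior stroke/TIA counts double).
score = chf + hypertension + age_ge_75 + diabetes + 2 * prior_stroke

# Simulated 1-year stroke outcomes whose probability rises with the score.
p = 1.0 / (1.0 + np.exp(-(score - 4)))
outcome = rng.binomial(1, p)

# Discrimination of the fixed score on the new cohort.
print("c-statistic:", round(roc_auc_score(outcome, score), 3))
```

In a real network study the score definitions stay fixed and only the cohort changes per database, so differences in the c-statistic reflect transportability rather than refitting.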


Web Services ◽  
2019 ◽  
pp. 953-978
Author(s):  
Krishnan Umachandran ◽  
Debra Sharon Ferdinand-James

Continued technological advancements of the 21st century afford massive data generation in sectors of our economy, including the domains of agriculture, manufacturing, and education. However, harnessing such large-scale data with modern technologies for effective decision-making remains an evolving science that requires knowledge of Big Data management and analytics. Big Data in agriculture, manufacturing, and education is varied, spanning voluminous text, images, and graphs. Applying Big Data science techniques (e.g., functional algorithms) to extract intelligence from data affords decision makers a quick response to productivity, market resilience, and student enrollment challenges in today's unpredictable markets. This chapter employs data science toward potential solutions to Big Data applications in the sectors of agriculture and manufacturing and, to a lesser extent, education, using modern technological tools such as Hadoop, Hive, Sqoop, and MongoDB.
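As a small, concrete example of the tooling the chapter names, the sketch below runs a MongoDB aggregation over a hypothetical collection of farm sensor readings; the connection string, database and collection names, and document fields are assumptions for illustration.

```python
# Minimal sketch: aggregate hypothetical agricultural sensor readings in MongoDB.
# Connection string, database/collection names, and fields are assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
readings = client["agri_demo"]["sensor_readings"]

# Average soil moisture and total rainfall per field, wettest fields first.
pipeline = [
    {"$group": {
        "_id": "$field_id",
        "avg_moisture": {"$avg": "$soil_moisture"},
        "total_rain_mm": {"$sum": "$rainfall_mm"},
    }},
    {"$sort": {"avg_moisture": -1}},
]
for doc in readings.aggregate(pipeline):
    print(doc["_id"], round(doc["avg_moisture"], 2), doc["total_rain_mm"])
```

The same grouped summary could equally be expressed as a Hive query over data landed in Hadoop; the point is that the aggregation is pushed to the data store rather than pulled into a spreadsheet.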


2020 ◽  
pp. 036168432097717
Author(s):  
Nazlı Bhatia ◽  
Sudeep Bhatia

We combined established psychological measures with techniques in machine learning to measure changes in gender stereotypes over the course of the 20th century as expressed in large-scale historical natural language data. Although our analysis replicated robust gender biases previously documented in the literature, we found that the strength of these biases has diminished over time. This appears to be driven by changes in gender biases for stereotypically feminine traits (rather than stereotypically masculine traits) and changes in gender biases for personality-related traits (rather than physical traits). Our results illustrate the dynamic nature of stereotypes and show how recent advances in data science can be used to provide a long-term historical analysis of core psychological variables. In terms of practice, these findings cautiously suggest that women and men may be less constrained by prescriptions of stereotypically feminine traits. Additional online materials for this article are available on PWQ’s website at 10.1177/0361684320977178.
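A highly simplified proxy for this kind of embedding-based stereotype measure is to compare a trait word's cosine similarity to female-related versus male-related words in a pretrained embedding; the word lists and the GloVe model below are assumptions, and the paper's actual decade-by-decade historical analysis is far more involved.

```python
# Minimal sketch: gender association of trait words in pretrained embeddings.
# Word lists and the choice of GloVe vectors are illustrative assumptions.
import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")     # small pretrained embedding

female_words = ["she", "her", "woman", "female"]
male_words = ["he", "his", "man", "male"]
traits = ["warm", "gentle", "ambitious", "strong"]

def gender_association(word):
    """Mean similarity to female words minus mean similarity to male words."""
    f = np.mean([model.similarity(word, w) for w in female_words])
    m = np.mean([model.similarity(word, w) for w in male_words])
    return f - m

for t in traits:
    print(f"{t:>10}: {gender_association(t):+.3f}")   # >0 leans female, <0 leans male
```

Repeating such a measurement on embeddings trained on text from different decades is the basic idea behind tracking how trait-gender associations change over time.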

