MAGICPL: A Generic Process Description Language for Distributed Pseudonymization Scenarios

Author(s):  
Galina Tremper ◽  
Torben Brenner ◽  
Florian Stampe ◽  
Andreas Borg ◽  
Martin Bialke ◽  
...  

Abstract Objectives: Pseudonymization is an important aspect of projects dealing with sensitive patient data. Most projects build their own specialized, hard-coded solutions. However, these overlap in many aspects of their functionality. As any re-implementation binds resources, we propose a solution that facilitates and encourages the reuse of existing components. Methods: We analyzed already-established data protection concepts to gain insight into their common features and the ways in which their components were linked together. We found that we could represent these pseudonymization processes with a simple descriptive language, which we have called MAGICPL, plus a relatively small set of components. We designed MAGICPL as an XML-based language to make it human-readable and accessible to nonprogrammers. Additionally, a prototype implementation of the components was written in Java. MAGICPL makes it possible to reference the components by their class names, making it easy to extend or exchange the component set. Furthermore, a simple HTTP application programming interface (API) runs the tasks and allows other systems to communicate with the pseudonymization process. Results: MAGICPL has been used in at least three projects, including the re-implementation of the pseudonymization process of the German Cancer Consortium, clinical data flows in a large-scale translational research network (National Network Genomic Medicine), and our own institute's pseudonymization service. Conclusions: Putting our solution into productive use at our own institute and at our partner sites has reduced the time and effort required to build pseudonymization pipelines in medical research.
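To illustrate how an external system might hand a record to a running pseudonymization task over such an HTTP API, here is a minimal Python sketch. The base URL, endpoint path, and payload fields are purely illustrative assumptions, not MAGICPL's documented interface.

```python
import requests

# Hypothetical example only: the base URL, endpoint path and payload fields
# are illustrative assumptions, not MAGICPL's documented API.
MAGICPL_BASE_URL = "http://localhost:8080"

def submit_pseudonymization_task(task_name: str, record: dict) -> dict:
    """Send a record to a named pseudonymization task and return the result."""
    response = requests.post(
        f"{MAGICPL_BASE_URL}/tasks/{task_name}",
        json=record,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    result = submit_pseudonymization_task(
        "createPseudonym",
        {"firstName": "Jane", "lastName": "Doe", "birthDate": "1970-01-01"},
    )
    print(result)
```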

Author(s):  
D. C. Price ◽  
C. Flynn ◽  
A. Deller

Abstract Galactic electron density distribution models are crucial tools for estimating the impact of the ionised interstellar medium on the impulsive signals from radio pulsars and fast radio bursts (FRBs). The two prevailing Galactic electron density models (GEDMs) are YMW16 (Yao et al. 2017, ApJ, 835, 29) and NE2001 (Cordes & Lazio 2002, arXiv e-prints, astro-ph/0207156). Here, we introduce a software package, PyGEDM, which provides a unified application programming interface (API) for these models and the YT20 (Yamasaki & Totani 2020, ApJ, 888, 105) model of the Galactic halo. We use PyGEDM to compute all-sky maps of Galactic dispersion measure (DM) for YMW16 and NE2001 and compare the large-scale differences between the two. In general, YMW16 predicts higher DM values towards the Galactic anticentre and at low Galactic latitudes, whereas NE2001 predicts higher DMs in most other directions. We identify lines of sight for which the models are most discrepant, using pulsars with independent distance measurements. YMW16 performs better on average than NE2001, but both models show significant outliers. We suggest that future campaigns to determine pulsar distances should focus on targets where the models show large discrepancies, so that future models can use those measurements to better estimate distances along those lines of sight. We also suggest that the Galactic halo should be considered as a component in future GEDMs, to avoid overestimating the Galactic DM contribution for extragalactic sources such as FRBs.
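A minimal sketch of comparing the two models along a single line of sight with PyGEDM is shown below; the function name and argument order follow the package's documentation as we understand it and should be checked against the installed release.

```python
import pygedm  # pip install pygedm

# Sketch of a model comparison along one line of sight. The function name and
# argument conventions are assumptions based on PyGEDM's documentation and
# should be verified against the installed version.
gl, gb = 45.0, 5.0   # Galactic longitude/latitude in degrees
dist_pc = 5000.0     # assumed pulsar distance in parsecs

for method in ("ymw16", "ne2001"):
    dm, tau_sc = pygedm.dist_to_dm(gl, gb, dist_pc, method=method)
    print(f"{method}: DM = {dm}, scattering timescale = {tau_sc}")
```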


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1374 ◽  
Author(s):  
Minh-Son Phan ◽  
Anatole Chessel

The advent of large-scale fluorescence and electron microscopy techniques, along with maturing image analysis, is giving the life sciences a deluge of geometrical objects in 2D/3D(+t) to deal with. These objects take the form of large-scale, localised, precise, single-cell, quantitative data such as cells’ positions, shapes, trajectories or lineages, axon traces in whole-brain atlases, or varied intracellular protein localisations, often in multiple experimental conditions. The data mining of those geometrical objects requires a variety of mathematical and computational tools of diverse accessibility and complexity. Here we present a new Python library for quantitative 3D geometry called GeNePy3D, which helps handle and mine information and knowledge from geometric data, providing a unified application programming interface (API) to methods from several domains including computational geometry, scale-space methods and spatial statistics. By framing this library as generically as possible, and by linking it to as many state-of-the-art reference algorithms and projects as needed, we help render those often specialist methods accessible to a larger community. We exemplify the usefulness of the GeNePy3D toolbox by re-analysing a recently published whole-brain zebrafish neuronal atlas, with other applications and examples available online. Along with open-source, documented and exemplified code, we release reusable containers to allow for convenient and wide usability and increased reproducibility.
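To make the kind of geometric data mining described above concrete, the sketch below computes nearest-neighbour distances for a synthetic 3D point cloud of cell positions using SciPy directly. It illustrates the family of spatial-statistics operations a unified API such as GeNePy3D wraps; the code deliberately does not use GeNePy3D's own function names.

```python
import numpy as np
from scipy.spatial import cKDTree

# Illustrative only: a plain-SciPy version of one spatial-statistics task
# (nearest-neighbour distances for 3D cell positions) of the kind a unified
# geometry API such as GeNePy3D is designed to expose.
rng = np.random.default_rng(0)
cell_positions = rng.uniform(0, 100, size=(1000, 3))  # stand-in for real data

tree = cKDTree(cell_positions)
# k=2 because the closest neighbour of each point is the point itself.
distances, _ = tree.query(cell_positions, k=2)
nearest_neighbour = distances[:, 1]

print(f"mean NN distance: {nearest_neighbour.mean():.2f} "
      f"(std {nearest_neighbour.std():.2f})")
```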


2018 ◽  
Author(s):  
Alex Nunes ◽  
Damian Lidgard ◽  
Franziska Broell

In 2015, as part of the Ocean Tracking Network’s bioprobe initiative, 20 grey seals (Halichoerus grypus) were tagged on Sable Island for 6 months with a high-resolution (> 30 Hz) inertial tag, a depth-temperature satellite tag (0.1 Hz), and an acoustic transceiver. As with similar large-scale studies in movement ecology, the unprecedented size of the data collected by these instruments (gigabytes for a single seal) raises new challenges in efficient database management. Here we propose using Postgres and netCDF for storing the biotelemetry data and associated metadata. While it was possible to write the lower-resolution (acoustic and satellite) data to a Postgres database, netCDF was chosen as the format for the high-resolution movement (acceleration and inertial) records. Even without access to cluster computing, data could be written efficiently in terms of CPU time: 920 million records were written in < 3 hours. ERDDAP was used to access and link the different data streams through a user-friendly Application Programming Interface (API). This approach compresses the data to a fifth of its original size, and storing the data in a tree-like structure enables easy access and visualization for the end user.
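A minimal sketch of the netCDF side of such a pipeline, written with the netCDF4 Python library, is shown below. The variable names, units, and compression settings are illustrative assumptions rather than the schema used in the study.

```python
import numpy as np
from netCDF4 import Dataset

# Illustrative only: writing high-rate accelerometer records for one seal to a
# compressed netCDF file. Variable names and attributes are assumptions, not
# the schema used in the study.
n_samples = 30 * 3600          # one hour of 30 Hz samples
time = np.arange(n_samples) / 30.0
accel = np.random.normal(size=(n_samples, 3)).astype("f4")  # stand-in data

with Dataset("seal_001_accel.nc", "w") as nc:
    nc.createDimension("time", n_samples)
    nc.createDimension("axis", 3)

    t = nc.createVariable("time", "f8", ("time",), zlib=True)
    t.units = "seconds since deployment"
    t[:] = time

    a = nc.createVariable("acceleration", "f4", ("time", "axis"),
                          zlib=True, complevel=4)
    a.units = "g"
    a[:, :] = accel
```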


2018 ◽  
Author(s):  
Soohyun Lee ◽  
Jeremy Johnson ◽  
Carl Vitzthum ◽  
Koray Kırlı ◽  
Burak H. Alver ◽  
...  

Abstract Summary: We introduce Tibanna, an open-source software tool for automated execution of bioinformatics pipelines on Amazon Web Services (AWS). Tibanna accepts reproducible and portable pipeline standards including Common Workflow Language (CWL), Workflow Description Language (WDL) and Docker. It adopts a strategy of isolation and optimization of individual executions, combined with a serverless scheduling approach. Pipelines are executed and monitored using local commands or the Python Application Programming Interface (API), and cloud configuration is handled automatically. Tibanna is well suited for projects with a range of computational requirements, including those with large and widely fluctuating loads. Notably, it has been used to process terabytes of data for the 4D Nucleome (4DN) Network. Availability: Source code is available on GitHub at https://github.com/4dn-dcic/tibanna.
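A hedged sketch of launching and monitoring a run through the Python API follows; the module path, class, and method names are assumptions based on Tibanna's documentation and should be verified against the installed version.

```python
# Sketch only: module path, class and method names are assumptions based on
# Tibanna's documentation and should be checked against the installed version.
from tibanna.core import API

api = API()

# A run is described by a JSON document pairing workflow arguments
# (e.g. a CWL/WDL file and input files on S3) with an EC2 configuration.
api.run_workflow(input_json="my_run.json")

# Monitor previously launched runs from the same machine.
api.stat()
```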


2021 ◽  
pp. 146144482110494
Author(s):  
Thomas Smits ◽  
Ruben Ros

How do digital media impact the meaning of iconic photographs? Recent studies have suggested that online circulation, especially in a memeified form, might lead to the erosion, fracturing, or collapsing of the original contextual meaning of iconic pictures. Introducing a distant reading methodology to the study of iconic photographs, we apply the Google Cloud Vision Application Programming Interface (GCV API) to retrieve 940,000 online circulations of 26 iconic images between 1995 and 2020. We use document embeddings, a Natural Language Processing technique, to map the contexts in which iconic photographs circulate online. The article demonstrates that constantly changing configurations of contextual imagetexts, self-referential image-texts, and non-referential image/texts shape the online life of iconic photographs: ebbs and flows of slowly disappearing, suddenly resurfacing, and newly found meanings. While iconic photographs might not need captions to speak, this article argues that a large-scale analysis of texts can help us better grasp what they say.
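The retrieval step can be sketched with the Cloud Vision client library's web-detection feature, which returns web pages containing (near-)matching copies of an image. This is an illustrative sketch of that one step, not the authors' full pipeline.

```python
from google.cloud import vision

# Illustrative sketch of image-based retrieval with the Cloud Vision API:
# web detection lists pages on which (near-)matching copies of an image appear.
client = vision.ImageAnnotatorClient()  # requires Google Cloud credentials

with open("iconic_photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.web_detection(image=image)

for page in response.web_detection.pages_with_matching_images:
    # Page URLs and titles supply the textual context used for embeddings.
    print(page.url, page.page_title)
```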


Author(s):  
Donald Sturgeon

Abstract This article presents technical approaches and innovations in digital library design developed during the design and implementation of the Chinese Text Project, a widely used, large-scale full-text digital library of premodern Chinese writing. By leveraging a combination of domain-optimized Optical Character Recognition, a purpose-designed crowdsourcing system, and an Application Programming Interface (API), this project simultaneously provides a sustainable transcription system, search interface, and reading environment, as well as an extensible platform for transcribing and working with premodern Chinese textual materials. By means of the API, intentionally loosely integrated text mining tools are used to extend the platform, while also remaining reusable independently with materials from other sources and in other languages.
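As a sketch of how a loosely coupled tool might pull a text from the platform over the API, the snippet below requests one chapter via HTTP; the endpoint name, parameter, and URN are assumptions based on the public API documentation and should be verified at ctext.org.

```python
import requests

# Illustrative sketch: the endpoint name, parameter and URN format are
# assumptions based on the public ctext.org API documentation.
API_URL = "https://api.ctext.org/gettext"

response = requests.get(API_URL, params={"urn": "ctp:analects/xue-er"}, timeout=30)
response.raise_for_status()
data = response.json()

# The response is expected to contain the transcribed text of the chapter,
# which downstream text-mining tools can consume independently of the website.
print(data)
```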


2021 ◽  
pp. 1-19
Author(s):  
Kiet Van Nguyen ◽  
Nhat Duy Nguyen ◽  
Phong Nguyen-Thuan Do ◽  
Anh Gia-Tuan Nguyen ◽  
Ngan Luu-Thuy Nguyen

Machine Reading Comprehension has attracted significant interest in research on natural language understanding, and large-scale datasets and neural network-based methods have been developed for this task. However, most developments of resources and methods in machine reading comprehension have focused on two resource-rich languages, English and Chinese. This article proposes a system called ViReader for open-domain machine reading comprehension in Vietnamese, using Wikipedia as the textual knowledge source, where the answer to any particular question is a textual span derived directly from texts on Vietnamese Wikipedia. Our system combines a sentence retriever component, based on information retrieval techniques, to extract the relevant sentences, with a transfer learning-based answer extractor trained to predict answers from Wikipedia texts. Experiments on multiple datasets for machine reading comprehension in Vietnamese and other languages demonstrate that (1) our ViReader system is highly competitive with prevalent machine learning-based systems, and (2) combining the sentence retriever and the answer extractor yields an effective end-to-end reading comprehension system. The sentence retriever component retrieves the sentences most likely to contain the answer to the given question. The transfer learning-based answer extractor then reads the document from which the sentences have been retrieved, predicts the answer, and returns it to the user. The ViReader system achieves new state-of-the-art performance, with 70.83% EM (exact match) and 89.54% F1, outperforming the BERT-based system by 11.55% and 9.54%, respectively. It also obtains state-of-the-art performance on UIT-ViNewsQA (another Vietnamese dataset consisting of online health-domain news) and BiPaR (a bilingual dataset of English and Chinese novel texts). Compared with the BERT-based system, our system achieves significant improvements in F1 of 7.65% for English and 6.13% for Chinese on the BiPaR dataset. Furthermore, we build a ViReader application programming interface (API) that programmers can employ in Artificial Intelligence applications.
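The retrieve-then-extract architecture can be sketched generically as follows, using a TF-IDF retriever and an off-the-shelf extractive QA model; this is an illustration of the pipeline shape, not the authors' ViReader implementation, and it uses English text with the default English model so that it runs as-is.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

# Illustrative sketch of a retriever + answer-extractor pipeline only; it is
# not the authors' ViReader code. ViReader itself works on Vietnamese
# Wikipedia with a transfer-learning-based extractor.
sentences = [
    "Hanoi is the capital of Vietnam.",
    "Ho Chi Minh City is the largest city in Vietnam.",
]
question = "What is the capital of Vietnam?"

# 1) Sentence retriever: rank candidate sentences against the question.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(sentences + [question])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
best_sentence = sentences[int(scores.argmax())]

# 2) Answer extractor: predict an answer span from the retrieved sentence.
qa = pipeline("question-answering")
print(qa(question=question, context=best_sentence))
```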


2021 ◽  
Author(s):  
Sowmiya K ◽  
Supriya S ◽  
R. Subhashini

Analysis of structured data has seen tremendous success in the past. However, large-scale analysis of unstructured data, such as video, remains a challenging area. YouTube, a Google company, has over a billion users and generates billions of views. Since YouTube data is created in huge volumes and at great speed, there is a strong demand to store it, process it, and carefully study it in order to make this large amount of data usable. The project utilizes the YouTube Data API (Application Programming Interface), which allows applications or websites to incorporate functions that YouTube itself uses to fetch and display information. The Google Developers Console is used to generate a unique access key, which is required to fetch data from public YouTube channels. The data are then processed and finally stored in AWS. The project extracts meaningful output that management can use for analysis, and these methodologies are helpful for business intelligence.
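A minimal sketch of fetching public channel statistics with the YouTube Data API v3 via the official Python client is shown below; the API key and channel ID are placeholders to be supplied by the user.

```python
from googleapiclient.discovery import build

# Sketch of fetching public channel statistics with the YouTube Data API v3.
# DEVELOPER_KEY and CHANNEL_ID are placeholders supplied by the user.
DEVELOPER_KEY = "YOUR_API_KEY"
CHANNEL_ID = "UC_x5XG1OV2P6uZZ5FSM9Ttw"  # example public channel ID

youtube = build("youtube", "v3", developerKey=DEVELOPER_KEY)

request = youtube.channels().list(part="snippet,statistics", id=CHANNEL_ID)
response = request.execute()

for channel in response.get("items", []):
    stats = channel["statistics"]
    # Print the channel title with its total view and subscriber counts.
    print(channel["snippet"]["title"],
          stats.get("viewCount"),
          stats.get("subscriberCount"))
```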

