OPAL: High performance platform for large-scale privacy-preserving location data analytics

Author(s):  
Axel Oehmichen ◽  
Shubham Jain ◽  
Andrea Gadotti ◽  
Yves-Alexandre de Montjoye
Big Data ◽  
2016 ◽  
pp. 1555-1581


Author(s):  
Gueyoung Jung ◽  
Tridib Mukherjee

In the modern information era, the amount of data has exploded, and current trends indicate that it will continue to grow exponentially. This humongous amount of data—referred to as big data—has given rise to the problem of finding the “needle in the haystack” (i.e., extracting meaningful information from big data). Many researchers and practitioners are focusing on big data analytics to address this problem, and one of the major issues in this regard is the computation requirement of big data analytics. In recent years, the proliferation of loosely coupled distributed computing infrastructures (e.g., modern public, private, and hybrid clouds, high performance computing clusters, and grids) has made high computing capability available for large-scale computation. This has allowed the execution of big data analytics to gather pace across organizations and enterprises. Even with this high computing capability, however, efficiently extracting valuable information from such vast data remains a major challenge, and it demands unprecedented scalability of performance. The big question, then, is how to maximally leverage the high computing capabilities of these loosely coupled distributed infrastructures to ensure fast and accurate execution of big data analytics. To this end, this chapter focuses on synchronous parallelization of big data analytics over a distributed system environment to optimize performance.
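As a rough illustration of the synchronous parallelization pattern the chapter discusses (not the authors' actual system), the following Python sketch partitions a dataset across worker processes, runs an analytics step on each partition in parallel, and synchronizes before aggregating the partial results. The workload and all names are illustrative.

```python
# Illustrative sketch of synchronous parallelization: partition -> parallel
# map -> synchronize -> aggregate. Workload and names are hypothetical.
from multiprocessing import Pool

def analyze_partition(partition):
    """Toy analytics step: count words in one data partition."""
    return sum(len(record.split()) for record in partition)

def split(data, n_parts):
    """Divide the dataset into n_parts roughly equal partitions."""
    size = (len(data) + n_parts - 1) // n_parts
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = ["some big data record"] * 10_000
    with Pool(processes=4) as pool:
        # pool.map blocks until every worker finishes: this is the
        # synchronization point before results are combined.
        partials = pool.map(analyze_partition, split(data, 4))
    print(sum(partials))  # aggregate the partial results
```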


Author(s):  
Mirco Nanni ◽  
Gennady Andrienko ◽  
Albert-László Barabási ◽  
Chiara Boldrini ◽  
Francesco Bonchi ◽  
...  

The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the “phase 2” of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large-scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens’ privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and to avoid location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens’ “personal data stores”, to be shared separately and selectively (e.g., with a backend system, but possibly also with other citizens), voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy-preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering about infected people in a privacy-preserving fashion, which in turn enables both contact tracing and the early detection of outbreak hotspots on a more finely grained geographic scale. The decentralized approach is also scalable to large populations, in that only the data of positive patients need be handled at a central level. Our recommendation is two-fold: first, to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device and to allow the user to share spatio-temporal aggregates—if and when they want, and for specific aims—with health authorities, for instance; second, to pursue the longer-term vision of the Personal Data Store, giving users the opportunity to contribute to the collective good in the measure they want, enhancing self-awareness and cultivating collective efforts for rebuilding society.
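The sharing model advocated here can be pictured in a few lines of code: raw location records never leave the device, and only coarse spatio-temporal aggregates are released, and only after a positive test. The Python sketch below is a hedged illustration of that idea, not the paper's protocol; the class, the grid granularity, and the hour indexing are all assumptions.

```python
# Hypothetical sketch of a personal data store that keeps raw location
# records local and releases only coarse spatio-temporal aggregates,
# and only after a positive test. Granularity choices are illustrative.
from collections import Counter

class PersonalDataStore:
    def __init__(self):
        self._records = []  # (lat, lon, unix_hour) stays on the device

    def log(self, lat, lon, unix_hour):
        self._records.append((lat, lon, unix_hour))

    def share_aggregates(self, tested_positive, cell_deg=0.01):
        """Return coarse (grid cell, hour) counts, or nothing if negative."""
        if not tested_positive:
            return None  # nothing ever leaves the device
        return dict(Counter(
            (round(lat / cell_deg), round(lon / cell_deg), hour)
            for lat, lon, hour in self._records
        ))

store = PersonalDataStore()
store.log(45.4642, 9.1900, 443_000)  # e.g. Milan, some hour index
store.log(45.4650, 9.1910, 443_001)
print(store.share_aggregates(tested_positive=True))
```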


2019 ◽  
Vol 2019 (4) ◽  
pp. 93-111 ◽  
Author(s):  
Elena Pagnin ◽  
Gunnar Gunnarsson ◽  
Pedram Talebi ◽  
Claudio Orlandi ◽  
Andrei Sabelfeld

Ridesharing is revolutionizing the transportation industry in many countries. Yet, the state of the art is based on heavily centralized services and platforms, where the service providers have full possession of the users’ location data. Recently, researchers have started addressing the challenge of enabling privacy-preserving ridesharing. The initial proposals, however, have shortcomings: some rely on a central party, some incur high performance penalties, and most do not consider time preferences for ridesharing. TOPPool encompasses ridesharing based on the proximity of the end-points of a ride as well as on partial itinerary overlaps. To achieve the latter, we propose a simple yet powerful reduction to private set intersection on trips represented as sets of consecutive road segments. We show that TOPPool includes time preferences while preserving privacy and without relying on a third party. We evaluate our approach on real-world data from the New York Taxi & Limousine Commission. Our experiments demonstrate that TOPPool is superior in performance to prior work: our intersection-based itinerary matching runs in less than 0.3 seconds for trips of reasonable length, whereas prior work takes up to 10 hours on the same set of trips.
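The core reduction is easy to state: encode each trip as the set of consecutive road segments it traverses; two itineraries then overlap exactly when their sets intersect. The sketch below shows the plaintext version of this reduction only; in the actual protocol the intersection is computed cryptographically via private set intersection (PSI), and the node identifiers here are invented for the example.

```python
# Plaintext illustration of the reduction: a trip becomes the set of road
# segments it traverses, so itinerary overlap is set intersection. TOPPool
# computes this privately via PSI; node IDs below are made up.

def trip_to_segments(route):
    """Encode a route (ordered list of node IDs) as a set of directed
    road segments, each segment being a consecutive node pair."""
    return {(route[i], route[i + 1]) for i in range(len(route) - 1)}

rider  = trip_to_segments(["n1", "n2", "n3", "n4", "n5"])
driver = trip_to_segments(["n0", "n2", "n3", "n4", "n6"])

shared = rider & driver   # the set intersection (private in the protocol)
print(sorted(shared))     # overlapping stretch of the two itineraries
print(len(shared) >= 2)   # e.g. declare a match if >= 2 common segments
```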


2020 ◽  
Vol 17 (173) ◽  
pp. 20200344
Author(s):  
Chenfeng Xiong ◽  
Songhua Hu ◽  
Mofeng Yang ◽  
Hannah Younes ◽  
Weiyu Luo ◽  
...  

One approach to delaying the spread of the novel coronavirus (COVID-19) is to reduce human travel by imposing travel restriction policies. Understanding the actual human mobility response to such policies remains a challenge owing to the lack of an observed and large-scale dataset describing human mobility during the pandemic. This study uses an integrated dataset, consisting of anonymized and privacy-protected location data from over 150 million monthly active samples in the USA, COVID-19 case data and census population information, to uncover mobility changes during COVID-19 and under the stay-at-home state orders in the USA. The study successfully quantifies human mobility responses with three important metrics: daily average number of trips per person; daily average person-miles travelled; and daily percentage of residents staying at home. The data analytics reveal a spontaneous mobility reduction that occurred regardless of government actions and a ‘floor’ phenomenon, where human mobility reached a lower bound and stopped decreasing soon after each state announced the stay-at-home order. A set of longitudinal models is then developed and confirms that the states' stay-at-home policies have only led to about a 5% reduction in average daily human mobility. Lessons learned from the data analytics and longitudinal models offer valuable insights for government actions in preparation for another COVID-19 surge or another virus outbreak in the future.
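The three mobility metrics are straightforward to compute from trip-level records. The Python sketch below shows the arithmetic on toy data; the record layout is an assumption for illustration, not the study's actual schema.

```python
# Illustrative computation of the study's three mobility metrics from toy
# trip records. Each record: (person_id, trips_today, miles_today).
day = [
    ("p1", 2, 13.0),
    ("p2", 0, 0.0),   # stayed at home
    ("p3", 4, 22.5),
]

population = len(day)
trips_per_person = sum(t for _, t, _ in day) / population
person_miles = sum(m for _, _, m in day) / population
pct_staying_home = 100 * sum(1 for _, t, _ in day if t == 0) / population

print(f"daily avg trips/person: {trips_per_person:.2f}")
print(f"daily avg person-miles: {person_miles:.2f}")
print(f"% residents at home:    {pct_staying_home:.1f}%")
```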


Author(s):  
C.K. Wu ◽  
P. Chang ◽  
N. Godinho

Recently, the use of refractory metal silicides as low-resistivity, high-temperature, high-oxidation-resistance gate materials in large-scale integrated circuits (LSI) has become an important approach in advanced MOS process development (1). This research is a systematic study of the structure and properties of molybdenum silicide thin films and their applicability to high-performance LSI fabrication.


Author(s):  
В.В. ГОРДЕЕВ ◽  
В.Е. ХАЗАНОВ

The choice of the type and size of a milking installation needs to take into account the maximum planned number of dairy cows, the size of a technological group, the number of milkings per day, and the duration of one milking and of the operator's working shift. An analysis of the technical and economic indicators of the currently most common types of milking installations of the same technical level shows that the Carousel installation has the best specific indicators, while the Herringbone installation requires higher labour and cash costs; the Parallel installation falls in between. In terms of throughput and the required number of operators, the Herringbone is recommended for farms with up to 600 dairy cows, the Parallel for no more than 1200 dairy cows, and the Carousel for more than 1200 dairy cows. The Carousel is the most practical, high-performance, easily automated and, therefore, promising milking system for milking parlours, especially on large-scale dairy farms.
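The sizing recommendation reduces to a simple threshold rule on herd size; a toy Python encoding, with the thresholds taken directly from the abstract:

```python
# Toy encoding of the herd-size recommendation stated in the abstract.
def recommend_milking_parlour(dairy_cows: int) -> str:
    if dairy_cows <= 600:
        return "Herringbone"   # up to 600 dairy cows
    if dairy_cows <= 1200:
        return "Parallel"      # no more than 1200 dairy cows
    return "Carousel"          # above 1200 dairy cows

print(recommend_milking_parlour(450))   # Herringbone
print(recommend_milking_parlour(1500))  # Carousel
```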


Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that require only a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to a 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
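A central step in this kind of analysis is extracting the Pareto-optimal (energy, performance) configurations from measured or predicted samples. The small sketch below shows that extraction step on invented data; it illustrates the trade-off identification only, not the authors' statistical or machine learning models.

```python
# Minimal Pareto-frontier extraction for (energy, runtime) samples, where
# lower is better on both axes. Sample points are invented; this shows the
# trade-off-identification step only, not the paper's models.
def pareto_frontier(points):
    """Return the points not dominated by any other point."""
    frontier = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier)

configs = [(100.0, 9.0), (80.0, 12.0), (95.0, 10.0),
           (110.0, 10.5), (120.0, 8.0)]  # (energy J, runtime s), invented
print(pareto_frontier(configs))  # the energy/runtime trade-off options
```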


Radiation ◽  
2021 ◽  
Vol 1 (2) ◽  
pp. 79-94
Author(s):  
Peter K. Rogan ◽  
Eliseos J. Mucaki ◽  
Ben C. Shirley ◽  
Yanxin Li ◽  
Ruth C. Wilkins ◽  
...  

The dicentric chromosome (DC) assay accurately quantifies exposure to radiation; however, manual and semi-automated assignment of DCs has limited its use for a potential large-scale radiation incident. The Automated Dicentric Chromosome Identifier and Dose Estimator (ADCI) software automates unattended DC detection and determines radiation exposures, fulfilling IAEA criteria for triage biodosimetry. This study evaluates the throughput of high-performance ADCI (ADCI-HT) in stratifying exposures of populations in 15 simulated population-scale radiation exposures. ADCI-HT streamlines dose estimation on a supercomputer by optimal hierarchical scheduling of DC detection for varying numbers of samples and metaphase cell images in parallel on multiple processors. We evaluated processing times and the accuracy of estimated exposures across census-defined populations. Image processing of 1744 samples on 16,384 CPUs required 1 h 11 min 23 s, and radiation dose estimation based on DC frequencies required 32 s. Processing of 40,000 samples at 10 exposures from five laboratories required 25 h and met IAEA criteria (dose estimates were within 0.5 Gy; median = 0.07 Gy). Geostatistically interpolated radiation exposure contours of simulated nuclear incidents were defined by samples exposed to clinically relevant exposure levels (1 and 2 Gy). Analysis of all exposed individuals with ADCI-HT required 0.6–7.4 days, depending on the population density of the simulation.
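In conventional biodosimetry, dose is estimated from the observed dicentric frequency by inverting a linear-quadratic calibration curve, Y = c + aD + bD². The sketch below illustrates that inversion with invented calibration coefficients; it is a generic textbook step, not the ADCI implementation.

```python
# Illustrative dose estimation from a dicentric-chromosome (DC) frequency by
# inverting a linear-quadratic calibration curve Y = c + a*D + b*D^2.
# Coefficients are invented for the example; ADCI's actual curves differ.
import math

def dose_from_dc_frequency(y, c=0.001, a=0.02, b=0.06):
    """Solve b*D^2 + a*D + (c - y) = 0 for the positive root D (in Gy)."""
    disc = a * a - 4 * b * (c - y)
    if disc < 0:
        raise ValueError("frequency below calibration background")
    return (-a + math.sqrt(disc)) / (2 * b)

# e.g. 0.3 dicentrics per cell -> estimated whole-body dose in Gy
print(f"{dose_from_dc_frequency(0.3):.2f} Gy")
```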


Antioxidants ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 843
Author(s):  
Tamara Ortiz ◽  
Federico Argüelles-Arias ◽  
Belén Begines ◽  
Josefa-María García-Montes ◽  
Alejandra Pereira ◽  
...  

The best conservation method for native Chilean berries has been investigated, in combination with a large-scale extract of maqui berry, rich in total polyphenols and anthocyanins, to be tested in intestinal epithelial and immune cells. The methanolic extract was obtained from lyophilized maqui berries and analyzed using the Folin–Ciocalteu method to quantify the total polyphenol content, as well as 2,2-diphenyl-1-picrylhydrazyl (DPPH), ferric reducing antioxidant power (FRAP), and oxygen radical absorbance capacity (ORAC) assays to measure the antioxidant capacity. The anthocyanin profile of maqui was determined by ultra-high-performance liquid chromatography with tandem mass spectrometry (UHPLC-MS/MS). Viability, cytotoxicity, and percent oxidation in colon epithelial cells (HT-29) and macrophage cells (RAW 264.7) were evaluated. In conclusion, the preservation studies confirmed that maqui's properties and composition are preserved under fresh or frozen conditions, and a more efficient and convenient extraction methodology was achieved. In vitro studies in epithelial cells showed that this extract has a powerful antioxidant strength exhibiting dose-dependent behavior. When macrophages were activated with lipopolysaccharide (LPS), no cytotoxic effects were observed, and a relationship between oxidative stress and inflammation response was demonstrated. The maqui extract together with 5-aminosalicylic acid (5-ASA) has a synergistic effect. All of the compiled data point to the use of this extract as a potential nutraceutical agent with physiological benefits for the treatment of inflammatory bowel disease (IBD).

