Data-Driven Software Development at Large Scale: From Ad-Hoc Data Collection to Trustworthy Experimentation

2018
Author(s):
Aleksander Fabijan


Author(s):
Sandro Bimonte
Marilys Pradel
Daniel Boffety
Aurelie Tailleur
Géraldine André
...

Agricultural energy consumption is an important environmental and social issue. Several diagnosis tools have been proposed to define indicators for analyzing the large-scale energy consumption of agricultural farm activities (per year, farm, production activity, etc.). In Bimonte, Boulil, Chanet and Pradel (2012), the authors (i) define new indicators for analyzing agricultural farm energy-use performance at a detailed scale and (ii) show how Spatial Data Warehouse (SDW) and Spatial OnLine Analytical Processing (SOLAP) GeoBusiness Intelligence (GeoBI) technologies can be used to represent, store, and analyze these indicators while simultaneously producing graphical and cartographic reports. These GeoBI technologies enable the analysis of huge volumes of georeferenced data by providing aggregated numerical values visualized through interactive tabular, graphical, and cartographic displays. However, existing sensor-based data collection systems are not well adapted to agricultural data. In this paper, the authors present the global architecture of their GeoBI solution and highlight the data collection process based on agricultural ad hoc sensor networks, the associated transformation and cleaning operations performed by means of Spatial Extract, Transform, Load (ETL) tools, and a new implementation of the system using a web-services-based, loosely coupled SOLAP architecture that provides interoperability and reusability across the complex multi-tier GeoBI architecture. Moreover, the authors detail how the energy-use diagnosis tool proposed in Bimonte, Boulil, Chanet and Pradel (2012) theoretically fits with the sensor data and the SOLAP approach.
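As a concrete illustration of the cleaning-and-loading stage such a Spatial ETL pipeline performs, the sketch below filters implausible sensor readings and aggregates them into a per-parcel energy-use fact table. All column names (fuel_l, lat, lon, parcel_id, activity), thresholds, and the pandas-based design are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of a Spatial ETL cleaning step for farm sensor data.
# Field names and thresholds are illustrative assumptions.
import pandas as pd

def clean_sensor_readings(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop malformed GPS fixes and implausible fuel-consumption values."""
    df = raw.dropna(subset=["lat", "lon", "fuel_l"])
    df = df[df["lat"].between(-90, 90) & df["lon"].between(-180, 180)]
    df = df[df["fuel_l"].between(0, 50)]  # plausible litres per reading
    return df

def load_energy_indicator(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate cleaned readings into a per-parcel, per-activity fact table."""
    return (df.groupby(["parcel_id", "activity"])
              .agg(total_fuel_l=("fuel_l", "sum"),
                   n_readings=("fuel_l", "count"))
              .reset_index())
```

The resulting fact table is the kind of detailed-scale structure a SOLAP client could then aggregate along year, farm, or activity dimensions.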


2020
Vol 7 (1)
pp. 205395172092809
Author(s):  
Taylor M Cruz

Large-scale data systems are increasingly envisioned as tools for justice, with big data analytics offering a key opportunity to advance health equity. Health systems face growing public pressure to collect data on patient “social factors,” and advocates and public officials seek to leverage such data sources as a means of system transformation. Despite the promise of this “data-driven” strategy, there is little empirical work that examines big data in action directly within the sites of care expected to transform. In this article, I present a case study on one such initiative, focusing on a large public safety-net health system’s initiation of sexual orientation and gender identity (SOGI) data collection within the clinical setting. Drawing from ethnographic fieldwork and in-depth interviews with providers, staff, and administrators, I highlight three main challenges that elude big data’s grasp on inequality: (1) providers’ and staff’s limited understanding of the social significance of data collection; (2) patients’ perception of the cultural insensitivity of data items; and (3) clinics’ need to balance data requests with competing priorities within a constrained time window. These issues reflect structural challenges within safety-net care that big data alone are unable to address in advancing social justice. I discuss these findings by considering the present data-driven strategy alongside two complementary courses of action: diversifying the health professions workforce and clinical education reform. To truly advance justice, we need more than “just data”: we need to confront the fundamental conditions of social inequality.


2021
pp. 52-61
Author(s):  
Henrik Vedal
Viktoria Stray
Marthe Berntzen
Nils Brede Moe

Delivering results iteratively and frequently in large-scale agile development requires efficient management of dependencies. We conducted semi-structured interviews and virtual observations in a large-scale project during the Covid-19 pandemic to better understand large-scale dependency management. All employees in the case were working from home. During our data collection and analysis, we identified 22 coordination mechanisms. These mechanisms fall into three categories: synchronization activities, boundary-spanning activities and artifacts, and coordinator roles. Using a dependency taxonomy, we analyzed how the mechanisms managed five different types of dependencies. We discuss three mechanisms that were essential for coordination in our case. First, setting Objectives and Key Results (OKRs) in regular workshops increased transparency and predictability across teams. Second, ad hoc communication, which mainly took place on Slack because of the distributed setting, was essential for managing dependencies. Third, the Product Owner acted as a coordinator role that managed both inter-team and intra-team dependencies.


Algorithms
2021
Vol 14 (5)
pp. 154
Author(s):  
Marcus Walldén
Masao Okita
Fumihiko Ino
Dimitris Drikakis
Ioannis Kokkinakis

The increasing processing capabilities and input/output constraints of supercomputers have increased the use of co-processing approaches, i.e., visualizing and analyzing simulation data sets on the fly. We present a method that evaluates the importance of different regions of simulation data, and a data-driven approach that uses the proposed method to accelerate in-transit co-processing of large-scale simulations. We use the importance metrics to simultaneously employ multiple compression methods on different data regions, accelerating the in-transit co-processing. Our approach strives to compress data adaptively on the fly and uses load balancing to counteract memory imbalances. We demonstrate the method’s efficiency through a fluid mechanics application, a Richtmyer–Meshkov instability simulation, showing how to accelerate the in-transit co-processing of simulations. The results show that the proposed method can expeditiously identify regions of interest, even when using multiple metrics. Our approach achieved a speedup of 1.29× in a lossless scenario, and data decompression was sped up by 2× compared to using a single compression method uniformly.
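To make the core idea concrete, the sketch below tags each data region with a cheap importance proxy and routes it to a different lossless codec accordingly: a fast codec for regions of interest so the analysis pipeline is not stalled, and a denser one for background data. The variance-based metric, the threshold, and the codec choices are illustrative assumptions, not the paper's actual metrics or compressors.

```python
# Sketch of importance-driven, per-region lossless compression for
# in-transit co-processing. Metric, threshold, and codecs are assumptions.
import zlib
import lzma
import numpy as np

def importance(region: np.ndarray) -> float:
    """Proxy importance metric: high local variance ~ region of interest."""
    return float(region.var())

def compress_region(region: np.ndarray, threshold: float = 0.1) -> bytes:
    buf = region.tobytes()
    if importance(region) >= threshold:
        # Important region: fast codec keeps the co-processing path responsive.
        return zlib.compress(buf, level=1)
    # Background region: slower but denser codec saves transfer bandwidth.
    return lzma.compress(buf, preset=6)
```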


Author(s):  
Ekaterina Kochmar
Dung Do Vu
Robert Belfer
Varun Gupta
Iulian Vlad Serban
...  

Intelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules, which makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be generated automatically in a data-driven way, and more specifically how personalization of feedback can improve student performance outcomes. First, we propose a machine learning approach to generate personalized feedback automatically, which takes the individual needs of students into account while alleviating the need for expert intervention and hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in a large-scale dialogue-based ITS with around 20,000 students, launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.
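As a deliberately simplified illustration of data-driven hint selection, the sketch below ranks candidate hints by textual similarity to a student's attempt and returns the closest match. The paper's actual system uses far richer ML/NLP models; the TF-IDF approach and every name here are assumptions for illustration only.

```python
# Toy sketch of data-driven hint selection: pick the candidate hint most
# textually similar to the student's attempt. Not the paper's model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_hint(student_answer: str, candidate_hints: list[str]) -> str:
    # Fit a shared vocabulary over hints plus the student's answer.
    vec = TfidfVectorizer().fit(candidate_hints + [student_answer])
    hints = vec.transform(candidate_hints)
    answer = vec.transform([student_answer])
    scores = cosine_similarity(answer, hints)[0]
    return candidate_hints[scores.argmax()]

# Example: the hint sharing the most vocabulary with the attempt wins.
print(select_hint("gradient descent uses the derivative",
                  ["Recall what the derivative of the loss tells you.",
                   "Check the shape of your input matrix."]))
```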


2021
Vol 10 (1)
pp. e001087
Author(s):  
Tarek F Radwan
Yvette Agyako
Alireza Ettefaghian
Tahira Kamran
Omar Din
...  

A quality improvement (QI) scheme was launched in 2017, covering a large group of 25 general practices serving a deprived registered population. The aim was to improve the measurable quality of care in a population where type 2 diabetes (T2D) care had previously proved challenging. A complex set of QI interventions was co-designed by a team of primary care clinicians, educationalists, and managers. These interventions included organisation-wide goal setting, using a data-driven approach, ensuring staff engagement, implementing an educational programme for pharmacists, facilitating web-based QI learning at scale, and using methods that ensured sustainability. This programme was used to optimise the management of T2D by improving the eight care processes and three treatment targets that form part of the annual national diabetes audit for patients with T2D. With the implemented interventions, there was significant improvement in all care processes and all treatment targets for patients with diabetes: achievement of all eight care processes improved by 46.0% (p<0.001), while achievement of all three treatment targets improved by 13.5% (p<0.001). The QI programme provides an example of a data-driven, large-scale, multicomponent intervention delivered in primary care in ethnically diverse and socially deprived areas.


Forests
2021
Vol 12 (1)
pp. 59
Author(s):  
Olivier Fradette
Charles Marty
Pascal Tremblay
Daniel Lord
Jean-François Boucher

Allometric equations use easily measurable biometric variables to determine the aboveground and belowground biomasses of trees. Equations produced for estimating biomass within Canadian forests at a large scale have not yet been validated for eastern Canadian boreal open woodlands (OWs), where trees experience particular environmental conditions. In this study, we harvested 167 trees from seven boreal OWs in Quebec, Canada for biomass and allometric measurements. These data show that the Canadian national equations accurately predict whole aboveground biomass for both black spruce and jack pine but underestimate branch biomass, possibly owing to a particular tree morphology in OWs relative to closed-canopy stands. We therefore developed ad hoc allometric equations based on three power models that include diameter at breast height (DBH) alone or in combination with tree height (H) as allometric variables. Our results show that although including H in the model yields better fits for most tree compartments in both species, the difference is minor and does not markedly affect biomass C stocks at the stand level. Using these newly developed equations, we found that carbon stocks in afforested OWs varied markedly among sites owing to differences in tree growth and species. Nine years after afforestation, jack pine plantations had accumulated about five times more carbon than black spruce plantations (0.14 t C·ha⁻¹ for black spruce vs. 0.80 t C·ha⁻¹ for jack pine), highlighting the much larger potential of jack pine for OW afforestation projects in this environment.
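For readers unfamiliar with fitting such power models, the sketch below shows how two of the forms mentioned above (DBH alone, and DBH combined with H) could be fitted with SciPy. The measurement arrays are placeholders for illustration; the paper's fitted coefficients are not reproduced here.

```python
# Sketch of fitting power-model allometric equations with SciPy.
# Data arrays are placeholders, not the study's measurements.
import numpy as np
from scipy.optimize import curve_fit

def power_dbh(dbh, a, b):
    """Aboveground biomass from DBH alone: B = a * DBH**b."""
    return a * dbh**b

def power_dbh_h(X, a, b):
    """Biomass from DBH and height combined: B = a * (DBH**2 * H)**b."""
    dbh, h = X
    return a * (dbh**2 * h)**b

# Placeholder field measurements (DBH in cm, H in m, biomass in kg).
dbh = np.array([4.2, 6.1, 8.3, 10.5, 12.0])
h = np.array([3.1, 4.0, 5.2, 6.0, 6.8])
biomass = np.array([2.1, 5.6, 12.4, 22.9, 33.0])

(a1, b1), _ = curve_fit(power_dbh, dbh, biomass, p0=(0.1, 2.0))
(a2, b2), _ = curve_fit(power_dbh_h, (dbh, h), biomass, p0=(0.05, 1.0))
print(f"DBH-only model:  B = {a1:.3f} * DBH^{b1:.3f}")
print(f"DBH+H model:     B = {a2:.3f} * (DBH^2 * H)^{b2:.3f}")
```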


Electronics
2021
Vol 10 (2)
pp. 219
Author(s):  
Phuoc Duc Nguyen
Lok-won Kim

People nowadays are entering an era of rapid evolution driven by the generation of massive amounts of data. This information is produced with an enormous contribution from billions of sensing devices equipped with in situ signal processing and communication capabilities, which form wireless sensor networks (WSNs). With the number of small devices connected to the Internet now exceeding 50 billion, Internet of Things (IoT) devices focus on sensing accuracy, communication efficiency, and low power consumption, because IoT deployments are mainly aimed at correct information acquisition, remote node access, and longer-term operation with less frequent battery replacement. Thus, there has recently been rich original research activity in these domains. The various sensors used by processing devices can be heterogeneous or homogeneous. Since the devices are primarily expected to operate independently in an autonomous manner, the abilities of connection, communication, and ambient energy scavenging play significant roles, especially in large-scale deployments. This paper classifies wireless sensor nodes into two major categories based on the type of sensor array (heterogeneous/homogeneous). It also emphasizes the use of ad hoc networking and energy-harvesting mechanisms as a fundamental cornerstone for building self-governing, sustainable, and perpetually operated sensor systems. We review systems representative of each category and outline trends in system development.
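To give a feel for why energy scavenging is the cornerstone of perpetual operation, the back-of-the-envelope sketch below computes the largest duty cycle a node can sustain while remaining energy neutral, i.e., spending no more than it harvests. All power figures are illustrative assumptions, not measurements from any surveyed system.

```python
# Energy-neutral operation sketch: the node may only spend, on average,
# what it scavenges. Power figures below are illustrative assumptions.

HARVEST_MW = 5.0   # average scavenged power (e.g., a small solar cell)
ACTIVE_MW = 60.0   # MCU + radio active power
SLEEP_MW = 0.05    # deep-sleep power

def max_duty_cycle(harvest=HARVEST_MW, active=ACTIVE_MW, sleep=SLEEP_MW):
    """Largest active fraction d satisfying:
    harvest >= d * active + (1 - d) * sleep."""
    return (harvest - sleep) / (active - sleep)

print(f"max sustainable duty cycle ≈ {max_duty_cycle():.1%}")  # ≈ 8.3%
```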


Author(s):  
Cody Minks
Anke Richter

Objective: Responding to large-scale public health emergencies relies heavily on planning and collaboration between law enforcement and public health officials. This study examines the current level of information sharing and integration between these domains by measuring the inclusion of public health in the law enforcement functions of fusion centers.
Methods: A survey of all fusion centers, with a 29.9% response rate.
Results: Only one of the 23 responding fusion centers had true public health inclusion, a decrease from research conducted in 2007. Information sharing is primarily limited to information flowing out of the fusion center, with little public health information coming in. Most collaboration is done on a personal, informal, ad hoc basis. There remains a large misunderstanding of roles, capabilities, and regulations by all parties (fusion centers and public health). The majority of the parties appear willing to work together, but there is no forward momentum to make these desires a reality. Funding and staffing issues seem to be the limiting factor for integration.
Conclusion: These problems need to be urgently addressed to increase public health preparedness and enable a decisive and beneficial response to public health emergencies involving a homeland security response.

