Hadoop vs. Spark: Impact on Performance of the Hammer Query Engine for Open Data Corpora

Algorithms ◽  
2018 ◽  
Vol 11 (12) ◽  
pp. 209 ◽  
Author(s):  
Mauro Pelucchi ◽  
Giuseppe Psaila ◽  
Maurizio Toccu

The Hammer prototype is a query engine for corpora of Open Data that provides users with the concept of blind querying. Since data sets published on Open Data portals are heterogeneous, users wishing to find interesting data sets are blind: queries cannot be fully specified, as in the case of databases. Consequently, the query engine is responsible for rewriting and adapting the blind query to the actual data sets, exploiting lexical and semantic similarity. The effectiveness of this approach was discussed in our previous works. In this paper, we report our experience in developing the query engine. In the very first version of the prototype, we realized that the implementation of the retrieval technique was too slow, even though corpora contained only a few thousand data sets. We therefore decided to adopt the Map-Reduce paradigm, in order to parallelize the query engine and improve performance. We passed through several versions of the query engine, based either on the Hadoop framework or on the Spark framework; these are the two most popular frameworks for writing and executing parallel algorithms based on the Map-Reduce paradigm. In this paper, we present our study of the impact of adopting the Map-Reduce approach and its two most famous frameworks to parallelize the Hammer query engine; we discuss various implementations of the query engine, obtained either without significantly rewriting the algorithm or by completely rewriting it to exploit the high-level abstractions provided by Spark. The experimental campaign we performed shows the benefits provided by each studied solution, with the perspective of moving toward Big Data in the future. The lessons we learned are collected and synthesized into behavioral guidelines for developers approaching the problem of parallelizing algorithms by means of Map-Reduce frameworks.
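The abstract does not reproduce the engine's code; as a rough illustration of the Map-Reduce pattern behind such a query engine, the sketch below scores data sets by lexical overlap with a blind query's terms in plain Python. All names here (`CORPUS`, `map_phase`, `reduce_phase`) are hypothetical and not taken from Hammer, and real lexical/semantic similarity would replace the exact-match test.

```python
from collections import defaultdict

# Hypothetical mini-corpus of Open Data sets: dataset id -> metadata terms.
CORPUS = {
    "bus-routes":  ["bus", "route", "stop", "city"],
    "air-quality": ["air", "quality", "sensor", "city"],
    "budget-2018": ["budget", "spending", "city", "council"],
}

def map_phase(query_terms, corpus):
    """Map step: emit (dataset_id, 1) for each query term found in a dataset."""
    for ds_id, terms in corpus.items():
        for t in query_terms:
            if t in terms:
                yield ds_id, 1

def reduce_phase(pairs):
    """Reduce step: sum per-term hits into a lexical-overlap score per dataset."""
    scores = defaultdict(int)
    for ds_id, hit in pairs:
        scores[ds_id] += hit
    return dict(scores)

scores = reduce_phase(map_phase(["city", "air"], CORPUS))
ranked = sorted(scores.items(), key=lambda kv: -kv[1])
```

In a Hadoop or Spark deployment, `map_phase` and `reduce_phase` would run in parallel across cluster nodes over shards of the corpus, which is the point of the parallelization studied in the paper.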


2017 ◽  
Author(s):  
Federica Rosetta

Within the Open Science discussions, the current call for “reproducibility” comes from the rising awareness that results as presented in research papers are not as easily reproducible as expected, and that some reproduction efforts have even contradicted the original results. In this context, transparency and openness are seen as key components to facilitate good scientific practices, as well as scientific discovery. As a result, many funding agencies now require the deposit of research data sets, institutions improve training on the application of statistical methods, and journals begin to mandate a high level of detail on the methods and materials used. How can researchers be supported and encouraged to provide that level of transparency? An important component is the underlying research data, which is currently often only partly available within the article. At Elsevier we have therefore been working on journal data guidelines which clearly explain to researchers when and how they are expected to make their research data available. Simultaneously, we have developed the corresponding infrastructure to make it as easy as possible for researchers to share their data in a way that is appropriate in their field. To ensure researchers get credit for the work they do on managing and sharing data, all our journals support data citation in line with the FORCE11 data citation principles – a key step toward addressing the lack of credit and incentives which emerged from the Open Data analysis (Open Data – the Researcher Perspective, https://www.elsevier.com/about/open-science/research-data/open-data-report ) recently carried out by Elsevier together with CWTS. Finally, the presentation will also touch upon a number of initiatives to ensure the reproducibility of software, protocols and methods.
With STAR methods, for instance, methods are submitted in a Structured, Transparent, Accessible Reporting format; this approach promotes rigor and robustness, and makes reporting easier for the author and replication easier for the reader.



2014 ◽  
Vol 33 (2) ◽  
pp. 128-129 ◽  
Author(s):  
Matt Hall

Welcome to this new column. Every two months, a geoscientist will present a brief exploration of a geophysical topic. The idea is to take a tour bus around a subject and point out some of the sights, perhaps stopping briefly at an exemplary problem or instructive viewpoint. So far it's useful, but maybe not remarkable. The remarkable thing, I hope, is that the tour will be open access. The tutors will use only open data sets that anyone can download. There will be no proprietary software. I will strongly encourage the use of Octave, R, or Python, all high-level (that is, easy-to-learn) programming languages for scientists, and the important parts of the code will be shared. I've tried to give a flavor of all this in today's tutorial, using Python. If you are new to Python, IPython is a great place to start—visit ipython.org/install.



Hearts ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 410-418
Author(s):  
Brian Young ◽  
Johann-Jakob Schmid

Updates to industry consensus standards for ECG equipment are a work in progress by the ISO/IEC Joint Working Group 22. This work will overhaul the existing industry standards that apply to ECG electromedical equipment and will result in a single new international standard, 80601-2-86. The new standard will be entitled “80601, Part 2-86: Particular requirements for the basic safety and essential performance of electrocardiographs, including diagnostic equipment, monitoring equipment, ambulatory equipment, electrodes, cables, and leadwires”. This paper provides a high-level overview of the work in progress and, in particular, describes the impact it will have on requirements and testing methods for computerized ECG interpretation algorithms. The conclusion of this work is that manufacturers should continue working with clinical ECG experts to make clinically meaningful improvements to automated ECG interpretation, and that the clinical validation of ECG analysis algorithms should be disclosed to guide appropriate clinical use. More cooperation is needed between industry, clinical ECG experts and regulatory agencies to develop new data sets that can be made available for use by industry standards for algorithm performance evaluation.



2019 ◽  
Vol 37 (2) ◽  
pp. 95-101 ◽  
Author(s):  
Brad Keogh ◽  
Thomas Monks

Objectives: There have been claims that Delayed Transfers of Care (DTOCs) of inpatients to home or a less acute setting are related to Emergency Department (ED) crowding. In particular, DTOCs were associated with breaches of the UK 4-hour waiting time target in a previously published analysis. However, that analysis had a major limitation: it did not adjust for the longitudinal trend in the data. The aim of this work is to investigate whether the proposition that DTOCs affect the 4-hour target requires further research. Method: Estimating an association between two or more variables measured over time requires specialised statistical methods. In this study, we performed two separate analyses. First, we created two sets of artificial data with no correlation, then added an upward trend over time to both and reassessed the correlation. Second, we reproduced the simple linear regression of the original study using NHS England open data on English trusts between 2010 and 2016, assessing the correlation between numbers of DTOCs and ED breaches of the 4-hour target. We then reanalysed the same data using standard time series methods to remove the trend before estimating an association. Results: After introducing upward trends into the uncorrelated artificial data, the correlation between the two data sets increased (R² = 0.00 to 0.51). We found strong evidence of longitudinal trends within the NHS data on ED breaches and DTOCs. After removal of the trends, R² fell from 0.50 to 0.01. Conclusion: Our reanalysis found weak correlation between numbers of DTOCs and ED 4-hour target breaches. Our study does not indicate that there is no relationship between the 4-hour target and DTOCs; it highlights that statistically robust evidence for this relationship does not currently exist. Further work is required to understand the relationship between breaches of the 4-hour target and numbers of DTOCs.
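The artificial-data experiment described in the Method section can be reproduced in a few lines of standard-library Python (the values below are illustrative, not the authors' data): two independent noise series show near-zero correlation, a shared upward trend inflates it, and first-differencing, one standard way to remove a trend, brings it back down. R² as reported in the paper is simply the square of the Pearson r computed here.

```python
import random

random.seed(42)
n = 200
# Two independent noise series: no true relationship between them.
a = [random.gauss(0, 1) for _ in range(n)]
b = [random.gauss(0, 1) for _ in range(n)]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

r_raw = pearson_r(a, b)  # near zero

# Add the same upward trend to both series: correlation is inflated.
a_t = [ai + 0.05 * t for t, ai in enumerate(a)]
b_t = [bi + 0.05 * t for t, bi in enumerate(b)]
r_trend = pearson_r(a_t, b_t)  # large, despite no true relationship

# First-difference both series to remove the trend, then re-estimate.
diff = lambda s: [s[i + 1] - s[i] for i in range(len(s) - 1)]
r_detrended = pearson_r(diff(a_t), diff(b_t))  # back near zero
```

This is the same qualitative pattern the paper reports for the NHS data: R² of 0.50 before detrending, 0.01 after.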



Author(s):  
V. Kovpak ◽  
N. Trotsenko

The article analyzes the peculiarities of the native advertising format in the media space and its pragmatic potential (in particular, using the example of native content posted on the social network Facebook by the brand of the journalism department of ZNU), and highlights the types and trends of native advertising. The following research methods were used to achieve the aim of the study: descriptive (presentation of content with various examples), comparative (options for presenting content) and typological (types and trends of native advertising, in particular cross-media as an opportunity to deliver content in different formats: video, audio, photos, text, infographics, etc.), as well as content analysis using Internet services (the Popsters service). The empirical basis for the analysis was the page of the journalism department of Zaporizhzhya National University on the social network Facebook; in 2019 the department's brand celebrates its 15th anniversary. The brand vector is its value component and professional training with a balanced distribution of theoretical and practical blocks (seven practices), student-centredness (democratic interaction and a high level of teacher-student dialogue) and integration into the Ukrainian and world educational process (participation in grant programs).
Advertising on social networks is also a kind of native content: it does not appear in special blocks, but is organically embedded in a page and unobtrusively makes its offer, mentioning the product only in passing. The Popsters service evaluates an account (or the linked accounts of one person) on 35 parameters, of which three areas are the main ones: reach, or influence (how many users rate or comment on a post); true reach (the number of people affected); and network score (an assessment of the audience's response, or how far information from the page spreads across the network and how many users share it). Key words: nativeness, native advertising, branded content, special project, communication strategy.



2020 ◽  
Vol 2020 (10) ◽  
pp. 19-33
Author(s):  
Nadiia NOVYTSKA ◽  
Inna KHLIEBNIKOVA

The market of tobacco products in Ukraine is one of the most dynamic and competitive. It develops under the influence of certain factors that cause structural changes; therefore, the aim of the article is to conduct a comprehensive analysis of transformation processes in the market of tobacco products and their alternatives in Ukraine and to identify the factors that cause them. The high level of tax burden and the proliferation of alternative products with a potentially lower risk to human health, including heated tobacco products and e-cigarettes, are key factors in the market's transformation process. Their presence leads to an increase in the illicit turnover of tobacco products, which accounts for 6.37% of the market, and the gradual replacement of cigarettes with alternative products, which account for 12.95%. The presence on the market of products that are not taxed or are taxed at lower rates is one of the reasons for the reduction of excise duty revenues. According to the results of 2019, planned revenue targets were missed by 23.5%. Other reasons for the shortfall in excise duty revenues include: the declining dynamics of the tobacco products market; a reduction in the number of smokers; the reorientation of «cheap whites» cigarette flows from Ukraine to neighboring countries; and tax avoidance. Prospects for further research are identified, namely the need to develop measures for state regulation and optimization of excise duty taxation of tobacco products and their alternatives, taking into account the risks to public health and the increasing demand for illegal products.



2020 ◽  
Vol 38 (3) ◽  
Author(s):  
Shoaib Ali ◽  
Imran Yousaf ◽  
Muhammad Naveed

This paper aims to examine the impact of external credit ratings on the financial decisions of firms in Pakistan. This study uses annual data on 70 non-financial firms for the period 2012-2018 and applies ordinary least squares (OLS) to estimate the impact of credit rating on capital structure. The results show that rated firms have a high level of leverage. Moreover, profitability and tangibility are found to be significantly negative determinants of capital structure, whereas firm size has a significant positive relationship with the capital structure of the firm. In addition, there exists a non-linear relationship between credit rating and capital structure: rated firms have higher leverage than non-rated firms, while high- and low-rated firms have a low level of leverage and mid-rated firms have a higher leverage ratio. The findings of the study have practical implications for managers: firms can gain easier access to the financial market simply by obtaining a credit rating, no matter how high or low. Policymakers should press the rating agencies to keep improving, as their ratings serve as the measure by which both investors and management judge the creditworthiness of the firm.
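As a hedged illustration of the kind of OLS estimate reported (the paper's firm-level data are not public, so the data below are simulated, and the true effect of 0.15 is an arbitrary choice): with a single 0/1 "rated" regressor, the OLS slope reduces to cov(x, y)/var(x), which equals the difference in mean leverage between rated and non-rated firms.

```python
import random

random.seed(1)

# Simulated firm-level data (illustrative only): rated firms are
# assigned a higher true leverage, mirroring the paper's finding.
n = 70
rated = [random.random() < 0.5 for _ in range(n)]
leverage = [0.30 + (0.15 if r else 0.0) + random.gauss(0, 0.05)
            for r in rated]

# OLS with one 0/1 regressor: beta = cov(x, y) / var(x).
x = [1.0 if r else 0.0 for r in rated]
mx, my = sum(x) / n, sum(leverage) / n
beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, leverage))
        / sum((xi - mx) ** 2 for xi in x))
alpha = my - beta * mx  # intercept: mean leverage of non-rated firms
```

The estimated `beta` recovers the simulated rating effect of about 0.15; in the paper's multivariate setting the same estimator is applied with profitability, tangibility and size as additional regressors.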



2019 ◽  
Vol 72 (5) ◽  
pp. 779-783
Author(s):  
Victor A. Ognev ◽  
Anna A. Podpriadova ◽  
Anna V. Lisova

Introduction: The high level of morbidity and mortality from cardiovascular disease is largely due to insufficient influence on the main risk factors that contribute to the development of myocardial infarction. Therefore, a detailed study and assessment of risk factors is among the most important problems of medical and social importance. The aim: To study and evaluate the impact of biological, social and hygienic, social and economic, psychological, and natural and climatic risk factors on the development of myocardial infarction. Materials and methods: A sociological survey was conducted among 500 people aged 34 to 85, divided into two groups. The main group consisted of 310 patients with myocardial infarction. The control group consisted of 190 practically healthy people, identical by age, gender and other parameters, without diseases of the cardiovascular system. Results: We found that 30 factors have a significant impact on the development of myocardial infarction. Data analysis revealed that the leading risk factors for myocardial infarction were biological and socio-hygienic. The main biological factors were hypertension and hypercholesterolemia; the main socio-hygienic factor was smoking. Conclusions: Identification of risk factors provides new opportunities for the development of more effective approaches to the prevention and treatment of myocardial infarction.



2020 ◽  
Vol 13 (4) ◽  
pp. 790-797
Author(s):  
Gurjit Singh Bhathal ◽  
Amardeep Singh Dhiman

Background: In the current internet scenario, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner, but it is argued that the framework is not mature enough to deal with current cyberattacks on the data. Objective: The main objective of the proposed work is to provide a complete security approach comprising authorisation and authentication for the users and the Hadoop cluster nodes, and to secure the data at rest as well as in transit. Methods: The proposed algorithm uses the Kerberos network authentication protocol for authorisation and authentication and to validate the users and the cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used for data at rest and data in transit: users encrypt files with their own set of attributes and store them on the Hadoop Distributed File System, and only intended users with matching parameters can decrypt those files. Results: The proposed algorithm was implemented with data sets of different sizes, processed both with and without encryption. The results show little difference in processing time: performance was affected in the range of 0.8% to 3.1%, which also includes the impact of other factors such as system configuration, the number of parallel jobs running, and the virtual environment. Conclusion: The solutions available for handling the big data security problems faced in the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment, and the solution is experimentally shown to have little effect on the performance of the system for data sets of different sizes.
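A real CP-ABE implementation requires pairing-based cryptography and a dedicated library, which is beyond a short sketch; the toy below (no actual encryption takes place, and all names are hypothetical) only illustrates the access model the paper relies on: the ciphertext embeds an attribute policy, and decryption succeeds only when the user's attributes satisfy that policy.

```python
# Toy illustration of the CP-ABE access model only -- NOT cryptography.
# In CP-ABE the ciphertext embeds an attribute policy, and a user's key
# decrypts it iff the key's attributes satisfy the policy.

def encrypt(plaintext, policy):
    """'Ciphertext' pairing the data with its required attributes."""
    return {"policy": frozenset(policy), "data": plaintext}

def decrypt(ciphertext, user_attributes):
    """Succeeds only when the user's attributes satisfy the policy
    (modeled here as a simple AND over all required attributes)."""
    if ciphertext["policy"] <= set(user_attributes):
        return ciphertext["data"]
    raise PermissionError("attributes do not satisfy the policy")

# A file stored on HDFS is encrypted under a policy chosen by its owner;
# extra attributes on the user's side do not hurt, missing ones do.
ct = encrypt("hdfs-block-0042", {"dept:finance", "role:analyst"})
ok = decrypt(ct, {"dept:finance", "role:analyst", "site:hq"})
```

In the proposed framework this check is performed cryptographically, so HDFS nodes never see either the plaintext or the users' keys.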



2021 ◽  
Vol 22 (9) ◽  
pp. 4961
Author(s):  
Maria Kovalska ◽  
Eva Baranovicova ◽  
Dagmar Kalenska ◽  
Anna Tomascova ◽  
Marian Adamkov ◽  
...  

L-methionine, an essential amino acid, plays a critical role in cell physiology. High intake and/or dysregulation of methionine (Met) metabolism results in accumulation of its intermediates or breakdown products in plasma, including homocysteine (Hcy). A high level of Hcy in plasma, hyperhomocysteinemia (hHcy), is considered an independent risk factor for cerebrovascular diseases, stroke and dementias. To evoke mild hHcy in adult male Wistar rats, we used a Met-enriched diet at a dose of 2 g/kg of animal weight per day for 4 weeks. The study explores the impact of a Met-enriched diet inducing mild hHcy on nervous tissue by detecting histo-morphological, metabolomic and behavioural alterations. We found an altered plasma metabolomic profile, modified spatial and learning memory acquisition, and remarkable histo-morphological changes such as a decrease in neurons’ vitality and alterations in the morphology of neurons in the selectively vulnerable hippocampal CA1 area of animals treated with the Met-enriched diet. These results suggest that mild hHcy alters the plasma metabolome and behavioural and histo-morphological patterns in rats, likely due to Met-induced changes in the “methylation index” of the hippocampal brain area, which eventually aggravates the noxious effect of high methionine intake.


