Exploring pair programming beyond computer science: a case study in its use in data science/data engineering

Author(s):  
Jeffrey S. Saltz ◽  
Ivan Shamshurin
CITAS ◽  
2016 ◽  
Vol 2 (1) ◽  
pp. 39-46
Author(s):  
Ixent Galpin

The role of data scientist has been described as the “sexiest job of the 21st Century”. While there is possibly a degree of hype associated with such a claim, there are also factors at play, such as the unprecedented growth in the amount of data being generated. This paper characterises the established disciplines that underpin data science, viz., data engineering, statistics, and data mining. Following a characterisation of these fields, data science is found to be most closely related to data mining. However, in contrast to data mining, data science promises to operate over datasets that exhibit significant challenges in terms of the four Vs: Volume, Variety, Velocity and Veracity. This paper notes that the current emphasis, both in industry and academia, is on the first three Vs, which pose mainly engineering or technological challenges, rather than on Veracity, which is a truly scientific (and arguably more complex) challenge. Data science can be seen to have a more ambitious objective than data mining has traditionally had: as a science, data science aims to lead to the creation of new theories and knowledge. This paper notes that, ironically, the veracity dimension, which is arguably the one most closely related to this objective, is being neglected. Despite the current media frenzy about data science, the paper concludes that more time is needed to see whether it will emerge as a discipline in its own right.


Author(s):  
Abeer A. Amer ◽  
Soha M. Ismail

The following article, published in the journal Recent Advances in Computer Science and Communications (Recent Patents on Computer Science), has been withdrawn at the request of its author(s). Title: Diabetes Mellitus Prognosis Using Fuzzy Logic and Neural Networks Case Study: Alexandria Vascular Center (AVC). Authors: Abeer A. Amer and Soha M. Ismail*. Bentham Science apologizes to the readers of the journal for any inconvenience this may cause. BENTHAM SCIENCE DISCLAIMER: It is a condition of publication that manuscripts submitted to this journal have not been published and will not be simultaneously submitted or published elsewhere. Furthermore, any data, illustration, structure or table that has been published elsewhere must be reported, and copyright permission for reproduction must be obtained. Plagiarism is strictly forbidden, and by submitting the article for publication the authors agree that the publishers have the legal right to take appropriate action against the authors if plagiarism or fabricated information is discovered. By submitting a manuscript, the authors agree that the copyright of their article is transferred to the publishers if and when the article is accepted for publication.


Author(s):  
Laura Ballerini ◽  
Sylvia I. Bergh

Official data are not sufficient for monitoring the United Nations Sustainable Development Goals (SDGs): they do not reach remote locations or marginalized populations and can be manipulated by governments. Citizen science data (CSD), defined as data that citizens voluntarily gather by employing a wide range of technologies and methodologies, could help to tackle these problems and ultimately improve SDG monitoring. However, the link between CSD and the SDGs is still understudied. This article aims to develop an empirical understanding of the CSD-SDG link by focusing on the perspective of projects that employ CSD. Specifically, the article presents primary and secondary qualitative data collected on 30 such projects, together with an exploratory comparative case study analysis. It finds that projects using CSD recognize that the SDGs can provide a valuable framework and legitimacy, as well as attract funding, visibility, and partnerships. At the same time, however, the article reveals that these projects also encounter several barriers with respect to the SDGs: a widespread lack of knowledge of the goals, combined with frustration and political resistance towards the UN, may deter these projects from contributing their data to the SDG monitoring apparatus.


2014 ◽  
Vol 7 (3) ◽  
pp. 291-301 ◽  
Author(s):  
Maria-Blanca Ibanez ◽  
Angela Di-Serio ◽  
Carlos Delgado-Kloos

2021 ◽  
Vol 20 (01) ◽  
pp. 2150011
Author(s):  
Worapan Kusakunniran ◽  
Thearith Ponn ◽  
Nuttapol Boonsom ◽  
Suwimol Wahakit ◽  
Kittikhun Thongkanchorn

This paper develops Scopus H5-Index rankings, using the field of computer science as a case study. The first challenge is the inconsistency of conference names. A rule-based approach is devised to automatically clean up duplicate conferences and assign a unique pseudo ID to each conference. This data cleansing process is applied to conference names retrieved from both Scopus and ERA/CORE, so that both sources share common pseudo IDs for the correlation analysis. The proposed cleansing process is validated using ERA 2010 and CORE 2018 as references and yields small error rates of 0.6% and 0.4%, respectively. The Scopus H5-Index 2006–2010 and Scopus H5-Index 2014–2018 rankings are then constructed and compared with the existing ERA 2010 and CORE 2018 rankings, respectively. The results show that the correlation within the Scopus H5-Index rankings (i.e. Scopus H5-Index 2006–2010 vs. Scopus H5-Index 2014–2018) lies at the top of the moderate correlation band, whereas the correlation within the ERA/CORE rankings (ERA 2010 vs. CORE 2018) lies at the top of the strong correlation band. The correlations across ranking systems (i.e. Scopus H5-Index 2006–2010 vs. ERA 2010, and Scopus H5-Index 2014–2018 vs. CORE 2018) lie at the bottom and in the middle of the moderate correlation band, respectively. This suggests that quality assessment using the Scopus H5-Index ranking is more dynamic and more quickly updated than the ERA/CORE ranking. The two ranking systems are also moderately correlated with each other for both the 2010 and 2018 periods.
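The abstract does not list the actual cleansing rules or name the correlation coefficient used. As a minimal sketch only, the Python below assumes a few hypothetical normalisation rules (lower-casing, dropping edition numbers, years, punctuation and the phrase "proceedings of (the)", expanding a small abbreviation table), gives every raw name a pseudo ID shared by all names that normalise to the same key across sources, and uses Spearman's rank correlation as an assumed stand-in for the paper's correlation analysis. The functions `normalize`, `assign_pseudo_ids`, `ranks` and `spearman` and the sample names are illustrative, not the authors' code or data.

```python
import re
from itertools import count

# Hypothetical cleansing rules: the abstract does not list the authors' actual rules.
_ORDINAL = re.compile(r"\b\d+(st|nd|rd|th)\b")          # edition numbers, e.g. "12th"
_YEAR = re.compile(r"\b(19|20)\d{2}\b")                   # years, e.g. "2018"
_PUNCT = re.compile(r"[^a-z0-9 ]+")                       # anything but letters, digits, spaces
_PROCEEDINGS = re.compile(r"\bproceedings of( the)?\b")   # the phrase "proceedings of (the)"
_ABBREV = {"intl": "international", "int": "international",
           "conf": "conference", "symp": "symposium"}     # assumed abbreviation table


def normalize(name: str) -> str:
    """Reduce a raw conference name to a canonical matching key."""
    s = name.lower().replace("&", " and ")
    s = _ORDINAL.sub(" ", s)
    s = _YEAR.sub(" ", s)
    s = _PUNCT.sub(" ", s)
    s = _PROCEEDINGS.sub(" ", s)
    # Expand abbreviations token by token; split() also collapses whitespace.
    return " ".join(_ABBREV.get(tok, tok) for tok in s.split())


def assign_pseudo_ids(*sources: list[str]) -> dict[str, int]:
    """Map every raw name, across all sources, to a pseudo ID shared by its duplicates."""
    key_to_id = {}
    next_id = count(1)
    mapping = {}
    for names in sources:
        for raw in names:
            key = normalize(raw)
            if key not in key_to_id:
                key_to_id[key] = next(next_id)  # first sighting of this key gets a new ID
            mapping[raw] = key_to_id[key]
    return mapping


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of rank vectors (inputs must not be constant)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    scopus = ["Proceedings of the 12th Intl. Conf. on Data Engineering 2018",
              "Symp. on Theory of Computing 2017"]
    core = ["International Conference on Data Engineering",
            "Symposium on Theory of Computing"]
    print(assign_pseudo_ids(scopus, core))
    # Hypothetical scores for the same four conferences in two ranking systems.
    print(spearman([12, 30, 25, 8], [2, 4, 3, 1]))
```

Keying on a normalised string keeps a rule-based cleanse transparent and easy to audit; real conference lists would need a much richer rule set and a validation step against reference lists, as the abstract reports doing with ERA 2010 and CORE 2018.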

