Ten important roles for academic leaders in data science

2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Jason H. Moore

Abstract: Data science has emerged as an important discipline in the era of big data and biological and biomedical data mining. As such, we have seen a rapid increase in the number of data science departments, research centers, and schools. We review here ten important leadership roles for a successful academic data science chair, director, or dean. These roles include the visionary, executive, cheerleader, manager, enforcer, subordinate, educator, entrepreneur, mentor, and communicator. Examples specific to leadership in data science are given for each role.

Author(s):  
Gurdeep S Hura

This chapter presents the emerging technology of social media and networking, with a detailed discussion of basic definitions and applications, how the technology has evolved in the last few years, and the need for dynamicity in a data mining environment. It also provides a comprehensive design and analysis of popular social networking media and sites available to users. A brief discussion is presented of the data mining methodologies for implementing the variety of new applications that deal with huge/big data in data science. Further, the chapter attempts to present an emerging perspective on data mining methodologies and their dynamicity for social networking media and sites, as a new trend and a needed framework for collecting, analyzing, and interpreting huge amounts of data in a number of real-world applications. The current and future status of data mining of social media and networking applications is also discussed.


Author(s):  
José Luis Ambite ◽  
Jonathan Gordon ◽  
Lily Fierro ◽  
Gully Burns ◽  
Joel Mathew

The availability of massive datasets in genetics, neuroimaging, mobile health, and other subfields of biology and medicine promises new insights but also poses significant challenges. To realize the potential of big data in biomedicine, the National Institutes of Health launched the Big Data to Knowledge (BD2K) initiative, funding several centers of excellence in biomedical data analysis and a Training Coordinating Center (TCC) tasked with facilitating online and in-person training of biomedical researchers in data science. A major initiative of the BD2K TCC is to automatically identify, describe, and organize data science training resources available on the Web and provide personalized training paths for users. In this paper, we describe the construction of ERuDIte, the Educational Resource Discovery Index for Data Science, and its release as linked data. ERuDIte contains over 11,000 training resources including courses, video tutorials, conference talks, and other materials. The metadata for these resources is described uniformly using Schema.org. We use machine learning techniques to tag each resource with concepts from the Data Science Education Ontology, which we developed to further describe resource content. Finally, we map references to people and organizations in learning resources to entities in DBpedia, DBLP, and ORCID, embedding our collection in the web of linked data. We hope that ERuDIte will provide a framework to foster open linked educational resources on the Web.


2016 ◽  
Vol 21 (3) ◽  
pp. 525-547 ◽  
Author(s):  
Scott Tonidandel ◽  
Eden B. King ◽  
Jose M. Cortina

Advances in data science, such as data mining, data visualization, and machine learning, are extremely well-suited to address numerous questions in the organizational sciences given the explosion of available data. Despite these opportunities, few scholars in our field have discussed the specific ways in which the lens of our science should be brought to bear on the topic of big data and big data's reciprocal impact on our science. The purpose of this paper is to provide an overview of the big data phenomenon and its potential for impacting organizational science in both positive and negative ways. We identify the biggest opportunities afforded by big data along with the biggest obstacles, and we discuss specifically how we think our methods will be most impacted by the data analytics movement. We also provide a list of resources to help interested readers incorporate big data methods into their existing research. Our hope is that we stimulate interest in big data, motivate future research using big data sources, and encourage the application of associated data science techniques more broadly in the organizational sciences.


Tremendous growth in the areas of computing, statistics, and mathematics has led to the rise of an emerging field of expertise named ‘Data Science’. This paper focuses on a comparative study and evaluation of two data science libraries used in the Python programming language, ‘Matplotlib’ and ‘Seaborn’. The purpose of this paper is to evaluate the strengths and weaknesses of these libraries through code implementations, and to classify univariate and multivariate plotting of data with respect to patterns of data visualization and computational modelling of data as processed information, using techniques of big data and data mining.


2014 ◽  
Vol 16 (48) ◽  
pp. 26684-26690 ◽  
Author(s):  
Jacqueline M. Cole ◽  
Kian Sing Low ◽  
Hiroaki Ozoe ◽  
Panagiota Stathi ◽  
Chitoshi Kitamura ◽  
...  

Big data science informs energy research: large-scale screening of crystal structures identifies unforeseen class of dyes for dye-sensitised solar cells.


2019 ◽  
Vol 70 (2-3) ◽  
pp. 127-133
Author(s):  
Hidir Aras

Abstract: This article addresses the interdisciplinary learning of data science, among other things in the context of education and continuing-education programs, by means of interactive learning environments. The example used is the analysis of large volumes of patent information for new user groups such as information specialists, who typically have little or no knowledge of, for example, machine learning methods. With an interactive learning environment based on scientific workflows and big data technologies, new methods of text and data mining (TDM) can be learned efficiently and tried out in practical use cases.


2021 ◽  
Vol 11 (2) ◽  
pp. 478-486
Author(s):  
Jing Zheng ◽  
Zhongjun Gao ◽  
Lixin Pu ◽  
Mingjie He ◽  
Jipeng Fan ◽  
...  

Using technology related to medical big data mining, a model of tumor disease was analyzed and studied. With data science methods as the guiding approach, a big-data-based medical service model for oncology diseases was analyzed and constructed, and its development strategy was explored. Business process analysis was used to analyze and map the business processes of cancer medical services, and a service-oriented architecture analysis and design methodology was used to build a highly flexible, configurable, and easily scalable precision medical big data platform. After analyzing the characteristics of medical big data and the shortcomings of the traditional Apriori algorithm, the Hadoop platform was used to improve and optimize the Apriori algorithm. The results show that the improved Apriori algorithm is substantially more efficient and performs better, and that it can be adapted to mining medical big data. The data mining experiments indicate a correlation between tumors and smoking, chronic infection, occupational pathogenic factors, etc. These findings have some guiding significance for the prevention and treatment of tumors, and they suggest that the improved Apriori algorithm has practical significance for clinical research on lung tumors.
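
For readers unfamiliar with Apriori, the following is a minimal single-machine sketch of the traditional algorithm that the abstract takes as its starting point. It is not the authors' Hadoop-based improvement, which the abstract does not describe. The function name `apriori`, the `min_support` parameter, and the toy records are all hypothetical and invented for illustration; they are not data or results from the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: union pairs of frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: keep a candidate only if every (k-1)-subset is frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one full pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy records, invented for illustration only (not study data).
records = [
    {"smoking", "chronic_infection", "tumor"},
    {"smoking", "tumor"},
    {"chronic_infection", "tumor"},
    {"smoking", "occupational_exposure"},
    {"smoking", "chronic_infection", "tumor"},
]
frequent_itemsets = apriori(records, min_support=3)
```

The count step rescans every transaction at each level. That repeated scanning is a commonly cited cost of the traditional algorithm on large datasets, and it is the kind of work that a distributed platform such as Hadoop can parallelize.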


Author(s):  
Sri Venkat Gunturi Subrahmanya ◽  
Dasharathraj K. Shetty ◽  
Vathsala Patil ◽  
B. M. Zeeshan Hameed ◽  
Rahul Paul ◽  
...  

Abstract: Data science is an interdisciplinary field that extracts knowledge and insights from structured and unstructured data, using scientific methods, data mining techniques, machine-learning algorithms, and big data. The healthcare industry generates large datasets of useful information on patient demography, treatment plans, results of medical examinations, insurance, etc. The data collected from Internet of Things (IoT) devices attract the attention of data scientists. Data science helps to process, manage, analyze, and assimilate the large quantities of fragmented, structured, and unstructured data created by healthcare systems. This data requires effective management and analysis to yield factual results. The article reviews and discusses the processes of data cleansing, data mining, data preparation, and data analysis used in healthcare applications. It provides insight into the status and prospects of big data analytics in healthcare, highlights the advantages, describes the frameworks and techniques used, outlines the challenges currently faced, and discusses viable solutions. Data science and big data analytics can provide practical insights and aid strategic decision-making concerning the health system. They help build a comprehensive view of patients, consumers, and clinicians. Data-driven decision-making opens up new possibilities to boost healthcare quality.


Author(s):  
Trudie Steyn ◽  
Nico Martins

Most assumptions in the literature have been drawn from public databases, e.g. NHANES (National Health and Nutrition Examination Survey). Nonetheless, these datasets are typically characterized by high dimensionality, timeliness, heterogeneity, and irregularity, so the value of these databases has not been fully exploited. Data mining (DM) technologies are a frontier domain in biomedical studies, as they perform well in assessing patients’ risks and in supporting biomedical research and decision-making when developing disease-forecasting frameworks. DM therefore has distinct merits in biomedical big data (BD) studies, particularly for large-scale biomedical datasets. This paper describes DM techniques alongside their fundamental practical applications. The objective of this study is to help biomedical researchers gain an intuitive and clear understanding of the applications of data mining technologies to biomedical BD, so as to improve the production of results that are relevant in a biomedical setting.

