Using Genetic Programming for Data Science: Lessons Learned

Author(s):  
Steven Gustafson ◽  
Ram Narasimhan ◽  
Ravi Palla ◽  
Aisha Yousuf
2019 ◽  
pp. 447-465
Author(s):  
Kurt Stockinger ◽  
Martin Braschler ◽  
Thilo Stadelmann

2020 ◽  
Vol 3 (1) ◽  
pp. 289-314 ◽  
Author(s):  
James M. Hoffman ◽  
Allen J. Flynn ◽  
Justin E. Juskewitch ◽  
Robert R. Freimuth

Pharmacogenomic information must be incorporated into electronic health records (EHRs) with clinical decision support in order to fully realize its potential to improve drug therapy. Supported by various clinical knowledge resources, pharmacogenomic workflows have been implemented in several healthcare systems. Little standardization exists across these efforts, however, which limits scalability both within and across clinical sites. Limitations in information standards, knowledge management, and the capabilities of modern EHRs remain barriers to the widespread use of pharmacogenomics in the clinic, but ongoing efforts are addressing these challenges. Although much work remains to use pharmacogenomic information more effectively within clinical systems, the experiences of pioneering sites and the lessons learned from their programs may be instructive for other clinical areas beyond genomics. We present a vision of what can be achieved as informatics and data science converge to enable further adoption of pharmacogenomics in the clinic.


Author(s):  
Akeem Pedro ◽  
Anh-Tuan Pham-Hang ◽  
Phong Thanh Nguyen ◽  
Hai Chien Pham

Accident, injury, and fatality rates remain disproportionately high in the construction industry. Information from past mishaps provides an opportunity to acquire insights, gather lessons learned, and systematically improve safety outcomes. Advances in data science and Industry 4.0 present unprecedented opportunities for the industry to leverage, share, and reuse safety information more efficiently. However, the potential benefits of information sharing are missed because accident data are inconsistently formatted, non-machine-readable, and inaccessible. Hence, learning opportunities and insights cannot be captured and disseminated to proactively prevent accidents. To address these issues, a novel information sharing system is proposed that utilizes linked data, ontologies, and knowledge graph technologies. An ontological approach is developed to semantically model safety information and formalize knowledge pertaining to accident cases. A multi-algorithmic approach is developed to automatically process and convert accident case data to the Resource Description Framework (RDF), and the SPARQL protocol is deployed to enable query functionality. Trials and test scenarios utilizing a dataset of 200 real accident cases confirm the effectiveness and efficiency of the system in improving information access, retrieval, and reusability. The proposed development facilitates a new "open" information sharing paradigm with major implications for Industry 4.0 and data-driven applications in construction safety management.
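
The abstract names its technical stack (linked data, ontologies, RDF, SPARQL) without reproducing the ontology itself. Purely as an illustration, the following minimal Python sketch (using the rdflib library) shows how a single accident case could be expressed as RDF triples and queried with SPARQL; the `acc:` namespace and the property names (`accidentType`, `tradeOfVictim`, `severity`) are hypothetical placeholders, not the authors' published ontology.

```python
# Minimal sketch: modelling one accident case as RDF and querying it with
# SPARQL. Namespace and property names are hypothetical placeholders.
from rdflib import Graph, Literal, Namespace, RDF

ACC = Namespace("http://example.org/construction-safety#")  # assumed URI

g = Graph()
g.bind("acc", ACC)

# One accident case, as it might look after automatic conversion to RDF.
case = ACC["case-017"]
g.add((case, RDF.type, ACC.Accident))
g.add((case, ACC.accidentType, Literal("fall from height")))
g.add((case, ACC.tradeOfVictim, Literal("scaffolder")))
g.add((case, ACC.severity, Literal("fatal")))

# SPARQL query: all fall-related cases and the trades of the victims.
results = g.query("""
    PREFIX acc: <http://example.org/construction-safety#>
    SELECT ?case ?trade WHERE {
        ?case a acc:Accident ;
              acc:accidentType ?type ;
              acc:tradeOfVictim ?trade .
        FILTER(CONTAINS(STR(?type), "fall"))
    }
""")
for row in results:
    print(row.case, row.trade)
```

In a deployment like the one described, the graph would hold hundreds of cases behind a SPARQL endpoint, so safety managers could retrieve every precedent matching a planned activity rather than rereading accident reports by hand.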


2017 ◽  
Vol 2 (3) ◽  
pp. 40-49 ◽  
Author(s):  
Mohammed Zuhair Al-Taie ◽  
Naomie Salim ◽  
Adekunle Isiaka Obasa

The workflow of a data science project, from data understanding to deployment of an analytical model, begins with framing the problem at hand, a task that is typically business-oriented and requires human-to-human interaction. However, the three steps that follow in the pipeline, data understanding, feature extraction, and model building, are the key to a successful data science project. Failing to fully understand the requirements of each of these three steps can negatively affect the performance of the proposed system. Hence, the current study tries to answer the question: "What are the requirements of a successful data science project?" To answer it, we use the solution that we built to measure the relevance of local search results for small online e-businesses, which we submitted to the Kaggle data science platform, to shed light on why our solution did not achieve a top position among the competitors. We evaluate the design that we submitted to the competition in light of the three winning submissions. Our results reveal that thorough data preprocessing, well-defined features, and model ensembling are critical to building successful data science projects. This clarification provides insight into specific aspects of model design and can help others, including Kagglers, avoid common mistakes when approaching their data science projects.
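
Of the three success factors identified, model ensembling is the most concrete to demonstrate. The sketch below shows prediction averaging, one common ensembling pattern in Kaggle competitions; the models and the synthetic data are placeholders, not the authors' relevance-scoring pipeline.

```python
# Minimal sketch of prediction averaging, a simple form of model ensembling.
# Models and data are illustrative placeholders, not the study's pipeline.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three diverse base models; diversity is what makes the average useful.
models = [
    Ridge(),
    RandomForestRegressor(random_state=0),
    GradientBoostingRegressor(random_state=0),
]

# Average the per-model predictions to reduce variance relative to any
# single model.
ensemble_pred = np.mean(
    [m.fit(X_train, y_train).predict(X_test) for m in models], axis=0
)
print("ensemble RMSE:", np.sqrt(np.mean((ensemble_pred - y_test) ** 2)))
```

Weighted averages or a stacking meta-model are natural refinements, but even this unweighted average typically beats the weakest of its members.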


2018 ◽  
Vol 12 (2) ◽  
pp. 266-273
Author(s):  
Jez Cope ◽  
James Baker

Much time and energy is now devoted to developing the data analysis and data management skills of researchers. Far less attention is paid to developing the data skills of librarians themselves: these skills are often brought in through recruitment into niche roles rather than treated as a development need for the wider library workforce, and they are not widely recognised as important to the professional career development of librarians. We believe that building computational and data science capacity within academic libraries will directly benefit both librarians and the users we serve. Library Carpentry is a global effort to train librarians in technical areas that have traditionally been seen as the preserve of researchers, IT support, and systems librarians. Established non-profit volunteer organisations such as Software Carpentry and Data Carpentry offer introductory research software skills training focused on the needs and requirements of research scientists; Library Carpentry is a comparable introductory software skills training programme focused on the needs and requirements of library and information professionals. This paper describes how the material was developed and delivered, and reports on challenges faced, lessons learned, and future plans.

