Integrating data science and R programming at an early stage

Author(s):  
Soma Datta ◽  
Veneela Nagabandi


Author(s):  
Sheik Abdullah A. ◽  
Selvakumar S. ◽  
Parkavi R. ◽  
Suganya R. ◽  
Abirami A. M.

The importance of big data analytics has made the process of solving various real-world problems simpler. The big data and data science toolbox provides a realm of data preparation, data analysis, implementation, and solutions. Connecting to any data source and preparing data for analysis have been made simple by the wide range of tools available in data analytics packages. Some of the analytical tools include R programming, Python programming, Rapid Analytics, and Weka. The patterns and granularity in the observed data can be uncovered through visualization and data exploration. This chapter provides insight into the types of analytics from a big data perspective, with an emphasis on applicability to healthcare data. The processing paradigms and techniques can also be clearly observed through the chapter contents.
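To make the connect-prepare-visualize workflow concrete, the sketch below shows one possible R version of it; the file name patients.csv and its columns (age, bmi, outcome) are hypothetical and are not taken from the chapter.

# A minimal sketch, assuming a hypothetical healthcare file 'patients.csv'
# with columns age, bmi, and outcome.
library(dplyr)
library(ggplot2)

# Data connection: read the raw records from a flat file
patients <- read.csv("patients.csv", stringsAsFactors = FALSE)

# Data preparation: drop incomplete rows and derive an age group
prepared <- patients %>%
  filter(!is.na(age), !is.na(bmi)) %>%
  mutate(age_group = cut(age,
                         breaks = c(0, 30, 50, 70, 120),
                         labels = c("<30", "30-49", "50-69", "70+")))

# Visualization: inspect how BMI varies with age group and outcome
ggplot(prepared, aes(x = age_group, y = bmi, fill = outcome)) +
  geom_boxplot() +
  labs(x = "Age group", y = "BMI", fill = "Outcome")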


2020 ◽  
Vol 4 (s1) ◽  
pp. 118-119
Author(s):  
Luba Smolensky

OBJECTIVES/GOALS: This team science pilot program aims to elevate the quality of Parkinson’s disease modeling initiatives by strengthening connections between clinical researchers and computational teams. Just as many data science projects in Parkinson’s research would benefit from deeper clinical expertise, many clinical engagements would be improved by upfront integration of computational requirements. These team science programs, developed from design thinking methodologies, provide structured, sustainable, and scalable means for multi-disciplinary teams to come together and co-create translational science in PD. METHODS/STUDY POPULATION: Design Thinking (DT) could help yield an effective learning experience. DT is grounded in ethnographic research strategies and prototyping, relying heavily on grantee interviews and feedback. This approach is commonly used to navigate and design amidst complexity; its applications range from product to healthcare to instructional design. The following is an overview of the process as applied to this project: Discover: Once the core team (MJFF and project designers) has refined the key question they would like to answer, the team will begin gathering both primary and secondary data. This phase may include focus groups, one-on-one interviews, expert interviews, and immersive data-gathering. The purpose of this phase is to capture complexity and lay the groundwork to understand grantees’ perspectives and lexicon around their work. The deliverables of this phase are primarily unstructured research findings, such as transcribed interviews and secondary sources. Define: When sufficient data has been gathered, the core team will move into an initial round of synthesis and sense-making (making connections and assumptions to explain emerging themes in the data). This phase may include one to two in-person engagements with the core team. The purpose of this phase is to define the guiding principles for subsequent prototypes. It will also help reveal potential opportunity areas, both latent and apparent. The deliverables of this phase are agreed-upon key themes, insights, and an informed “How Might We” question that will anchor the ideation process. Develop: Armed with informed themes, the core team will begin to brainstorm potential solutions. Following a set of brainstorming techniques, they will initially aim for quantity over quality in order to allow potentially innovative and/or risky solutions to surface. Eventually, these ideas will be distilled into three robust and unique prototypes. Like the prior phase, ideation may also require one to two in-person engagements. The deliverables here are three unique prototypes; the reason for three is to ensure that the team does not anchor itself in just one solution, but rather remains in an exploratory mindset as it solicits feedback on these prototypes from the grantees. Deliver: In this final phase, the core team revisits the grantees and presents the three prototypes. This phase may include conducting three small-scale pilots or simply explaining the prototypes. Either way, it is important to solicit another round of feedback to ensure the solutions are indeed addressing the needs and context of grantees. Once completed, the core team will iterate a final pilot design and identify any remaining questions and assumptions they would like the pilot to inform.
RESULTS/ANTICIPATED RESULTS: The team science pilot identifies five main opportunities to tighten collaboration, communication, and expectations across clinical and computational teams. Firstly, in-person events, held regularly in a central location, can act as an incubating space for these teams to partner, ideate, and pitch for grant funding. Secondly, co-developed guidelines for research questions would ensure consistent availability of clinically-relevant, computationally-feasible research topics. Thirdly, increasing the presence of Parkinson’s cohort data resources at computational conferences could attract more diverse data and genetics interest to Parkinson’s research. Fourthly, a standard suite of research-facing, educational content (focused on both disease background and data basics) would ensure a strong baseline and launch-pad for PD modeling projects. Lastly, a fellowship program focused on early-stage researchers could establish a unique foundation to ground both clinical and computational fellows as they collaboratively work on PD research and iterate on the aforementioned solutions. DISCUSSION/SIGNIFICANCE OF IMPACT: This team science program has the potential to upend collaborative silos in Parkinson’s research, accelerating disease modeling projects which otherwise stagnate or over-emphasize clinical vs. computational aspects. By more effectively connecting team members with diverse backgrounds across clinical and computational roles, PD disease patterns can be discovered and validated, ultimately resulting in improved patient care and therapeutic development. CONFLICT OF INTEREST DESCRIPTION: Several authors are staff members at The Michael J. Fox Foundation for Parkinson’s Research, the sponsor of this Team Science grant. All author and non-author contributors are grant recipients from The Michael J. Fox Foundation.


Author(s):  
Monish N

In recent years, law enforcement has improved by adopting better strategies, computer-aided technology, more efficient use of resources, and so on. As a result, over the past couple of years there has been a steep decline in the crime rate in the United States (US). Law enforcement has turned to data science for insights (ranging from reports to corrective analysis and behavior modelling). There has been an overall drop in crime rates in Chicago in recent years; in fact, these rates are at their lowest when compared with previous decades. This paper uses the criminal dataset found at “data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2” to describe historical trends, insights, etc. in Chicago from 1965 to 2018, and not to assign any causal interpretation to the vanguards of crime rates during this period. Here, K-Nearest Neighbor (KNN) classification is used for training and crime prediction. Discussion of future investigation can also be found. The proposed model has an accuracy of 83.2%.
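As a rough illustration of the classification step described above (not the paper's actual pipeline), the following R sketch applies k-nearest-neighbour classification from the 'class' package to a small synthetic stand-in for the Chicago crime data; the columns hour, district, and type are hypothetical simplifications.

# A minimal KNN sketch with the 'class' package; the data frame below is
# synthetic and only stands in for the real Chicago crime records.
library(class)

set.seed(7)
crimes <- data.frame(
  hour     = sample(0:23, 1000, replace = TRUE),
  district = sample(1:25, 1000, replace = TRUE),
  type     = sample(c("THEFT", "BATTERY", "NARCOTICS"), 1000, replace = TRUE)
)

# Scale the numeric features and split into training and test sets
features <- scale(crimes[, c("hour", "district")])
idx      <- sample(seq_len(nrow(crimes)), size = 0.8 * nrow(crimes))

pred <- knn(train = features[idx, ],
            test  = features[-idx, ],
            cl    = crimes$type[idx],
            k     = 5)

# Classification accuracy on the held-out 20%
mean(pred == crimes$type[-idx])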


2019 ◽  
Vol 3 (4) ◽  
pp. 13-24 ◽  
Author(s):  
Belen Suarez Lopez ◽  
David Issó García ◽  
Antonio Vargas Alcaide

The main purpose of this paper is to make a critical and balanced analysis of the potential of blockchain technology to address some of the great current socioeconomic challenges, focusing on an impact-assessment point of view and analyzing blockchain's disruptive potential to provide solutions to challenges such as climate change, migrant movements, gender equality, financial inclusion, and the opportunity cost of managing data science. The term blockchain covers a number of different types of distributed ledger systems; essentially, it is simply a distributed record, a ledger of digital events that is distributed or shared chronologically among many different parties (nodes) within an ecosystem. The technology is at an early stage and can be implemented in many ways depending on the objective. The methodological tool for the research is a strategic and qualitative SWOT analysis identifying the critical success factors, both internal (strengths and weaknesses) and external (opportunities and threats); it summarizes the arguments and counterarguments within the scientific discussion. From the bibliographic review of the findings disclosed by empirical research on business case studies, the research results summarized in the paper confirm that, although it is difficult to give a closed definition to the variety of systems under the blockchain umbrella, among the main strengths of the technology are its intrinsic characteristics, such as its ability to store data immutably without relying on a central authority. As a weakness, the paper highlights the need to solve some non-minor inefficiencies, such as energy consumption and, as a result, the difficulty of scaling. Blockchain has the potential to replace intermediary and central entities or change the way they work, allowing disintermediation and potentially empowering people in trade, democratic participation, social interaction, and financial inclusion, which represent great opportunities. On the side of threats, there is a lack of knowledge about the technology, which generates resistance from regulators who are beginning to assess risks and are concerned about how new participants could cannibalize their income models. In addition, it seems clear that technological changes take time to develop and often require the adaptation of entire ecosystems. Keywords: blockchain, decentralization, democratization, financial inclusion, socioeconomic challenges, token traceability, transparency, trust.


The terms machine learning, deep learning, and data science are buzzwords nowadays. The use of these techniques with technologies such as R and Python is common in both industry and academia. The current work deals with the logic inherent in algorithms such as classification, dimensionality reduction, and recommender systems, along with suitable examples. Applications such as Facebook, Twitter, and LinkedIn are mentioned to illustrate the daily use of these algorithms. Online platforms such as Amazon and Flipkart are discussed as other areas where recommender systems are among the most commonly used algorithms. The outcome of the work is an account of the logic hidden in the use of these algorithms and, on the implementation side, of the packages and functions helpful for implementing them. The hope is that the work will be helpful to researchers and academicians from an algorithmic perspective, and that they can extend it by contributing their own thoughts and views. Unlike general-purpose programming, R/Python simplifies the logic of algorithms, so both the lines of code and the understanding of the problem are simpler than with general programming languages. The work examines mail respondents in relation to the allocation of a house by the company in response to their mail, considering urban, semi-urban, and rural customer areas; the income range of the customers is also considered in the allocation of the house. The implementation is in R using classification, and the corresponding results are published with an explanation of the values found in the implementation.
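As an illustration of the kind of R classification workflow the work describes (not the authors' published code), the sketch below fits a decision tree with the 'rpart' package on synthetic data; the column names Property_Area, ApplicantIncome, and Allocated are hypothetical.

# A minimal sketch: classifying whether a mail respondent is allocated a house,
# assuming hypothetical columns Property_Area, ApplicantIncome, and Allocated.
library(rpart)

set.seed(42)
respondents <- data.frame(
  Property_Area   = sample(c("Urban", "Semiurban", "Rural"), 200, replace = TRUE),
  ApplicantIncome = round(runif(200, 1500, 12000)),
  Allocated       = sample(c("Y", "N"), 200, replace = TRUE)
)

# Split into training and test sets
idx   <- sample(seq_len(nrow(respondents)), size = 0.7 * nrow(respondents))
train <- respondents[idx, ]
test  <- respondents[-idx, ]

# Fit a classification tree and check accuracy on the held-out set
fit  <- rpart(Allocated ~ Property_Area + ApplicantIncome, data = train, method = "class")
pred <- predict(fit, newdata = test, type = "class")
mean(pred == test$Allocated)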


2021 ◽  
Vol 2091 (1) ◽  
pp. 012064
Author(s):  
A P Khlebtsov ◽  
A N Shilin ◽  
A V Rybakov ◽  
A Yu Klyucharev

Abstract In this paper, an expert information system for assessing the technical condition of a power transformer is developed. The system works on the basis of fuzzy logic and provides operational information about the state of the power transformer. The paper uses fuzzy inference algorithms. The R programming language is used to write a program that applies fuzzy logic. We analyzed chromatographic data on gases dissolved in oil, as well as thermal imaging data identifying the hottest points in power transformers. A database of fuzzy logic rules has been formed. Several examples of defuzzification of the results by the center-of-gravity method are given. Running the program produces a three-dimensional graph that characterizes the surface of the fuzzy output. The developed software package makes it possible to detect defects in working electrical equipment at an early stage of their development, which not only prevents a sudden shutdown of production as a result of an accident, but also significantly reduces the cost of repairing equipment and increases its service life.
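A minimal sketch of how such a Mamdani-style fuzzy inference step can be written in R is given below, using the 'sets' package; the rule base, the input variables (a dissolved-gas level and a thermal-imaging hot-spot temperature), and the membership functions are illustrative assumptions, not the authors' actual system.

# Illustrative fuzzy inference with the 'sets' package; variables, partitions,
# and rules are assumptions for demonstration only.
library(sets)

sets_options("universe", seq(0, 100, by = 0.5))

vars <- set(
  gas       = fuzzy_partition(varnames = c(low = 20, elevated = 50, high = 80),   sd = 10),
  hotspot   = fuzzy_partition(varnames = c(normal = 30, warm = 60, hot = 90),     sd = 10),
  condition = fuzzy_partition(varnames = c(good = 15, suspect = 50, faulty = 85), sd = 10)
)

rules <- set(
  fuzzy_rule(gas %is% low,                           condition %is% good),
  fuzzy_rule(gas %is% elevated || hotspot %is% warm, condition %is% suspect),
  fuzzy_rule(gas %is% high     || hotspot %is% hot,  condition %is% faulty)
)

sys <- fuzzy_system(vars, rules)
out <- fuzzy_inference(sys, list(gas = 65, hotspot = 72))

# Defuzzify by the center-of-gravity (centroid) method, as in the paper
gset_defuzzify(out, "centroid")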


2021 ◽  
Author(s):  
Samantha P. Kelly ◽  
Vikram V. Shende ◽  
Autumn R. Flynn ◽  
Qingyun Dan ◽  
Ying Ye ◽  
...  

Prenyltransfer is an early-stage carbon–hydrogen bond (C–H) functionalization prevalent in the biosynthesis of a diverse array of biologically active bacterial, fungal, plant, and metazoan diketopiperazine (DKP) alkaloids. Towards the development of a unified strategy for biocatalytic construction of prenylated DKP indole alkaloids, we sought to identify and characterize a substrate-permissive C2 reverse prenyltransferase (PT). In the biosynthesis of cytotoxic notoamide metabolites, PT NotF is responsible for catalyzing the first tailoring event of C2 reverse prenyltransfer of brevianamide F (cyclo(L-Trp-L-Pro)). Obtaining a high-resolution crystal structure of NotF (in complex with native substrate and prenyl donor mimic dimethylallyl S-thiolodiphosphate (DMSPP)) revealed a large, solvent exposed substrate binding site, intimating NotF may possess significant substrate promiscuity. To assess the full potential of NotF’s broad substrate selectivity, we synthesized a panel of 30 tryptophanyl DKPs with a suite of sterically and electronically differentiated amino acids, which were selectively prenylated by NotF in often synthetically useful conversions (2 to >99%). Quantitative representation of this substrate library enabled the development of a descriptive statistical model that provided insight into the origins of NotF’s substrate promiscuity. Through this unique approach for understanding enzyme scope, we identified key substrate descriptors such as electrophilicity, size, and flexibility, that govern enzymatic turnover by NotF. Additionally, we demonstrated the ability to couple NotF-catalyzed prenyltransfer with oxidative cyclization using recently characterized flavin monooxygenase, BvnB, from the brevianamide biosynthetic pathway. This one-pot, in vitro biocatalytic cascade proceeds with exceptional substrate recognition, and enabled the first chemoenzymatic synthesis of the marine fungal natural product, (–)-eurotiumin A, in three steps and 60% overall yield.


Author(s):  
Somula Ramasubbareddy ◽  
Govinda K ◽  
Ashish Kr. Luhach ◽  
Swetha E

Background: Nowadays, the demand for data science related job positions has grown enormously due to the recent data explosion across industries and organizations globally. The need to harness and utilize the information hidden inside these huge datasets for effective decision-making has become the need of the hour. This is where a data analyst or a data scientist comes into play: they are domain experts who have the skillset and expertise to extract hidden meaning from data and convert it into useful insights. This work illustrates the use of data mining and advanced data analysis techniques such as data aggregation and summarization, along with data visualization using the R tool, to understand and analyse job trends in the United States of America (USA) and then drill down to analyse job trends for data science related positions from 2011 to 2016. Objective: This paper discusses general job trends in the US and how job seekers migrate from one place to another on visas for different job titles, mainly for business analytics. Methods: The analysis is done using R programming, applying different functions to various parameters, and inferences are drawn from the results. Results: The aim of this analysis is to predict job trends in terms of demand, region, employers, and wages in USD.
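The aggregation and visualization steps described here could look roughly like the following R sketch (not the authors' code); the data frame jobs and its columns YEAR, JOB_TITLE, WORKSITE_STATE, and PREVAILING_WAGE are hypothetical stand-ins for the visa-application records.

# A minimal sketch of the aggregate-then-plot workflow on synthetic data.
library(dplyr)
library(ggplot2)

set.seed(1)
jobs <- data.frame(
  YEAR            = sample(2011:2016, 500, replace = TRUE),
  JOB_TITLE       = sample(c("DATA SCIENTIST", "DATA ANALYST", "SOFTWARE ENGINEER"), 500, replace = TRUE),
  WORKSITE_STATE  = sample(c("CA", "NY", "TX", "WA"), 500, replace = TRUE),
  PREVAILING_WAGE = round(runif(500, 50000, 150000))
)

# Summarise data-science related positions by year: count and median wage
ds_trend <- jobs %>%
  filter(JOB_TITLE %in% c("DATA SCIENTIST", "DATA ANALYST")) %>%
  group_by(YEAR) %>%
  summarise(applications = n(), median_wage = median(PREVAILING_WAGE))

# Plot the yearly trend in the number of data-science job applications
ggplot(ds_trend, aes(x = YEAR, y = applications)) +
  geom_line() +
  geom_point() +
  labs(title = "Data science job trend (2011-2016)", x = "Year", y = "Applications")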


Author(s):  
Madalina Viorica Manu ◽  
Ilie Vasile

In this paper, we compare some of the essential traits of the software preferred by researchers, students, and professors, such as R (or RStudio) and Matlab. In order to fill a gap in the Romanian literature and help finance students choose the proper tools according to their research purpose, this comparative study aims to bring a fresh, useful perspective to the relevant literature. In Romania, the use of R was the focus of several international conferences on official statistics held in Bucharest, and of others devoted to business excellence, innovation, and sustainability, while EViews is recommended and taught by Romanian professors. At this time, on a global scale, the R programming language is considered the lingua franca of data science, being common statistical software used in both corporations and academia. In this paper, we analyze the basic features of such software, with the purpose of application in finance.


2021 ◽  
Author(s):  
Dele Fei ◽  
Yu Sun

This is a data science project for a manufacturing company in China [1]. The task was to forecast the likelihood that each product would need repair or service by a technician, in order to forecast how often the products would need to be serviced after they were installed. That forecast could then be used to estimate the correct price for selling a product warranty [2]. The underlying forecast model for all of the company's products is established in the R programming language. In addition, an interactive web app using R Shiny is developed so the business can see the forecast and recommended warranty price for each of its products and customer types [3]. The user can select a product and customer type and input the number of products, and the web app displays charts and tables that show the probability of the product needing service over time and the forecasted costs of service, along with the potential income and the recommended warranty price.
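A skeleton of such an R Shiny app could look like the sketch below; the product names, the exponential service-probability curve, and the pricing numbers are placeholders, not the company's actual model or prices.

# A minimal illustrative Shiny app: pick a product and customer type, enter a
# quantity, and view a placeholder service-probability curve and price table.
library(shiny)

ui <- fluidPage(
  titlePanel("Warranty price explorer (illustrative)"),
  sidebarLayout(
    sidebarPanel(
      selectInput("product",  "Product",       choices = c("Model A", "Model B")),
      selectInput("customer", "Customer type", choices = c("Residential", "Commercial")),
      numericInput("qty", "Number of products", value = 100, min = 1)
    ),
    mainPanel(
      plotOutput("service_prob"),
      tableOutput("price_table")
    )
  )
)

server <- function(input, output) {
  # Placeholder failure model: constant-hazard (exponential) service probability
  months <- 1:60
  rate   <- reactive(if (input$product == "Model A") 0.02 else 0.03)

  output$service_prob <- renderPlot({
    p_service <- 1 - exp(-rate() * months)
    plot(months, p_service, type = "l",
         xlab = "Months since installation",
         ylab = "Probability of needing service")
  })

  output$price_table <- renderTable({
    expected_repairs <- input$qty * (1 - exp(-rate() * 12))
    data.frame(
      product          = input$product,
      customer_type    = input$customer,
      expected_repairs = round(expected_repairs, 1),
      suggested_price  = round(50 * (1 - exp(-rate() * 12)), 2)  # illustrative only
    )
  })
}

shinyApp(ui, server)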

