Small Data, Big Justice: The Intersection of Data Science, Social Good, and Social Services

2018 ◽  
Vol 36 (4) ◽  
pp. 175-178 ◽  
Author(s):  
Lauri Goldkind ◽  
Mamello Thinyane ◽  
Moon Choi
2018 ◽  
Author(s):  
Hamid Bagheri ◽  
Usha Muppirala ◽  
Andrew J Severin ◽  
Hridesh Rajan

Background: Creating a computational infrastructure that scales well to analyze the wealth of information contained in data repositories is difficult due to significant barriers in organizing, extracting and analyzing relevant data. Shared Data Science Infrastructures like Boa can be used to more efficiently process and parse data contained in large data repositories. The main features of Boa are inspired by existing languages for data-intensive computing, and it can easily integrate data from biological data repositories.
Results: Here, we present an implementation of Boa for Genomic research (BoaG) on a relatively small data repository: RefSeq's 97,716 annotation (GFF) and assembly (FASTA) files and metadata. We used BoaG to query the entire RefSeq dataset, gain insight into the RefSeq genome assemblies and gene model annotations, and show that assembly quality using the same assembler varies depending on species.
Conclusions: To keep pace with our ability to produce biological data, innovative methods are required. The Shared Data Science Infrastructure, BoaG, can give researchers greater access to efficiently explore data in ways previously possible only for the most well-funded research groups. We demonstrate the efficiency of BoaG in exploring the RefSeq database of genome assemblies and annotations to identify interesting features of gene annotation, as a proof of concept for much larger datasets.
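To make the kind of aggregate query BoaG runs over RefSeq annotations concrete, here is a minimal Python sketch that tallies feature types in a single GFF3 file. It is not BoaG syntax, and the input file name is a hypothetical placeholder; in BoaG the equivalent query would be distributed across the whole repository rather than run locally.

```python
# Illustrative stand-in for a BoaG-style aggregate query over annotations.
# File name and GFF layout are assumptions; this is not BoaG syntax.
from collections import Counter

def count_feature_types(gff_path):
    """Tally feature types (gene, mRNA, exon, ...) in one GFF3 file."""
    counts = Counter()
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip GFF comment/directive lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                counts[fields[2]] += 1  # column 3 holds the feature type
    return counts

if __name__ == "__main__":
    # Hypothetical input file; BoaG would iterate over all 97,716 files.
    totals = count_feature_types("GCF_000001405.39_annotation.gff")
    print(totals.most_common(10))
```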


2020 ◽  
pp. 87-94
Author(s):  
Pooja Sharma
Artificial intelligence and machine learning, two iterations of automation, are built on data, whether small or large. The larger the data, the more effective an AI or machine learning tool will be; the converse holds for smaller data. With larger pools of data, large businesses and multinational corporations have effectively been building, developing and adopting refined AI and machine learning based decision systems. The contention of this chapter is to explore whether small businesses, with only small data in hand, are well placed to use and adopt AI and machine learning based tools for their day-to-day business operations.
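The claim that model quality tracks data volume can be illustrated with a learning curve. The sketch below, which is not from the chapter, uses scikit-learn on synthetic data; the dataset sizes and the choice of classifier are assumptions made purely for illustration.

```python
# Minimal learning-curve sketch: held-out accuracy versus training-set size.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for a business dataset (assumption, not real data).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.05, 1.0, 8), cv=5,
)

for n, score in zip(sizes, test_scores.mean(axis=1)):
    # Small training sets usually yield noticeably lower held-out accuracy.
    print(f"{n:5d} training rows -> mean CV accuracy {score:.3f}")
```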


Author(s):  
Caelin Bryant ◽  
Yesheng Chen ◽  
Zhen Chen ◽  
Jonathan Gilmour ◽  
Shyamala Gumidyala ◽  
...  

Author(s):  
Cat Drew

Data science can offer huge opportunities for government. With the ability to process larger and more complex datasets than ever before, it can provide better insights for policymakers and make services more tailored and efficient. As with all new technologies, there is a risk that we fail to take up its opportunities and miss out on its enormous potential. We want people to feel confident to innovate with data. So, over the past 18 months, the Government Data Science Partnership has taken an open, evidence-based and user-centred approach to creating an ethical framework. It is a practical document that brings all the legal guidance together in one place, and is written in the context of new data science capabilities. As part of its development, we ran a public dialogue on data science ethics, including deliberative workshops, an experimental conjoint survey and an online engagement tool. The research supported the principles set out in the framework as well as provided useful insight into how we need to communicate about data science. It found that people had a low awareness of the term ‘data science’, but that showing data science examples can increase broad support for government exploring innovative uses of data. But people's support is highly context-driven. People consider acceptability on a case-by-case basis, first thinking about the overall policy goals and likely intended outcome, and then weighing up privacy and unintended consequences. The ethical framework is a crucial start, but it does not solve all the challenges it highlights, particularly as technology is creating new challenges and opportunities every day. Continued research is needed into data minimization and anonymization, robust data models, algorithmic accountability, and transparency and data security. It has also revealed the need to set out a renewed deal between the citizen and state on data, to maintain and solidify trust in how we use people's data for social good. This article is part of the themed issue ‘The ethical impact of data science’.
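One of the anonymization questions flagged above can be made concrete with a k-anonymity check. The sketch below is illustrative only and is not part of the framework; the column names, records and the value of k are assumptions.

```python
# Illustrative k-anonymity check over quasi-identifier columns.
# Column names, records, and k are assumptions, not part of the framework.
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears at least k times."""
    groups = Counter(
        tuple(record[col] for col in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())

sample = [
    {"age_band": "30-39", "postcode_area": "SW1", "diagnosis": "A"},
    {"age_band": "30-39", "postcode_area": "SW1", "diagnosis": "B"},
    {"age_band": "40-49", "postcode_area": "N1",  "diagnosis": "A"},
]
# False: the lone N1 record could re-identify an individual.
print(is_k_anonymous(sample, ["age_band", "postcode_area"], k=2))
```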


2019 ◽  
Author(s):  
Mathew Abrams ◽  
Jan G. Bjaalie ◽  
Samir Das ◽  
Gary F. Egan ◽  
Satrajit S Ghosh ◽  
...  

There is a great need for coordination around standards and best practices in neuroscience to support efforts to make neuroscience a data-centric discipline. Major brain initiatives launched around the world are poised to generate huge stores of neuroscience data. At the same time, neuroscience, like many domains in biomedicine, is confronting the issues of transparency, rigor, and reproducibility. Widely used, validated standards and best practices are key to addressing the challenges in both big and small data science, as they are essential for integrating diverse data and for developing a robust, effective and sustainable infrastructure to support open and reproducible neuroscience. However, developing community standards and gaining their adoption is difficult. The current landscape is characterized by both a lack of robust, validated standards and a plethora of overlapping, underdeveloped, untested and underutilized standards and best practices. The International Neuroinformatics Coordinating Facility (INCF), established in 2005, is an independent organization dedicated to promoting data sharing through the coordination of infrastructure and standards. INCF has recently implemented a formal procedure for evaluating and endorsing community standards and best practices in support of the FAIR principles. By formally serving as a standards organization dedicated to open and FAIR neuroscience, INCF helps evaluate, promulgate and coordinate standards and best practices across neuroscience. Here, we provide an overview of the process and discuss how neuroscience can benefit from having a dedicated standards body.
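As a rough illustration of what FAIR-oriented standards ask of data producers, the sketch below serializes a minimal machine-readable dataset description using schema.org Dataset terms. The identifier, licence and field values are hypothetical, and this record is not an INCF-endorsed standard.

```python
# Minimal sketch of a machine-readable, FAIR-style dataset record
# (schema.org Dataset terms serialized as JSON-LD). All values are
# illustrative assumptions, not an INCF-endorsed standard.
import json

dataset_record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example single-unit recordings",            # findable: human-readable title
    "identifier": "https://doi.org/10.0000/example",      # hypothetical persistent identifier
    "license": "https://creativecommons.org/licenses/by/4.0/",  # reusable: explicit licence
    "variableMeasured": ["spike times", "stimulus onset"],
    "encodingFormat": "application/x-nwb",                 # interoperable: standard format
}

print(json.dumps(dataset_record, indent=2))
```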


2017 ◽  
Vol 2017 ◽  
pp. 1-10
Author(s):  
Deborah Weissman-Miller ◽  
Rosalie J. Miller ◽  
Mary P. Shotwell

Translational research is redefined in this paper using a combination of methods in statistics and data science to enhance the understanding of outcomes and practice in occupational therapy. These new methods are applied, using larger data and smaller single-subject data, to a study in hippotherapy for children with developmental disabilities (DD). The Centers for Disease Control and Prevention estimates that DD affects nearly 10 million children aged 2–19, whose diagnoses may be comorbid. Hippotherapy is defined here as a treatment strategy in occupational therapy that uses equine movement to achieve functional outcomes. The semiparametric ratio estimator (SPRE), a single-subject statistical and small data science model, is used to derive a "change point" indicating where the participant adapts to treatment, from which predictions are made. The data analyzed here are from an institutional review board approved pilot study using the Hippotherapy Evaluation and Assessment Tool (HEAT) measure, where outcomes are given separately for each of four measured domains and for each participant's total scores. Analysis with SPRE, combining statistical prediction of a "change point" with data science graphical interpretation of the data, shows the translational comparisons, in terms of relationships and statistical probabilities, between results from larger mean values and the very different results from smaller values for each HEAT domain.
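To show what locating a "change point" in a short single-subject series can look like, here is a minimal sketch that picks the split minimizing the pooled squared error of two level segments. This is a generic change-point approach, not the SPRE model itself, and the session scores are invented for illustration.

```python
# Generic change-point sketch for a short single-subject series.
# Not the SPRE model; the session scores below are invented.
import numpy as np

def change_point(scores):
    """Return the index where the series is best split into two level segments."""
    scores = np.asarray(scores, dtype=float)
    best_idx, best_cost = None, np.inf
    for i in range(1, len(scores)):
        left, right = scores[:i], scores[i:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx

sessions = [12, 13, 12, 14, 19, 21, 22, 21, 23]  # hypothetical HEAT-like scores
print(change_point(sessions))  # index where the participant's scores step up
```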

