Open-Source’s Inspirations for Computational Social Science: Lessons from a Failed Analysis

The questions we can ask currently, building on decades of research, call for advanced methods and understanding. We now have large, complex data sets that require more than complex statistical analysis to yield human answers. Yet as some researchers have pointed out, we also have challenges, especially in computational social science. In a recent project I faced several such challenges and eventually realized that the relevant issues were familiar to users of free and open-source software. I needed a team with diverse skills and knowledge to tackle methods, theories, and topics. We needed to iterate over the entire project: from the initial theories to the data to the methods to the results. We had to understand how to work when some data was freely available but other data that might benefit the research was not. More broadly, computational social scientists may need creative solutions to slippery problems, such as restrictions imposed by terms of service for sites from which we wish to gather data. Are these terms legal, are they enforced, or do our institutional review boards care? Lastly—perhaps most importantly and dauntingly—we may need to challenge laws relating to digital data and access, although so far this conflict has been rare. Can we succeed as open-source advocates have?


Facies classification using machine learning

The Leading Edge ◽

10.1190/tle35100906.1 ◽

2016 ◽

Vol 35 (10) ◽

pp. 906-909 ◽

Cited By ~ 47

Author(s):

Brendon Hall

Keyword(s):

Machine Learning ◽

Big Data ◽

Open Source ◽

Data Sets ◽

Complex Data ◽

Facies Classification ◽

Large Complex ◽

Complex Data Sets ◽

Software Platforms ◽

Tools And Techniques

There has been much excitement recently about big data and the dire need for data scientists who possess the ability to extract meaning from it. Geoscientists, meanwhile, have been doing science with voluminous data for years, without needing to brag about how big it is. But now that large, complex data sets are widely available, there has been a proliferation of tools and techniques for analyzing them. Many free and open-source packages now exist that provide powerful additions to the geoscientist's toolbox, much of which used to be only available in proprietary (and expensive) software platforms.


Interactive web-based visualization of phylogenetic trees using Phylogeny.IO

10.7287/peerj.preprints.2579 ◽

2016 ◽

Author(s):

Nikola Jovanovic ◽

Alexander S Mikheyev

Keyword(s):

Open Source ◽

Phylogenetic Trees ◽

Data Driven ◽

Data Sets ◽

Complex Data ◽

Web Page ◽

Web Based ◽

Interactive Display ◽

Complex Data Sets

Traditional static publication formats make visualization, exploration and sharing of massive phylogenetic trees difficult. Web-based technologies, such as the Data Driven Document (D3) JavaScript library, exist to overcome such challenges by allowing interactive display of complex data sets. Here we present an open-source web-based application that applies the power of D3 to the visualization of phylogenetic trees. Phylogeny.IO (http://phylogeny.io) displays trees together with a range of static (e.g., shapes and colors) and dynamic (e.g., pop-up text and images) annotations. Annotated trees can be shared as IFrame HTML objects easily embeddable in any web page.


Interactive web-based visualization of phylogenetic trees using Phylogeny.IO

10.7287/peerj.preprints.2579v1 ◽

2016 ◽

Cited By ~ 2

Author(s):

Nikola Jovanovic ◽

Alexander S Mikheyev

Keyword(s):

Open Source ◽

Phylogenetic Trees ◽

Data Driven ◽

Data Sets ◽

Complex Data ◽

Web Page ◽

Web Based ◽

Interactive Display ◽

Complex Data Sets


Using Hadoop Technology to Overcome Big Data Problems by Choosing Proposed Cost-efficient Scheduler Algorithm for Heterogeneous Hadoop System (BD3)

Journal of Scientific Research and Reports ◽

10.9734/jsrr/2020/v26i930310 ◽

2020 ◽

pp. 58-84

Author(s):

Abou_el_ela Abdou Hussein

Keyword(s):

Big Data ◽

Data Processing ◽

Data Storage ◽

Database Management System ◽

Data Sets ◽

Complex Data ◽

Daily Data ◽

Complex Data Sets ◽

Cost Efficient ◽

Hadoop System

Advanced web technologies have led to tremendous growth in the volume of data generated each day. This mountain of huge, dispersed data sets gives rise to the phenomenon called big data: a collection of massive, heterogeneous, unstructured, and complex data sets. The big data life cycle can be represented as collecting (capturing), storing, distributing, manipulating, interpreting, analyzing, investigating, and visualizing big data. Traditional techniques such as the Relational Database Management System (RDBMS) cannot handle big data because of their inherent limitations, so advances in computing architecture are required to handle both the data storage requirements and the heavy processing needed to analyze huge volumes and varieties of data economically. Many technologies manipulate big data; one of them is Hadoop. Hadoop can be understood as an open-source distributed data processing framework, and it is one of the prominent and well-known solutions to the problem of handling big data. Apache Hadoop is based on the Google File System and the MapReduce programming paradigm. In this paper we survey big data characteristics, starting from the first three V's, which researchers have extended over time to more than fifty-six V's, and we compare researchers' accounts to reach the best representation and a precise clarification of all the big data V's. We highlight the challenges that face big data processing and how to overcome them using Hadoop, and its use in processing big data sets as a solution to various problems in a distributed cloud-based environment. The paper mainly focuses on Hadoop components such as Hive, Pig, and HBase. We also give a full description of Hadoop's pros and cons, and of improvements that address Hadoop's problems by choosing the proposed cost-efficient scheduler algorithm for heterogeneous Hadoop systems.
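The MapReduce paradigm mentioned in this abstract can be illustrated with a minimal single-process sketch. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are hypothetical, chosen only to show the map, shuffle, and reduce steps on a word-count task.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values collected for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input: two tiny "documents".
counts = word_count(["big data", "big clusters"])
```

In a real cluster the map and reduce calls would run on different machines and the shuffle would move data across the network; here everything runs locally so the data flow is easy to follow.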


Alternative Clustering

Advances in Business Information Systems and Analytics - Applying Predictive Analytics Within the Service Sector ◽

10.4018/978-1-5225-2148-8.ch001 ◽

2017 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Avinash Navlani ◽

V. B. Gupta

Keyword(s):

Research Problem ◽

Research Community ◽

Data Sets ◽

Complex Data ◽

High Quality ◽

Data Set ◽

Alternative Clustering ◽

Complex Data Sets ◽

Data Objects ◽

Community Clustering

In the last couple of decades, clustering has become a crucial research problem in the data mining research community. Clustering refers to the partitioning of data objects, such as records and documents, into groups or clusters with similar characteristics. Clustering is unsupervised learning, and because of its unsupervised nature there is no unique solution for all problems. Complex data sets often require explanation through multiple clusterings, yet traditional clustering approaches generate a single clustering. A data set may contain more than one pattern, and each pattern can be interesting from a different perspective. Alternative clustering aims to find all the distinct groupings of a data set such that each grouping is of high quality and differs from the others. This chapter gives an overall view of alternative clustering: its various approaches, related work, comparison with easily confused related terms such as subspace, multi-view, and ensemble clustering, applications, issues, and challenges.


Science Communication with Dinosaurs

Handbook of Research on Computational Science and Engineering - Advances in Computer and Electrical Engineering ◽

10.4018/978-1-61350-116-0.ch024 ◽

2012 ◽

pp. 587-611

Author(s):

Phillip L. Manning ◽

Peter L. Falkingham

Keyword(s):

High Performance Computing ◽

Public Engagement ◽

Science Communication ◽

High Performance ◽

Computational Science ◽

Data Sets ◽

Complex Data ◽

Multidisciplinary Research ◽

Complex Data Sets ◽

Performance Computing

Dinosaurs successfully conjure images of lost worlds and forgotten lives. Our understanding of these iconic, extinct animals now comes from many disciplines, not just the science of palaeontology. In recent years palaeontology has benefited from the application of new and existing techniques from physics, biology, chemistry, engineering, but especially computational science. The application of computers in palaeontology is highlighted in this chapter as a key area of development in studying fossils. The advances in high performance computing (HPC) have greatly aided and abetted multiple disciplines and technologies that are now feeding palaeontological research, especially when dealing with large and complex data sets. We also give examples of how such multidisciplinary research can be used to communicate not only specific discoveries in palaeontology but also the methods and ideas from interrelated disciplines to wider audiences. Dinosaurs represent a useful vehicle that can help enable wider public engagement, communicating complex science in digestible chunks.


Anomaly Detection for Inferring Social Structure

Social Computing ◽

10.4018/978-1-60566-984-7.ch118 ◽

2010 ◽

pp. 1797-1803

Author(s):

Lisa Friedland

Keyword(s):

Data Analysis ◽

Anomaly Detection ◽

Social Structure ◽

Small Groups ◽

Analysis Data ◽

Data Sets ◽

Complex Data ◽

Detection Approach ◽

Complex Data Sets ◽

Data Points

In traditional data analysis, data points lie in a Cartesian space, and an analyst asks certain questions: (1) What distribution can I fit to the data? (2) Which points are outliers? (3) Are there distinct clusters or substructure? Today, data mining treats richer and richer types of data. Social networks encode information about people and their communities; relational data sets incorporate multiple types of entities and links; and temporal information describes the dynamics of these systems. With such semantically complex data sets, a greater variety of patterns can be described and views constructed of the data. This article describes a specific social structure that may be present in such data sources and presents a framework for detecting it. The goal is to identify tribes, or small groups of individuals that intentionally coordinate their behavior—individuals with enough in common that they are unlikely to be acting independently. While this task can only be conceived of in a domain of interacting entities, the solution techniques return to the traditional data analysis questions. In order to find hidden structure (3), we use an anomaly detection approach: develop a model to describe the data (1), then identify outliers (2).
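The two-step recipe in the last sentence (fit a model, then flag outliers) can be sketched in its simplest form. This is not the article's tribe-detection framework: it assumes a plain normal model and a z-score cutoff, and the function name `find_outliers`, the threshold of 2.0, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=2.0):
    """Fit a normal model to the data, then flag points far from it.

    Step (1): estimate the mean and standard deviation.
    Step (2): report values whose z-score exceeds the threshold.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical data: how often pairs of individuals appear together.
co_occurrences = [2, 3, 2, 4, 3, 2, 3, 40]
suspicious = find_outliers(co_occurrences)
```

The pair that co-occurs 40 times is the only value flagged, which mirrors the idea that individuals who coordinate show up as anomalies under a model of independent behavior. A richer model of the data would replace the normal fit in a real analysis.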


Introduction to Big Data and Business Analytics

10.4018/978-1-6684-3662-2.ch004 ◽

2022 ◽

pp. 67-76

Author(s):

Dineshkumar Bhagwandas Vaghela

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Data Sets ◽

Complex Data ◽

Business Analytics ◽

Database Applications ◽

Complex Data Sets ◽

History Of ◽

Rapid Generation

The term big data arose from the rapid generation of data in various organizations. In big data, "big" is the buzzword. Here the data are so large and complex that traditional database applications are unable to process them (i.e., they are inadequate to deal with such a volume of data). Big data are usually described by the 5Vs (volume, velocity, variety, variability, veracity). Big data can be structured, semi-structured, or unstructured. Big data analytics is the process of uncovering hidden patterns and unknown correlations, and of predicting future values, from large and complex data sets. This chapter covers the following topics in more detail: the history of big data and business analytics, big data analytics technologies and tools, and big data analytics uses and challenges.


Federal Regulations, Institutional Review Boards and Qualitative Social Science Research

Federal Regulations ◽

10.4324/9780429051098-8 ◽

2019 ◽

pp. 103-118

Author(s):

Virginia Olesen

Keyword(s):

Social Science ◽

Social Science Research ◽

Science Research ◽

Institutional Review Boards ◽

Federal Regulations ◽

Institutional Review ◽

Review Boards


Bayesian Modelling for Machine Learning

Encyclopedia of Information Science and Technology, First Edition ◽

10.4018/978-1-59140-553-5.ch044 ◽

2005 ◽

pp. 236-242

Author(s):

Paul Rippon ◽

Kerrie Mengersen

Keyword(s):

Machine Learning ◽

Process Model ◽

Bayesian Learning ◽

Data Sets ◽

Complex Data ◽

Sources Of Information ◽

Multiple Sources ◽

Complex Data Sets ◽

Complex Phenomena ◽

Learning Data

Learning algorithms are central to pattern recognition, artificial intelligence, machine learning, data mining, and statistical learning. The term often implies analysis of large and complex data sets with minimal human intervention. Bayesian learning has been variously described as a method of updating opinion based on new experience, updating parameters of a process model based on data, modelling and analysis of complex phenomena using multiple sources of information, posterior probabilistic expectation, and so on. In all of these guises, it has exploded in popularity over recent years.
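One of the descriptions above, updating the parameters of a process model based on data, has a compact textbook instance: the conjugate Beta-Binomial update. The sketch below is an illustration under that assumption, not a method taken from the chapter; the function names and the observed counts are hypothetical.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data
    gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Uniform prior over an unknown success probability.
prior = (1.0, 1.0)

# Hypothetical new experience: 7 successes and 3 failures.
posterior = update_beta(*prior, successes=7, failures=3)
estimate = beta_mean(*posterior)
```

The prior mean of 0.5 moves to 8/12 after the data arrive, showing how opinion shifts toward the observed success rate while still being tempered by the prior.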
