What Is Open Source Software (OSS) and What Is Big Data?

2022 ◽  
pp. 77-118
Author(s):  
Richard S. Segall

This chapter discusses what Open Source Software is, its relationship to Big Data, how it differs from other types of software, and its software development cycle. Open source software (OSS) is a type of computer software in which source code is released under a license whereby the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose. Big Data refers to data sets that are so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big Data can be discrete or a continuous stream of data and is accessible using many types of computing devices, ranging from supercomputers and personal workstations to mobile devices and tablets. The chapter discusses how fog computing can be performed with cloud computing for visualization of Big Data, and also presents a summary of additional web-based Big Data visualization software.


Author(s):  
Subodh Kesharwani

Open-source software (OSS) is computer software whose source code is made accessible under a license in which the copyright holder grants the rights to study, change, and distribute the software to anyone and for any purpose. Open-source software may be developed in a collaborative, public manner, and open-source software development is a prominent example of open collaboration. Open-source software is at present widely used both as independent applications and as components in non-open-source applications. Many independent software vendors (ISVs), value-added resellers (VARs), and hardware vendors (OEMs or ODMs) make use of open-source frameworks, modules, and libraries within their branded, for-profit products and services.


2020 ◽  
pp. 341-377
Author(s):  
Richard S. Segall ◽  
Gao Niu

Big Data refers to data sets that are so voluminous and complex that traditional data processing application software is inadequate to deal with them. This article discusses what Big Data is, its characteristics, how this information revolution of Big Data is transforming our lives, and the new technologies and methodologies that have been developed to process data of these huge dimensionalities. Big Data can be discrete or a continuous stream of data, and can be accessed using many types of computing devices, ranging from supercomputers and personal workstations to mobile devices and tablets. The article discusses how fog computing can be combined with cloud computing as a mechanism for visualization of Big Data. An example of visualization techniques for Big Data transmitted by devices connected through the Internet of Things (IoT) is presented, using real data from the Fatality Analysis Reporting System (FARS) managed by the National Highway Traffic Safety Administration (NHTSA) of the United States Department of Transportation (USDoT). Web-based Big Data visualization software, both JavaScript-based and user-interface-based, is discussed. Challenges and opportunities of using Big Data with fog computing are also discussed.
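As a rough illustration of the fog-plus-cloud pattern the article describes, the following hypothetical sketch shows a fog node pre-aggregating an IoT stream so that only compact summaries are forwarded to a cloud visualization layer; all names and values are invented for the example and are not from the article.

```python
# Hypothetical sketch: a fog node pre-aggregates an IoT stream so that
# only compact summaries reach the cloud visualization layer.
# All names (fog_aggregate, WINDOW) are illustrative, not from the article.
import random
import statistics

WINDOW = 100  # raw readings per summary forwarded to the cloud

def fog_aggregate(stream):
    """Reduce raw readings to per-window summaries (min, max, mean)."""
    window = []
    for reading in stream:
        window.append(reading)
        if len(window) == WINDOW:
            yield {
                "min": min(window),
                "max": max(window),
                "mean": statistics.mean(window),
            }
            window.clear()

# Simulated IoT sensor stream; a real deployment would read from devices.
raw_stream = (random.gauss(20.0, 2.0) for _ in range(1000))

for summary in fog_aggregate(raw_stream):
    # In practice this summary would be sent to a cloud endpoint
    # for dashboard visualization; here we simply print it.
    print(summary)
```

The design point is that the fog layer cuts transmission volume by a factor of WINDOW before any cloud-side visualization takes place.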


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 859
Author(s):  
Abdulaziz O. AlQabbany ◽  
Aqil M. Azmi

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams, and it requires learners to be adaptive to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. The Adaptive Random Forest (ARF), by contrast, is a stream learning algorithm that has shown promising results in terms of its accuracy and ability to deal with various types of drift. The continuity of the incoming instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms’ efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed enhancement method yielded considerable improvement in most situations.
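The resampling step that the study tunes can be sketched as follows. This is a minimal, hypothetical illustration of Poisson(λ) online bagging in the style the abstract describes, not the authors' ARF implementation; the base learner (scikit-learn's SGDClassifier), the ensemble size, and the λ value are assumptions.

```python
# Minimal sketch of Poisson(lambda) online bagging, the resampling step
# that the paper tunes. Not the authors' ARF code; the base learners,
# ensemble size, and lambda value are illustrative assumptions.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(42)
LAMBDA = 6.0          # the paper tunes this; 1.0 recovers classic Poisson(1)
N_LEARNERS = 10
CLASSES = np.array([0, 1])

ensemble = [SGDClassifier() for _ in range(N_LEARNERS)]

def learn_one(x, y):
    """Present one instance to each member, weighted by a Poisson(lambda) draw."""
    X = x.reshape(1, -1)
    for model in ensemble:
        k = rng.poisson(LAMBDA)   # how heavily this learner sees the instance
        if k > 0:
            model.partial_fit(X, [y], classes=CLASSES, sample_weight=[float(k)])

def predict_one(x):
    """Majority vote over the ensemble."""
    votes = [int(m.predict(x.reshape(1, -1))[0]) for m in ensemble]
    return max(set(votes), key=votes.count)

# Toy stream: two Gaussian blobs, one per class.
for _ in range(500):
    y = int(rng.integers(0, 2))
    x = rng.normal(loc=2.0 * y, scale=1.0, size=4)
    learn_one(x, y)

print(predict_one(np.full(4, 2.0)))   # likely predicts class 1
```

Raising λ makes each learner see (weighted) copies of more instances, which tends to improve accuracy at the cost of execution time; the paper's ρ measure captures exactly this trade-off.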


2021 ◽  
Author(s):  
Fabian Kovacs ◽  
Max Thonagel ◽  
Marion Ludwig ◽  
Alexander Albrecht ◽  
Manuel Hegner ◽  
...  

BACKGROUND: Big data in healthcare must be exploited to achieve a substantial increase in efficiency and competitiveness. The analysis of patient-related data in particular holds huge potential to improve decision-making processes. However, most analytical approaches used today are highly time- and resource-consuming.
OBJECTIVE: The software solution presented here, Conquery, is an open-source tool providing advanced but intuitive data analysis without the need for specialized statistical training. Conquery aims to simplify big data analysis for novice database users in the medical sector.
METHODS: Conquery is a document-oriented, distributed time-series database and analysis platform. Its main application is the analysis of per-person medical records by non-technical medical professionals. Complex analyses are composed in the Conquery frontend by dragging tree nodes into the query editor. Queries are evaluated by a bespoke distributed query engine for medical records in a column-oriented fashion. We present a custom compression scheme that uses online-calculated as well as precomputed metadata and data statistics to facilitate low response times.
RESULTS: Conquery allows for easy navigation through the hierarchy and enables complex study cohort construction whilst reducing the demand on time and resources. The UI of Conquery and a query output are exemplified by the construction of a relevant clinical cohort.
CONCLUSIONS: Conquery is an efficient and intuitive open-source software for performant and secure data analysis and aims at supporting decision-making processes in the healthcare sector.
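The column-oriented evaluation described under METHODS can be illustrated with a toy sketch. This is a hypothetical example, not Conquery's actual engine or API; the column names, codes, and thresholds are invented.

```python
# Toy illustration of column-oriented cohort filtering, in the spirit of
# the approach described above. This is not Conquery's engine or API;
# all column names and thresholds are invented for the example.
import numpy as np

# Per-person medical records stored column-wise (one array per attribute).
person_id = np.arange(6)
age       = np.array([34, 71, 55, 62, 45, 80])
diagnosis = np.array(["E11", "I10", "E11", "E11", "I10", "E11"])  # ICD-10 codes

# Cohort query: persons over 60 with diagnosis E11 (type 2 diabetes).
# Column orientation lets each predicate scan one contiguous array.
mask = (age > 60) & (diagnosis == "E11")

print(person_id[mask])   # -> [3 5]
```

Because each predicate touches only the columns it needs, a query over a few attributes never has to read the rest of the record, which is one reason column stores respond quickly on wide medical data.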


Author(s):  
D. Berry

Open source software (OSS) is computer software that has its underlying source code made available under a licence. This can allow developers and users to adapt and improve it (Raymond, 2001). Computer software can be broadly split into two development models:
• Proprietary, or closed, software, owned by a company or individual. Copies of the binary are made public; the source code is not usually made public.
• Open-source software (OSS), where the source code is released with the binary. Users and developers may be licenced to use and modify the code, and to distribute any improvements they make.
Both OSS and proprietary approaches allow companies to make a profit. Companies developing proprietary software make money by developing software and then selling licences to use it. For example, Microsoft receives a payment for every copy of Windows sold with a personal computer. OSS companies make their money by providing services, such as advising clients on the GPL licence. In practice, software companies often develop both types of software.
OSS is developed through an ongoing, iterative process in which people share the ideas expressed in the source code. The aim is that a large community of developers and users can contribute to the development of the code, check it for errors and bugs, and make the improved version available to others. Project management software is used to allow developers to keep track of the various versions.
There are two main types of open-source licence (although there are many variants and subtypes developed by other companies):
• Berkeley Software Distribution (BSD) Licence: this permits a licencee to “close” a version (by withholding the most recent modifications to the source code) and sell it as a proprietary product.
• GNU General Public Licence (GNU GPL, or GPL): under this licence, licencees may not “close” versions. The licencee may modify, copy, and redistribute any derivative version under the same GPL licence, and can either charge a fee for this service or work free of charge.
Free software first evolved during the 1970s but in the 1990s forked into two movements, namely free software and open source (Berry, 2004). Richard Stallman, an American software developer who believes that sharing source code and ideas is fundamental to freedom of speech, developed a free version of the widely used Unix operating system. The resulting GNU program was released under a specially created General Public Licence (the GNU GPL), designed to ensure that the source code would remain openly available to all. It was not intended to prevent commercial usage or distribution (Stallman, 2002). This approach was christened free software. In this context, free meant that anyone could modify the software. However, the term “free” was often misunderstood to mean no cost, so during the 1990s Eric Raymond and others proposed open-source software as a less contentious and more business-friendly term. This has become widely accepted within the software and business communities; however, there are still arguments about the most appropriate term to use (Moody, 2002).
Open-source movements (OSMs) are usually organised into networks of individuals who work collaboratively on the Internet, developing major software projects that sometimes rival commercial software but are always committed to the production of quality alternatives to those produced by commercial companies (Raymond, 2001; Williams, 2002). Groups and individuals develop software to meet their own and others’ needs in a highly decentralised way, likened to a bazaar (Raymond, 2001). These groups often make substantive value claims to support their projects and foster an ethic of community, collaboration, deliberation, and intellectual freedom. In addition, Lessig (1999) argues that the FLOSS community offers an inspiration in its commitment to transparency in its products and its ability to open up governmental regulation and control through free/libre and open-source code.


Author(s):  
Ricardo Oliveira ◽  
Rafael Moreno

Federal, state, and local government agencies in the USA are investing heavily in the dissemination of the Open Data sets each of them produces. The main driver behind this thrust is to increase agencies’ transparency and accountability, as well as to improve citizens’ awareness. However, not all Open Data sets are easy to access and integrate with other Open Data sets, even those available from the same agency. The City and County of Denver Open Data Portal distributes several types of geospatial datasets; one of them is the city parcels layer, containing 224,256 records. Although this data layer contains many pieces of information, it is incomplete for some custom purposes. Open-source software was used first to collect data from diverse City of Denver Open Data sets, then to upload them to a repository in the Cloud, where they were processed using a PostgreSQL installation and Python scripts. Our method was able to extract non-spatial information from a ‘not-ready-to-download’ source that could then be combined with the initial data set to enhance its potential use.
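The collect-upload-process workflow described above might look roughly like the following hypothetical sketch; the URL, schema, table name, and credentials are placeholders, not the actual Denver Open Data endpoints or the authors' scripts.

```python
# Hypothetical sketch of the collect-then-load pattern described above.
# The URL, table, and credentials are placeholders, not the actual
# Denver Open Data endpoints or the authors' scripts.
import requests
import psycopg2

# 1. Collect: fetch a (hypothetical) JSON open-data resource.
resp = requests.get("https://example.org/denver-open-data/parcels.json")
resp.raise_for_status()
records = resp.json()  # assume a list of {"parcel_id": ..., "owner": ...} dicts

# 2. Load: insert the non-spatial attributes into a cloud-hosted PostgreSQL.
conn = psycopg2.connect(host="cloud-db.example.org", dbname="denver",
                        user="analyst", password="secret")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS parcels_extra (
            parcel_id TEXT PRIMARY KEY,
            owner     TEXT
        )
    """)
    cur.executemany(
        "INSERT INTO parcels_extra (parcel_id, owner) VALUES (%s, %s) "
        "ON CONFLICT (parcel_id) DO NOTHING",
        [(r["parcel_id"], r["owner"]) for r in records],
    )
conn.close()

# 3. Combine: the new table can then be joined to the original parcel
#    layer on parcel_id to enrich it, e.g. in SQL:
#    SELECT p.*, e.owner FROM parcels p JOIN parcels_extra e USING (parcel_id);
```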

