Choosing an Open Source License Based on Software Dependencies

DEBoost: A Python Library for Weighted Distance Ensembling in Machine Learning

10.20944/preprints202005.0354.v1 ◽

2020 ◽

Author(s):

Wei Hao Khoong

Keyword(s):

Machine Learning ◽

Open Source ◽

Data Preprocessing ◽

Weighted Distance ◽

Open Source License ◽

Classification Tasks ◽

Python Package

In this paper, we introduce deboost, a Python library for weighted distance ensembling of predictions in regression and classification tasks. It builds on the scikit-learn library for its default models and data preprocessing functions. Any model can be included in the ensemble as long as it provides a predict method, as the models in scikit-learn do. deboost is released under the MIT open-source license and can be downloaded from the Python Package Index (PyPI) at https://pypi.org/project/deboost. The source code is also available in a GitHub repository at https://github.com/weihao94/DEBoost.
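
To make the idea of distance-weighted ensembling concrete, here is a minimal sketch in plain Python. It does not use or reproduce the deboost API. The names `ConstantModel` and `distance_weighted_ensemble` are hypothetical, and the weighting rule (inverse absolute distance from the mean prediction) is an assumption chosen for illustration, not necessarily the scheme deboost implements. The only thing taken from the abstract is that each model exposes a predict method.

```python
from typing import Protocol, Sequence


class Predictor(Protocol):
    """Anything with a scikit-learn style predict method."""

    def predict(self, X: Sequence[Sequence[float]]) -> Sequence[float]: ...


class ConstantModel:
    """Toy stand-in for a fitted regressor (hypothetical, for illustration)."""

    def __init__(self, value: float) -> None:
        self.value = value

    def predict(self, X):
        # Return the same prediction for every row of X.
        return [self.value for _ in X]


def distance_weighted_ensemble(models, X, eps=1e-9):
    """Combine predictions, down-weighting models far from the consensus.

    Assumption: weight = 1 / (|prediction - mean prediction| + eps),
    normalised per sample. This is an illustrative rule, not deboost's.
    """
    per_model = [list(m.predict(X)) for m in models]
    combined = []
    for sample_preds in zip(*per_model):
        centre = sum(sample_preds) / len(sample_preds)
        weights = [1.0 / (abs(p - centre) + eps) for p in sample_preds]
        total = sum(weights)
        combined.append(sum(w * p for w, p in zip(weights, sample_preds)) / total)
    return combined


X = [[0.0], [1.0]]
models = [ConstantModel(1.0), ConstantModel(1.2), ConstantModel(5.0)]
ensemble = distance_weighted_ensemble(models, X)
```

With these toy models the outlying prediction of 5.0 receives the smallest weight, so the combined value falls below the plain average of the three predictions.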

Using R

Epidemiology with R ◽

10.1093/oso/9780198841326.003.0002 ◽

2020 ◽

pp. 3-39

Author(s):

Bendix Carstensen

Keyword(s):

Open Source ◽

Epidemiological Studies ◽

Commercial Product ◽

Frequency Data ◽

Open Source License ◽

Free Open Source

This chapter argues that the best way to learn R is to use it. One should start by using it as a simple calculator, and keep exploring the results by inspecting the size, shape, and content of the objects one creates. R is available from CRAN, the Comprehensive R Archive Network. A nice interface to R is RStudio. RStudio is a commercial product, but it is also offered under a free open-source license, which provides a handy interface to R at no cost, including the possibility of writing reports using Rmarkdown, Sweave, or knitr. The chapter then looks at the two main graphics systems used in R: base graphics, which is an integral part of any R distribution, and ggplot2 (gg referring to grammar of graphics). Data from large epidemiological studies are often summarized in the form of frequency data, which record the frequency of all possible combinations of values of the variables in the study.

Open-Source License Violations of Binary Software at Large Scale

2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER) ◽

10.1109/saner.2019.8667977 ◽

2019 ◽

Author(s):

Muyue Feng ◽

Weixuan Mao ◽

Zimu Yuan ◽

Yang Xiao ◽

Gu Ban ◽

...

Keyword(s):

Open Source ◽

Large Scale ◽

Open Source License

Balance Trees Reveal Microbial Niche Differentiation

mSystems ◽

10.1128/msystems.00162-16 ◽

2017 ◽

Vol 2 (1) ◽

Cited By ~ 129

Author(s):

James T. Morton ◽

Jon Sanders ◽

Robert A. Quinn ◽

Daniel McDonald ◽

Antonio Gonzalez ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Open Source ◽

Niche Differentiation ◽

Difficult Problem ◽

Individual Species ◽

rRNA Gene ◽

Link Type ◽

Open Source License ◽

Gene Data

ABSTRACT Advances in sequencing technologies have enabled novel insights into microbial niche differentiation, from analyzing environmental samples to understanding human diseases and informing dietary studies. However, identifying the microbial taxa that differentiate these samples can be challenging. These issues stem from the compositional nature of 16S rRNA gene data (or, more generally, taxon or functional gene data); the changes in the relative abundance of one taxon influence the apparent abundances of the others. Here we acknowledge that inferring properties of individual bacteria is a difficult problem and instead introduce the concept of balances to infer meaningful properties of subcommunities, rather than properties of individual species. We show that balances can yield insights about niche differentiation across multiple microbial environments, including soil environments and lung sputum. These techniques have the potential to reshape how we carry out future ecological analyses aimed at revealing differences in relative taxonomic abundances across different samples. IMPORTANCE By explicitly accounting for the compositional nature of 16S rRNA gene data through the concept of balances, balance trees yield novel biological insights into niche differentiation. The software to perform this analysis is available under an open-source license and can be obtained at https://github.com/biocore/gneiss.
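
As a rough illustration of what a balance measures, the sketch below uses a common definition from compositional data analysis: the scaled log-ratio of the geometric means of two groups of parts, b = sqrt(rs/(r+s)) · ln(g(numerator)/g(denominator)). It is assumed here that this matches the notion of balance in the abstract. The code is not taken from gneiss and does not use its API, and the taxon names and counts are invented for the example.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(numerator, denominator):
    """Log-ratio balance between two groups of abundances.

    b = sqrt(r*s / (r+s)) * ln(g(numerator) / g(denominator)),
    where r and s are the group sizes and g is the geometric mean.
    """
    r, s = len(numerator), len(denominator)
    scale = math.sqrt(r * s / (r + s))
    return scale * math.log(geometric_mean(numerator) / geometric_mean(denominator))


# Hypothetical counts for four taxa in one sample (zeros would need a pseudocount).
sample = {"taxon_a": 40.0, "taxon_b": 10.0, "taxon_c": 30.0, "taxon_d": 20.0}
group_1 = [sample["taxon_a"], sample["taxon_b"]]
group_2 = [sample["taxon_c"], sample["taxon_d"]]

b_counts = balance(group_1, group_2)
# Rescaling every count (for example, converting to proportions) leaves the balance unchanged.
b_props = balance([v / 100.0 for v in group_1], [v / 100.0 for v in group_2])
```

Because the balance depends only on ratios between parts, it is the same whether computed from raw counts or from relative abundances, which is why it suits compositional data.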

The GEM Global Active Faults Database

Earthquake Spectra ◽

10.1177/8755293020944182 ◽

2020 ◽

Vol 36 (1_suppl) ◽

pp. 160-180 ◽

Cited By ~ 2

Author(s):

Richard Styron ◽

Marco Pagani

Keyword(s):

Open Source ◽

Active Faults ◽

Slip Rate ◽

Comprehensive Database ◽

Open Source License ◽

Rate Information

The GEM Global Active Faults Database (GAF-DB) is the first public, comprehensive database of active faults with worldwide coverage. The GAF-DB is a compilation of many regional datasets. The GAF-DB contains ∼13,500 faults, each with associated attributes that describe the geometry, kinematics, slip rate, references, and other characteristics, where that information is available. Spatial completeness is high, and about 77% of the faults have slip rate information. The GAF-DB is built from its constituent datasets algorithmically and is designed to fluidly incorporate changes to, or additions of, any of the underlying datasets. This process reflects a philosophy of incorporating changes easily to avoid obsolescence and to provide users with the most up-to-date information possible. The database is licensed under a free and open-source license (CC-BY-SA 4.0) and is available at https://github.com/GEMScienceTools/gem-global-active-faults.

The Design and Implement of Open Source License Tracking System

2010 International Conference on Computational Intelligence and Software Engineering ◽

10.1109/cise.2010.5676875 ◽

2010 ◽

Cited By ~ 2

Author(s):

HongBo Xu ◽

HuiHui Yang ◽

Dan Wan ◽

JiangPing Wan

Keyword(s):

Open Source ◽

Tracking System ◽

Open Source License

Indexing Queries in Lux

Proceedings of Balisage: The Markup Conference 2013 ◽

10.4242/balisagevol10.sokolov01 ◽

2013 ◽

Cited By ~ 1

Author(s):

Michael Sokolov

Keyword(s):

Open Source ◽

Search Engine ◽

Inductive Logic ◽

Open Source License ◽

Query Optimizers ◽

The Veil ◽

Intuitive Grasp

Query optimizers often mystify database users: sometimes queries run quickly and sometimes they don’t. An intuitive grasp of what will work well in an optimizer is often gained only after trial, error, inductive logic (i.e. educated guessing), and sometimes propitiatory sacrifice. This paper tries to lift the veil by describing work on Lux, a new indexed XQuery search engine built using Saxon and Lucene, which is freely available under an open-source license. Lux optimizes queries by rewriting them as equivalent (but usually faster) indexed queries, so its results are easier for a user to understand than the abstract query plans produced by some optimizers. Lucene-based QName and path indexes prove useful in speeding up XQuery execution by Saxon.
