Bootstrapping a Persian Dependency Treebank

2012
Vol 7
Author(s): Mojgan Seraji, Beáta Megyesi, Joakim Nivre

This paper presents an ongoing project whose goal is to create a freely available dependency treebank for Persian. The data is taken from the Bijankhan corpus, which is already annotated for parts of speech, and a syntactic dependency annotation based on the Stanford Typed Dependencies is added through a bootstrapping procedure involving the open-source dependency parser MaltParser. We report preliminary parsing experiments with promising results after training the parser on a manually annotated seed data set of 215 sentences.
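
The abstract describes a bootstrapping procedure with MaltParser. The following is a rough sketch of that loop, not the project's actual scripts: train MaltParser on the manually annotated seed set, parse a new batch from the Bijankhan corpus, hand-correct it, append it to the training data and retrain. File names are placeholders, and the MaltParser jar name and command-line flags may differ by version.

```python
# Hypothetical bootstrapping helper; MaltParser jar name/flags are assumptions.
import subprocess

MALT_JAR = "maltparser-1.7.2.jar"   # assumed version

def train(train_conll, model_name):
    """Learn a parsing model from a CoNLL-format treebank."""
    subprocess.run(["java", "-jar", MALT_JAR, "-c", model_name,
                    "-i", train_conll, "-m", "learn"], check=True)

def parse(model_name, in_conll, out_conll):
    """Parse new sentences with a previously trained model."""
    subprocess.run(["java", "-jar", MALT_JAR, "-c", model_name,
                    "-i", in_conll, "-o", out_conll, "-m", "parse"], check=True)

# One bootstrapping round: the parsed batch is manually corrected (outside this
# script) and appended to the training file before the next round.
train("seed_215_sentences.conll", "persian_round0")
parse("persian_round0", "bijankhan_batch1.conll", "batch1_parsed.conll")
```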

2014
Author(s): David S Smith, Xia Li, Lori R Arlinghaus, Thomas E Yankeelov, E. Brian Welch

We present a fast, validated, open-source toolkit for processing dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) data. We validate it against the Quantitative Imaging Biomarkers Alliance (QIBA) Standard and Extended Tofts-Kety phantoms and find near perfect recovery in the absence of noise, with an estimated 10-20x speedup in run time compared to existing tools. To explain the observed trends in the fitting errors, we present an argument about the conditioning of the Jacobian in the limit of small and large parameter values. We also demonstrate its use on an in vivo data set to measure performance on a realistic application. For a 192 × 192 breast image, we achieved run times of < 1 s. Finally, we analyze run time scaling with problem size and find that the run time per voxel scales as O(N^1.9), where N is the number of time points in the tissue concentration curve. DCEMRI.jl was much faster than any other analysis package tested and produced comparable accuracy, even in the presence of noise.
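
For orientation, the Standard Tofts-Kety model fitted by such toolkits is C_t(t) = Ktrans ∫₀ᵗ C_p(τ) exp(-(Ktrans/v_e)(t-τ)) dτ. The sketch below is a minimal single-voxel fit in Python, not the DCEMRI.jl implementation; the time grid, arterial input C_p, tissue curve C_t, initial guesses and bounds are all illustrative assumptions.

```python
# Minimal single-voxel Standard Tofts-Kety fit (illustrative, not DCEMRI.jl).
import numpy as np
from scipy.optimize import curve_fit

def tofts_kety(t, ktrans, ve, cp):
    """Tissue concentration via discrete convolution of Cp with exp(-kep*t)."""
    dt = t[1] - t[0]                      # assumes a uniform time grid
    kernel = np.exp(-(ktrans / ve) * t)   # impulse response, kep = Ktrans/ve
    return ktrans * np.convolve(cp, kernel)[: len(t)] * dt

def fit_voxel(t, cp, ct):
    """Fit Ktrans and ve for one voxel's concentration curve."""
    model = lambda t_, ktrans, ve: tofts_kety(t_, ktrans, ve, cp)
    popt, _ = curve_fit(model, t, ct, p0=[0.1, 0.2],
                        bounds=([1e-4, 1e-3], [5.0, 1.0]))
    return popt  # (Ktrans, ve)
```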


2021
Author(s): Soran Nouri

Within the Open Source Software (OSS) literature, there is a lack of studies addressing the legitimation processes of innovations that are born in OSS. This study sets out to analyze the legitimation processes of innovations within the deliberations of the Drupal project. The data set comprises 52 rational deliberation cases discussing innovations that were proposed by members of the community. Habermas's Ideal Speech Situation (ISS) is used as the framework through which Drupal's rational deliberations are viewed; in fact, none of the 52 cases examined in this thesis violated the guidelines of the ISS. Communicative Action Theory, Influence Tactics theory and the theory of Validity Claims are the aspects of the framework used to code and analyze the conversations; they allow for an effective conceptualization of the dynamics of the Drupal deliberations. This thesis finds that the legitimation processes of innovations in open source software are influenced by the type, complexity and implications of the innovations for the rest of the community: bug fixes, complex innovations and innovations that have implications for the rest of the software result in long (in terms of number of comments) legitimation processes. The study also provides empirical support for the claim that, in open deliberations aimed at achieving mutual understanding towards a common goal, the communicative action type and the rational persuasion influence tactic are the most common means by which innovators interact with the community.


Author(s): Fatema N. Julia, Khan M. Iftekharuddin, Atiq U. Islam

Dialog act (DA) classification is useful for understanding the intentions of a human speaker. An effective classification of DAs can be exploited for realistic implementation of expert systems. In this work, we investigate DA classification using both acoustic and discourse information for the HCRC MapTask data. We extract several different acoustic features and exploit these features using a Hidden Markov Model (HMM) network to classify the acoustic information. For discourse feature extraction, we propose a novel parts-of-speech (POS) tagging technique that effectively reduces the dimensionality of the discourse features. To classify the discourse information, we exploit two classifiers: an HMM and a Support Vector Machine (SVM). We further perform classifier fusion between the HMM and the SVM to improve discourse classification. Finally, we perform an efficient decision-level classifier fusion of both acoustic and discourse information to classify 12 different DAs in the MapTask data. We obtain 65.2% and 55.4% DA classification rates using acoustic and discourse information, respectively. Furthermore, we obtain a combined accuracy of 68.6% for DA classification using both acoustic and discourse information. These DA classification accuracy rates are comparable to or better than previously reported results for the same data set. We obtain average precision and recall of 74.89% and 69.83%, respectively, and thus much better precision and recall rates for most of the classified DAs than existing works on the same HCRC MapTask data set.
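
Decision-level fusion of the kind described above can be as simple as a weighted combination of the two classifiers' class posteriors. The sketch below illustrates that idea in Python; it is not the authors' implementation, and the weight alpha and placeholder DA labels are assumptions.

```python
# Illustrative decision-level fusion of acoustic and discourse posteriors.
import numpy as np

DA_LABELS = [f"da_{i}" for i in range(12)]   # placeholders for the 12 MapTask DAs

def fuse_decisions(p_acoustic, p_discourse, alpha=0.6):
    """Weighted fusion of two posterior vectors; returns the winning DA label."""
    p_acoustic = np.asarray(p_acoustic, dtype=float)
    p_discourse = np.asarray(p_discourse, dtype=float)
    fused = alpha * p_acoustic + (1.0 - alpha) * p_discourse
    return DA_LABELS[int(np.argmax(fused))]
```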


Author(s): Ricardo Oliveira, Rafael Moreno

Federal, State and Local government agencies in the USA are investing heavily in the dissemination of the Open Data sets they produce. The main driver behind this thrust is to increase agencies' transparency and accountability, as well as to improve citizens' awareness. However, not all Open Data sets are easy to access and integrate with other Open Data sets, even ones available from the same agency. The City and County of Denver Open Data Portal distributes several types of geospatial datasets, one of which is the city parcels layer containing 224,256 records. Although this data layer contains many pieces of information, it is incomplete for some custom purposes. Open-source software was used to first collect data from diverse City of Denver Open Data sets, then upload them to a repository in the Cloud, where they were processed using a PostgreSQL installation in the Cloud and Python scripts. Our method was able to extract non-spatial information from a ‘not-ready-to-download’ source that could then be combined with the initial data set to enhance its potential use.
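
A minimal sketch of this kind of workflow is shown below: load a non-spatial CSV pulled from an open data portal into PostgreSQL and join it to an existing parcel table. Table names, column names and connection settings are hypothetical, not taken from the paper.

```python
# Hypothetical load-and-join step; names and schema are assumptions.
import csv
import psycopg2

conn = psycopg2.connect("dbname=denver user=gis")   # assumed connection settings
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS parcel_extra (
        parcel_id TEXT PRIMARY KEY,
        land_use  TEXT
    )
""")

with open("parcel_extra.csv", newline="") as f:      # file collected from the portal
    for row in csv.DictReader(f):
        cur.execute(
            "INSERT INTO parcel_extra (parcel_id, land_use) VALUES (%s, %s) "
            "ON CONFLICT (parcel_id) DO NOTHING",
            (row["parcel_id"], row["land_use"]),
        )

# Combine the scraped attributes with the original parcel layer (assumed table: parcels).
cur.execute("""
    SELECT p.parcel_id, p.geom, e.land_use
    FROM parcels p
    JOIN parcel_extra e USING (parcel_id)
""")
rows = cur.fetchall()
conn.commit()
```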


10.29007/wkj6
2019
Author(s): Vasu Saluja, Minni Jain, Prakarsh Yadav

Here we propose an open source algorithm, L,M&A (Lyrics, Mine and Analyse), to create a dataset of lyrics from the works of various artists. The aim of this approach is to facilitate the generation of a large data set that can be used to improve the accuracy of song recommendation algorithms. The limited availability of such datasets has excluded the sentiment analysis of lyrics from music recommendation systems. By using the L,M&A algorithm, it is possible to generate a large dataset which can function as a training dataset for future classifier systems. We have used iterative API requests to the musixmatch and Genius servers to text-mine lyrics data for songs by multiple artists. The data is processed and then analysed for sentiment using the lexicons provided in the Tidytext package (BING, AFINN, NRC), and the overall sentiment of each artist is determined through modal counts. The occurrence of each sentiment was evaluated and visualized using ggplot2. This representation exhibits the merit of our approach and the applicability of our data. The key features of our approach are the open source platforms utilized and the simplicity of the input required from the user.
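
The paper scores lyrics with R's tidytext lexicons and summarizes each artist by modal counts; the sketch below is a rough Python analogue of that scoring step only. The toy word lists and tokenizer are illustrative assumptions, not the NRC/BING/AFINN lexicons.

```python
# Illustrative lexicon scoring and modal-sentiment summary (toy lexicon).
from collections import Counter
import re

TOY_LEXICON = {"love": "positive", "happy": "positive",
               "alone": "negative", "cry": "negative"}

def sentiment_counts(lyrics: str) -> Counter:
    """Count lexicon sentiments over the tokens of one song's lyrics."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    return Counter(TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON)

def artist_overall_sentiment(songs: list[str]) -> str:
    """Modal (most frequent) sentiment across all of an artist's songs."""
    total = Counter()
    for lyrics in songs:
        total += sentiment_counts(lyrics)
    return total.most_common(1)[0][0] if total else "neutral"
```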


Author(s): S. M. Stuit, C. L. E. Paffen, S. Van der Stigchel

Many studies use different categories of images to define their conditions. Since any difference between these categories is a valid candidate to explain category-related behavioral differences, knowledge about the objective image differences between categories is crucial for the interpretation of the behaviors. However, natural images vary in many image features, and not every feature is equally important in describing the differences between the categories. Here, we provide a methodological approach that uses machine learning performance as a tool to find as many as possible of the image features that have predictive value for the category the images belong to. In other words, we describe a means to find the features of a group of images by which the categories can be objectively and quantitatively defined. Note that we are not aiming to provide a means for the best possible decoding performance; instead, our aim is to uncover prototypical characteristics of the categories. To facilitate the use of this method, we offer an open-source, MATLAB-based toolbox that performs such an analysis and aids the user in visualizing the features of relevance. We first applied the toolbox to a mock data set with a ground truth to show the sensitivity of the approach. Next, we applied the toolbox to a set of natural images as a more practical example.
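
The toolbox itself is MATLAB-based; the sketch below is only a rough Python analogue of the underlying idea: train a classifier on per-image feature vectors and rank features by permutation importance to see which ones carry predictive value for the category. The feature names and synthetic data are illustrative assumptions.

```python
# Illustrative feature-ranking analogue (not the MATLAB toolbox).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
feature_names = ["mean_luminance", "rms_contrast", "edge_density", "mean_hue"]
X = rng.normal(size=(200, len(feature_names)))      # stand-in image features
y = (X[:, 2] + 0.5 * X[:, 0] > 0).astype(int)       # category driven by two features

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

imp = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=0)
for name, score in sorted(zip(feature_names, imp.importances_mean),
                          key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")   # higher score = more predictive of the category
```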


2016
Author(s): Tiziana De Filippis, Leandro Rocchi, Patrizio Vignaroli, Maurizio Bacci, Vieri Tarchiani, ...

In Sub-Saharan Africa, analysis tools and models based on meteorological satellite data have been developed within different national and international cooperation initiatives, with the aim of allowing better monitoring of the cropping season. In most cases, the software was a stand-alone application, and its upgrading, in terms of analysis functions, database and hardware maintenance, was difficult for the National Meteorological Services (NMSs) in charge of agro-hydro-meteorological monitoring. The web-based solution proposed in this work intends to improve and ensure the sustainability of applications to support national Early Warning Systems (EWSs) for food security. The Crop Risk Zones (CRZ) model for Niger and Mali, integrated in a web-based open source framework, has been implemented using PL/pgSQL and PostGIS functions to process different meteorological data sets: (a) the precipitation forecast images from the Global Forecast System (GFS); (b) the Climate Prediction Center (CPC) Rainfall Estimation (RFE) for Africa; (c) the Multi-Sensor Precipitation Estimate (MPE) images from the EUMETSAT Earth Observation Portal; and (d) the MOD16 Global Terrestrial Evapotranspiration Data Set. RESTful web services upload raster images into the PostgreSQL/PostGIS database, and PL/pgSQL functions run the CRZ model to identify the installation and phenological phases of the main crops in the region and to create crop risk zone images. The model is focused on the early identification of risks and the production of information for food security within the time prescribed for decision-making. The challenge and the objective of this work is to set up an open access monitoring system, based on meteorological open data providers, targeting NMSs and any other local decision makers for drought risk reduction and resilience improvement.
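
As a minimal sketch of the ingestion path described above, and not the project's code, a RESTful endpoint might accept a rainfall-estimate raster, load it into PostgreSQL/PostGIS, and then call a PL/pgSQL routine that updates the crop risk zones. The endpoint path, table name and the function run_crz_model() are hypothetical; connection settings are assumptions.

```python
# Hypothetical raster-ingestion endpoint; names and schema are assumptions.
import subprocess
import psycopg2
from flask import Flask, request

app = Flask(__name__)

@app.route("/rasters/rfe", methods=["POST"])
def upload_rfe():
    path = "/tmp/rfe_upload.tif"
    request.files["raster"].save(path)

    # Convert the GeoTIFF to SQL and append it to an existing raster table.
    sql = subprocess.run(
        ["raster2pgsql", "-s", "4326", "-a", path, "monitoring.rfe_rasters"],
        capture_output=True, check=True, text=True,
    ).stdout

    with psycopg2.connect("dbname=ews user=crz") as conn, conn.cursor() as cur:
        cur.execute(sql)
        cur.execute("SELECT run_crz_model(%s)", ("RFE",))  # hypothetical PL/pgSQL function
    return {"status": "ok"}
```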


Author(s): Stefan Koch

In this chapter, we propose for the first time a method to compare the efficiency of free and open source projects, based on the data envelopment analysis (DEA) methodology. DEA offers several advantages in this context: it is a non-parametric optimization method that does not require the user to define relations between different factors or a production function, it can account for economies or diseconomies of scale, and it is able to deal with multi-input, multi-output systems in which the factors have different scales. Using a data set of 43 large F/OS projects retrieved from SourceForge.net, we demonstrate the application of DEA and show that DEA is indeed usable for comparing the efficiency of projects. We also show additional analyses based on the results, exploring whether the inequality of work distribution within the projects, the licensing scheme or the intended audience has an effect on their efficiency. As this is a first attempt at using this method for F/OS projects, several future research directions are possible. These include additional work on determining input and output factors, comparisons within application areas, and comparison to commercial or mixed-mode development projects.
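
For readers unfamiliar with DEA, the input-oriented CCR model scores each project (decision-making unit) by solving a small linear program. The sketch below solves that standard formulation with scipy; it is in the spirit of the chapter's method, but the toy input/output matrices are illustrative, not the SourceForge data set.

```python
# Input-oriented CCR DEA (envelopment form) as a linear program; toy data.
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Efficiency of DMU o. X: (m inputs x n DMUs), Y: (s outputs x n DMUs)."""
    m, n = X.shape
    s, _ = Y.shape
    # Variables: [theta, lambda_1, ..., lambda_n]; objective: minimize theta.
    c = np.r_[1.0, np.zeros(n)]
    # Inputs:  sum_j lambda_j * x_ij - theta * x_io <= 0
    A_in = np.hstack([-X[:, [o]], X])
    # Outputs: sum_j lambda_j * y_rj >= y_ro  ->  -sum_j lambda_j * y_rj <= -y_ro
    A_out = np.hstack([np.zeros((s, 1)), -Y])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(m), -Y[:, o]]
    bounds = [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0]

# Toy example: 2 inputs (developers, commits), 1 output (downloads), 4 projects.
X = np.array([[4.0, 7.0, 8.0, 4.0],
              [3.0, 3.0, 1.0, 2.0]])
Y = np.array([[1.0, 1.0, 1.0, 1.0]])
print([round(ccr_efficiency(X, Y, o), 3) for o in range(4)])
```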


Author(s): A. Lemme, Y. Meirovitch, M. Khansari-Zadeh, T. Flash, A. Billard, ...

This paper introduces a benchmark framework to evaluate the performance of reaching motion generation approaches that learn from demonstrated examples. The system implements ten different performance measures for typical generalization tasks in robotics using open source MATLAB software. Systematic comparisons are based on a default training data set of human motions, which specifies the respective ground truth. In technical terms, an evaluated motion generation method needs to compute velocities, given a state provided by the simulation system. The framework, however, is agnostic to how the method does this or how it learns from the provided demonstrations. The framework focuses on robustness, which is tested statistically by sampling from a set of perturbation scenarios. These perturbations interfere with motion generation and challenge its generalization ability. The benchmark thus helps to identify the strengths and weaknesses of competing approaches, while giving the user the opportunity to configure the weightings between the different measures.
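
A minimal sketch of the evaluation idea follows (not the MATLAB framework's API): a motion generator is any callable mapping a state to a velocity, and the benchmark integrates it from perturbed start states and scores how close it ends up to the target. The function names, the simple distance metric and the parameters are assumptions.

```python
# Illustrative robustness evaluation of a state-to-velocity motion generator.
import numpy as np

def evaluate(policy, target, start, n_perturbations=50, dt=0.01, steps=500, noise=0.05):
    """Mean final distance to the target over randomly perturbed start states."""
    rng = np.random.default_rng(0)
    errors = []
    for _ in range(n_perturbations):
        x = start + rng.normal(scale=noise, size=start.shape)  # perturbed start state
        for _ in range(steps):
            x = x + dt * policy(x)          # the policy returns a velocity for state x
        errors.append(np.linalg.norm(x - target))
    return float(np.mean(errors))

# Example: a trivial linear attractor toward the target as the "learned" policy.
target = np.zeros(2)
print(evaluate(lambda x: -(x - target), target, start=np.array([1.0, 1.0])))
```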

