Bootstrapping a Persian Dependency Treebank

2012
Vol 7
Author(s): Mojgan Seraji, Beáta Megyesi, Joakim Nivre

This paper presents an ongoing project whose goal is to create a freely available dependency treebank for Persian. The data is taken from the Bijankhan corpus, which is already annotated for parts of speech, and a syntactic dependency annotation based on the Stanford Typed Dependencies is added through a bootstrapping procedure involving the open-source dependency parser MaltParser. We report preliminary parsing experiments with promising results after training the parser on a manually annotated seed data set of 215 sentences.
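
The abstract describes a bootstrapping procedure with MaltParser. The following is a rough sketch of that loop, not the project's actual scripts: train MaltParser on the manually annotated seed set, parse a new batch from the Bijankhan corpus, hand-correct it, append it to the training data and retrain. File names are placeholders, and the MaltParser jar name and command-line flags may differ by version.

```python
# Hypothetical bootstrapping helper; MaltParser jar name/flags are assumptions.
import subprocess

MALT_JAR = "maltparser-1.7.2.jar"   # assumed version

def train(train_conll, model_name):
    """Learn a parsing model from a CoNLL-format treebank."""
    subprocess.run(["java", "-jar", MALT_JAR, "-c", model_name,
                    "-i", train_conll, "-m", "learn"], check=True)

def parse(model_name, in_conll, out_conll):
    """Parse new sentences with a previously trained model."""
    subprocess.run(["java", "-jar", MALT_JAR, "-c", model_name,
                    "-i", in_conll, "-o", out_conll, "-m", "parse"], check=True)

# One bootstrapping round: the parsed batch is manually corrected (outside this
# script) and appended to the training file before the next round.
train("seed_215_sentences.conll", "persian_round0")
parse("persian_round0", "bijankhan_batch1.conll", "batch1_parsed.conll")
```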

2014
Author(s): David S Smith, Xia Li, Lori R Arlinghaus, Thomas E Yankeelov, E. Brian Welch

We present a fast, validated, open-source toolkit for processing dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) data. We validate it against the Quantitative Imaging Biomarkers Alliance (QIBA) Standard and Extended Tofts-Kety phantoms and find near perfect recovery in the absence of noise, with an estimated 10-20x speedup in run time compared to existing tools. To explain the observed trends in the fitting errors, we present an argument about the conditioning of the Jacobian in the limit of small and large parameter values. We also demonstrate its use on an in vivo data set to measure performance on a realistic application. For a 192 × 192 breast image, we achieved run times of < 1 s. Finally, we analyze run time scaling with problem size and find that the run time per voxel scales as O(N^1.9), where N is the number of time points in the tissue concentration curve. DCEMRI.jl was much faster than any other analysis package tested and produced comparable accuracy, even in the presence of noise.
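
For orientation, the Standard Tofts-Kety model fitted by such toolkits is C_t(t) = Ktrans ∫₀ᵗ C_p(τ) exp(-(Ktrans/v_e)(t-τ)) dτ. The sketch below is a minimal single-voxel fit in Python, not the DCEMRI.jl implementation; the time grid, arterial input C_p, tissue curve C_t, initial guesses and bounds are all illustrative assumptions.

```python
# Minimal single-voxel Standard Tofts-Kety fit (illustrative, not DCEMRI.jl).
import numpy as np
from scipy.optimize import curve_fit

def tofts_kety(t, ktrans, ve, cp):
    """Tissue concentration via discrete convolution of Cp with exp(-kep*t)."""
    dt = t[1] - t[0]                      # assumes a uniform time grid
    kernel = np.exp(-(ktrans / ve) * t)   # impulse response, kep = Ktrans/ve
    return ktrans * np.convolve(cp, kernel)[: len(t)] * dt

def fit_voxel(t, cp, ct):
    """Fit Ktrans and ve for one voxel's concentration curve."""
    model = lambda t_, ktrans, ve: tofts_kety(t_, ktrans, ve, cp)
    popt, _ = curve_fit(model, t, ct, p0=[0.1, 0.2],
                        bounds=([1e-4, 1e-3], [5.0, 1.0]))
    return popt  # (Ktrans, ve)
```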


2021
Author(s): Soran Nouri

Within the Open Source Software (OSS) literature, there is a lack of studies addressing the legitimation processes of innovations that are born in OSS. This study sets out to analyze the legitimation processes of innovations within the deliberations of the Drupal project. The data set comprises 52 rational deliberation cases discussing innovations that were proposed by members of the community. Habermas's Ideal Speech Situation (ISS) is used as the framework through which Drupal's rational deliberations are viewed; in fact, none of the 52 cases examined in this thesis violated the guidelines of the ISS. Communicative Action Theory, Influence Tactics theory and the theory of Validity Claims are the aspects of the framework used to code and analyze the conversations; they allow for an effective conceptualization of the dynamics of the Drupal deliberations. This thesis finds that the legitimation processes of innovations in open source software are influenced by the type, complexity and implications of the innovations for the rest of the community: bug fixes, complex innovations and innovations that have implications for the rest of the software result in long (in terms of number of comments) legitimation processes. The study also provides empirical support for the claim that, in open deliberations aimed at achieving mutual understanding towards a common goal, the communicative action type and the rational persuasion influence tactic are the most common means by which innovators interact with the community.


Author(s): Fatema N. Julia, Khan M. Iftekharuddin, Atiq U. Islam

Dialog act (DA) classification is useful for understanding the intentions of a human speaker. An effective classification of DAs can be exploited for realistic implementation of expert systems. In this work, we investigate DA classification using both acoustic and discourse information for the HCRC MapTask data. We extract several different acoustic features and exploit these features using a Hidden Markov Model (HMM) network to classify the acoustic information. For discourse feature extraction, we propose a novel parts-of-speech (POS) tagging technique that effectively reduces the dimensionality of the discourse features. To classify the discourse information, we exploit two classifiers: an HMM and a Support Vector Machine (SVM). We further perform classifier fusion between the HMM and the SVM to improve discourse classification. Finally, we perform an efficient decision-level classifier fusion of both acoustic and discourse information to classify 12 different DAs in the MapTask data. We obtain 65.2% and 55.4% DA classification rates using acoustic and discourse information, respectively. Furthermore, we obtain a combined accuracy of 68.6% for DA classification using both acoustic and discourse information. These DA classification accuracy rates are comparable to or better than previously reported results for the same data set. We obtain average precision and recall of 74.89% and 69.83%, respectively, and thus much better precision and recall rates for most of the classified DAs than existing works on the same HCRC MapTask data set.
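
Decision-level fusion of the kind described above can be as simple as a weighted combination of the two classifiers' class posteriors. The sketch below illustrates that idea in Python; it is not the authors' implementation, and the weight alpha and placeholder DA labels are assumptions.

```python
# Illustrative decision-level fusion of acoustic and discourse posteriors.
import numpy as np

DA_LABELS = [f"da_{i}" for i in range(12)]   # placeholders for the 12 MapTask DAs

def fuse_decisions(p_acoustic, p_discourse, alpha=0.6):
    """Weighted fusion of two posterior vectors; returns the winning DA label."""
    p_acoustic = np.asarray(p_acoustic, dtype=float)
    p_discourse = np.asarray(p_discourse, dtype=float)
    fused = alpha * p_acoustic + (1.0 - alpha) * p_discourse
    return DA_LABELS[int(np.argmax(fused))]
```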


Author(s): Ricardo Oliveira, Rafael Moreno

Federal, State and Local government agencies in the USA are investing heavily in the dissemination of the Open Data sets they produce. The main driver behind this thrust is to increase agencies' transparency and accountability, as well as to improve citizens' awareness. However, not all Open Data sets are easy to access and integrate with other Open Data sets, even ones available from the same agency. The City and County of Denver Open Data Portal distributes several types of geospatial datasets, one of which is the city parcels layer containing 224,256 records. Although this data layer contains many pieces of information, it is incomplete for some custom purposes. Open-source software was used to first collect data from diverse City of Denver Open Data sets, then upload them to a repository in the Cloud, where they were processed using a PostgreSQL installation in the Cloud and Python scripts. Our method was able to extract non-spatial information from a ‘not-ready-to-download’ source that could then be combined with the initial data set to enhance its potential use.
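
A minimal sketch of this kind of workflow is shown below: load a non-spatial CSV pulled from an open data portal into PostgreSQL and join it to an existing parcel table. Table names, column names and connection settings are hypothetical, not taken from the paper.

```python
# Hypothetical load-and-join step; names and schema are assumptions.
import csv
import psycopg2

conn = psycopg2.connect("dbname=denver user=gis")   # assumed connection settings
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS parcel_extra (
        parcel_id TEXT PRIMARY KEY,
        land_use  TEXT
    )
""")

with open("parcel_extra.csv", newline="") as f:      # file collected from the portal
    for row in csv.DictReader(f):
        cur.execute(
            "INSERT INTO parcel_extra (parcel_id, land_use) VALUES (%s, %s) "
            "ON CONFLICT (parcel_id) DO NOTHING",
            (row["parcel_id"], row["land_use"]),
        )

# Combine the scraped attributes with the original parcel layer (assumed table: parcels).
cur.execute("""
    SELECT p.parcel_id, p.geom, e.land_use
    FROM parcels p
    JOIN parcel_extra e USING (parcel_id)
""")
rows = cur.fetchall()
conn.commit()
```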


10.29007/wkj6
2019
Author(s): Vasu Saluja, Minni Jain, Prakarsh Yadav

Here we propose an open source algorithm, L,M&A (Lyrics, Mine and Analyse), to create a dataset of lyrics from the works of various artists. The aim of this approach is to facilitate the generation of a large data set that can be used to improve the accuracy of song recommendation algorithms. The limited availability of such datasets has excluded the sentiment analysis of lyrics from music recommendation systems. By using the L,M&A algorithm, it is possible to generate a large dataset which can function as a training dataset for future classifier systems. We have used iterative API requests to the musixmatch and Genius servers to text-mine lyrics data for songs by multiple artists. The data is processed and then analysed for sentiment using the lexicons provided in the Tidytext package (BING, AFINN, NRC), and the overall sentiment of each artist is determined through modal counts. The occurrence of each sentiment was evaluated and visualized using ggplot2. This representation exhibits the merit of our approach and the applicability of our data. The key features of our approach are the open source platforms utilized and the simplicity of the input required from the user.
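
The paper scores lyrics with R's tidytext lexicons and summarizes each artist by modal counts; the sketch below is a rough Python analogue of that scoring step only. The toy word lists and tokenizer are illustrative assumptions, not the NRC/BING/AFINN lexicons.

```python
# Illustrative lexicon scoring and modal-sentiment summary (toy lexicon).
from collections import Counter
import re

TOY_LEXICON = {"love": "positive", "happy": "positive",
               "alone": "negative", "cry": "negative"}

def sentiment_counts(lyrics: str) -> Counter:
    """Count lexicon sentiments over the tokens of one song's lyrics."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    return Counter(TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON)

def artist_overall_sentiment(songs: list[str]) -> str:
    """Modal (most frequent) sentiment across all of an artist's songs."""
    total = Counter()
    for lyrics in songs:
        total += sentiment_counts(lyrics)
    return total.most_common(1)[0][0] if total else "neutral"
```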


Author(s): S. M. Stuit, C. L. E. Paffen, S. Van der Stigchel

Many studies use different categories of images to define their conditions. Since any difference between these categories is a valid candidate to explain category-related behavioral differences, knowledge about the objective image differences between categories is crucial for the interpretation of the behaviors. However, natural images vary in many image features, and not every feature is equally important in describing the differences between the categories. Here, we provide a methodological approach that uses machine learning performance as a tool to find as many as possible of the image features that have predictive value for the category the images belong to. In other words, we describe a means to find the features of a group of images by which the categories can be objectively and quantitatively defined. Note that we are not aiming to provide a means for the best possible decoding performance; instead, our aim is to uncover prototypical characteristics of the categories. To facilitate the use of this method, we offer an open-source, MATLAB-based toolbox that performs such an analysis and aids the user in visualizing the features of relevance. We first applied the toolbox to a mock data set with a ground truth to show the sensitivity of the approach. Next, we applied the toolbox to a set of natural images as a more practical example.
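
The toolbox itself is MATLAB-based; the sketch below is only a rough Python analogue of the underlying idea: train a classifier on per-image feature vectors and rank features by permutation importance to see which ones carry predictive value for the category. The feature names and synthetic data are illustrative assumptions.

```python
# Illustrative feature-ranking analogue (not the MATLAB toolbox).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
feature_names = ["mean_luminance", "rms_contrast", "edge_density", "mean_hue"]
X = rng.normal(size=(200, len(feature_names)))      # stand-in image features
y = (X[:, 2] + 0.5 * X[:, 0] > 0).astype(int)       # category driven by two features

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

imp = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=0)
for name, score in sorted(zip(feature_names, imp.importances_mean),
                          key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")   # higher score = more predictive of the category
```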


2016
Author(s): Tiziana De Filippis, Leandro Rocchi, Patrizio Vignaroli, Maurizio Bacci, Vieri Tarchiani, ...

In Sub-Saharan Africa, analysis tools and models based on meteorological satellite data have been developed within different national and international cooperation initiatives, with the aim of allowing better monitoring of the cropping season. In most cases, the software was a stand-alone application, and its upgrading, in terms of analysis functions, database and hardware maintenance, was difficult for the National Meteorological Services (NMSs) in charge of agro-hydro-meteorological monitoring. The web-based solution proposed in this work intends to improve and ensure the sustainability of applications to support national Early Warning Systems (EWSs) for food security. The Crop Risk Zones (CRZ) model for Niger and Mali, integrated in a web-based open source framework, has been implemented using PL/pgSQL and PostGIS functions to process different meteorological data sets: (a) the precipitation forecast images from the Global Forecast System (GFS); (b) the Climate Prediction Center (CPC) Rainfall Estimation (RFE) for Africa; (c) the Multi-Sensor Precipitation Estimate (MPE) images from the EUMETSAT Earth Observation Portal; and (d) the MOD16 Global Terrestrial Evapotranspiration Data Set. RESTful web services upload raster images into the PostgreSQL/PostGIS database, and PL/pgSQL functions run the CRZ model to identify the installation and phenological phases of the main crops in the region and to create crop risk zone images. The model is focused on the early identification of risks and the production of information for food security within the time prescribed for decision-making. The challenge and the objective of this work is to set up an open access monitoring system, based on meteorological open data providers, targeting NMSs and any other local decision makers for drought risk reduction and resilience improvement.
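
As a minimal sketch of the ingestion path described above, and not the project's code, a RESTful endpoint might accept a rainfall-estimate raster, load it into PostgreSQL/PostGIS, and then call a PL/pgSQL routine that updates the crop risk zones. The endpoint path, table name and the function run_crz_model() are hypothetical; connection settings are assumptions.

```python
# Hypothetical raster-ingestion endpoint; names and schema are assumptions.
import subprocess
import psycopg2
from flask import Flask, request

app = Flask(__name__)

@app.route("/rasters/rfe", methods=["POST"])
def upload_rfe():
    path = "/tmp/rfe_upload.tif"
    request.files["raster"].save(path)

    # Convert the GeoTIFF to SQL and append it to an existing raster table.
    sql = subprocess.run(
        ["raster2pgsql", "-s", "4326", "-a", path, "monitoring.rfe_rasters"],
        capture_output=True, check=True, text=True,
    ).stdout

    with psycopg2.connect("dbname=ews user=crz") as conn, conn.cursor() as cur:
        cur.execute(sql)
        cur.execute("SELECT run_crz_model(%s)", ("RFE",))  # hypothetical PL/pgSQL function
    return {"status": "ok"}
```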


Author(s): Stefan Koch

In this chapter, we propose for the first time a method to compare the efficiency of free and open source projects, based on the data envelopment analysis (DEA) methodology. DEA offers several advantages in this context: it is a non-parametric optimization method that does not require the user to define relations between different factors or a production function, it can account for economies or diseconomies of scale, and it is able to deal with multi-input, multi-output systems in which the factors have different scales. Using a data set of 43 large F/OS projects retrieved from SourceForge.net, we demonstrate the application of DEA and show that DEA is indeed usable for comparing the efficiency of projects. We also show additional analyses based on the results, exploring whether the inequality of work distribution within the projects, the licensing scheme or the intended audience has an effect on their efficiency. As this is a first attempt at using this method for F/OS projects, several future research directions are possible. These include additional work on determining input and output factors, comparisons within application areas, and comparison to commercial or mixed-mode development projects.
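
For readers unfamiliar with DEA, the input-oriented CCR model scores each project (decision-making unit) by solving a small linear program. The sketch below solves that standard formulation with scipy; it is in the spirit of the chapter's method, but the toy input/output matrices are illustrative, not the SourceForge data set.

```python
# Input-oriented CCR DEA (envelopment form) as a linear program; toy data.
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Efficiency of DMU o. X: (m inputs x n DMUs), Y: (s outputs x n DMUs)."""
    m, n = X.shape
    s, _ = Y.shape
    # Variables: [theta, lambda_1, ..., lambda_n]; objective: minimize theta.
    c = np.r_[1.0, np.zeros(n)]
    # Inputs:  sum_j lambda_j * x_ij - theta * x_io <= 0
    A_in = np.hstack([-X[:, [o]], X])
    # Outputs: sum_j lambda_j * y_rj >= y_ro  ->  -sum_j lambda_j * y_rj <= -y_ro
    A_out = np.hstack([np.zeros((s, 1)), -Y])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(m), -Y[:, o]]
    bounds = [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0]

# Toy example: 2 inputs (developers, commits), 1 output (downloads), 4 projects.
X = np.array([[4.0, 7.0, 8.0, 4.0],
              [3.0, 3.0, 1.0, 2.0]])
Y = np.array([[1.0, 1.0, 1.0, 1.0]])
print([round(ccr_efficiency(X, Y, o), 3) for o in range(4)])
```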


Author(s): A. Lemme, Y. Meirovitch, M. Khansari-Zadeh, T. Flash, A. Billard, ...

This paper introduces a benchmark framework to evaluate the performance of reaching motion generation approaches that learn from demonstrated examples. The system implements ten different performance measures for typical generalization tasks in robotics using open source MATLAB software. Systematic comparisons are based on a default training data set of human motions, which specifies the respective ground truth. In technical terms, an evaluated motion generation method needs to compute velocities, given a state provided by the simulation system. The framework, however, is agnostic to how the method does this or how it learns from the provided demonstrations. The framework focuses on robustness, which is tested statistically by sampling from a set of perturbation scenarios. These perturbations interfere with motion generation and challenge its generalization ability. The benchmark thus helps to identify the strengths and weaknesses of competing approaches, while giving the user the opportunity to configure the weightings between the different measures.
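
A minimal sketch of the evaluation idea follows (not the MATLAB framework's API): a motion generator is any callable mapping a state to a velocity, and the benchmark integrates it from perturbed start states and scores how close it ends up to the target. The function names, the simple distance metric and the parameters are assumptions.

```python
# Illustrative robustness evaluation of a state-to-velocity motion generator.
import numpy as np

def evaluate(policy, target, start, n_perturbations=50, dt=0.01, steps=500, noise=0.05):
    """Mean final distance to the target over randomly perturbed start states."""
    rng = np.random.default_rng(0)
    errors = []
    for _ in range(n_perturbations):
        x = start + rng.normal(scale=noise, size=start.shape)  # perturbed start state
        for _ in range(steps):
            x = x + dt * policy(x)          # the policy returns a velocity for state x
        errors.append(np.linalg.norm(x - target))
    return float(np.mean(errors))

# Example: a trivial linear attractor toward the target as the "learned" policy.
target = np.zeros(2)
print(evaluate(lambda x: -(x - target), target, start=np.array([1.0, 1.0])))
```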

