Estimation of a Priori Decision Threshold for Collocations Extraction

Author(s):  
Fethi Fkih ◽  
Mohamed Nazih Omri

Choosing the optimal threshold for collocation extraction remains a manual task performed by experts. To date, no in-depth study has explored possible solutions for automating the learning of this threshold in the field of statistical terminology. In this paper, the authors try to shed light on this problem by first exploring the performance evaluation techniques used in several scientific areas (such as the biomedical and biometric fields) and then applying them to the field of statistical terminology. The experimental study gives promising results. First, it shows the effectiveness of the usual techniques (such as ROC and Precision-Recall curves) used to evaluate the performance of binary classification systems. Second, it provides a practical solution for the automatic estimation of optimal thresholds for collocation extraction systems.
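As a rough illustration of how a decision threshold can be read off a ROC curve, the sketch below picks the cut-off that maximizes Youden's J (true positive rate minus false positive rate). The abstract does not state which criterion the authors use, so Youden's J is an assumption here, and the scores, labels and function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (threshold, tpr, fpr) for every distinct score used as a cut-off."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # A candidate is accepted as a collocation when its score >= threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((threshold, tp / positives, fp / negatives))
    return points


def best_threshold(scores, labels):
    """Pick the cut-off maximizing Youden's J = TPR - FPR (assumed criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] - p[2])[0]


# Hypothetical association scores for candidate word pairs (1 = true collocation).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
threshold = best_threshold(scores, labels)
```

On ties, `max` keeps the first (highest) threshold, which favours precision; a different tie-breaking rule or a Precision-Recall based criterion would be an equally reasonable choice.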

2007 ◽  
Vol 56 (8) ◽  
pp. 95-106 ◽  
Author(s):  
P. Grau ◽  
S. Beltrán ◽  
M. de Gracia ◽  
E. Ayesa

This paper proposes a new methodology for the automatic characterization of the influent wastewater in wastewater treatment plants (WWTPs). With this methodology, model components are automatically estimated by means of optimization algorithms that combine a priori knowledge of the expected wastewater composition with experimental information from the available measurement data. The characterization is based on an extended list of model components in which components are described by means of their elemental mass fractions. This makes it easy to establish relationships between model components and experimental data, and yields a general methodology applicable to any model used for biological wastewater treatment. The characterization of the wastewater influent of Galindo-Bilbao according to this methodology has demonstrated its validity and its easy application to influent characterization for the ASM1 model.


2016 ◽  
Vol 139 (1) ◽  
Author(s):  
Brian Reding ◽  
Yiding Cao

Heat pipe technology offers a possible cooling technique for structures exposed to high heat fluxes, as in turbomachinery such as compressors and turbines. However, in its current configuration as single heat pipes, implementation of the technology is limited by manufacturing difficulties and cost. Hence, a study was undertaken to develop a new radially rotating (RR) heat pipe system, which integrates multiple RR heat pipes with a common reservoir and interconnected branches for a more effective and practical solution to turbomachinery cooling. The experimental study has shown that the integration of multiple heat pipe branches with a reservoir at the top is feasible.


Proceedings ◽  
2019 ◽  
Vol 31 (1) ◽  
pp. 55 ◽  
Author(s):  
Diego Martín ◽  
Borja Bordel ◽  
Ramón Alcarria

This paper addresses the problem of data aggregation platforms operating in heterogeneous Ambient Intelligence Environments. In these platforms, device interoperability is a challenge and erratic sensor observations are difficult to detect. We propose ADES (Automatic Detection of Erratic Sensors), a statistical approach to detect erratic behavior in sensors and annotate those errors in a semantic platform. To do that, we propose three binary classification systems based on statistical tests for erratic observation detection, and we validate our approach by verifying whether ADES is able to classify sensors correctly by their observations. Results show that the first two classifiers (constant and random observations) had good accuracy rates and were able to classify most of the samples. In addition, all of the classifiers obtained a very low false positive rate.
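The accuracy and false positive rate mentioned above can be computed as in the following sketch. The abstract does not describe the statistical tests used by ADES, so the variance check `looks_constant` is only a stand-in for a "constant observations" classifier, and the sensor readings, labels and tolerance are hypothetical.

```python
from statistics import pvariance


def looks_constant(observations, tolerance=1e-9):
    """Stand-in detector (assumption): flag a sensor whose readings never vary."""
    return pvariance(observations) <= tolerance


def accuracy_and_fpr(predicted, actual):
    """Accuracy and false positive rate of a binary classifier."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    fpr = fp / (fp + tn)
    return accuracy, fpr


# Hypothetical windows of readings; True means the sensor is known to be erratic.
windows = [
    ([20.0, 20.0, 20.0, 20.0, 20.0], True),   # stuck at one value
    ([3.0, 91.0, 47.0, 8.0, 66.0], True),     # erratic, but not constant
    ([5.0, 5.0, 5.0, 5.0, 5.0], True),        # stuck at one value
    ([20.1, 20.4, 19.8, 20.0, 20.3], False),  # healthy sensor
]
predicted = [looks_constant(obs) for obs, _ in windows]
actual = [is_erratic for _, is_erratic in windows]
accuracy, fpr = accuracy_and_fpr(predicted, actual)
```

In this toy data the constant-value check misses the randomly behaving sensor, which is one reason a single test is unlikely to be sufficient and why several classifiers may be combined.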


2012 ◽  
Vol 21 (8) ◽  
pp. 1052 ◽  
Author(s):  
Yan Boulanger ◽  
Sylvie Gauthier ◽  
Philip J. Burton ◽  
Marie-Andrée Vaillancourt

The ability of national and multipurpose ecological classification systems to provide an optimal zonation for a fire regime is questionable. Using wildfire (>1 ha) point data for the 1980–99 period, we defined zones with a homogeneous fire regime (HFR) across Canada and we assessed how these differ from the National Ecological Framework for Canada (NEFC) units of corresponding scale, i.e. ecoprovinces. Two HFR zonations were produced through spatially constrained clustering of (i) 1600-km² cells and (ii) the smallest units of the NEFC system, i.e. ecodistricts, using attributes for natural and anthropogenic fires. Thirty-three HFR zones were identified. HFR zonations showed smaller differences among each other than with NEFC ecoprovinces. Comparisons with ecoprovinces suggested general agreement of generalised fire regime values with HFR zones but with poor zone boundary correspondence. Ecoprovince zonation led to an overgeneralisation of fire regime estimates with less variation captured than by the HFR zonations, especially that using gridded fixed-area cells. Estimates of fire-return interval strongly differed between a priori and HFR zonations. The use of large-scale NEFC units or a zonation using its smallest units may constrain our ability to accurately quantify and portray fire regime variability across the country. The alternative empirical HFR zonation using gridded cells refines the location and nature of fire risk.


Author(s):  
Agnivesh Pani ◽  
Prasanta K. Sahu

Freight demand models typically employ a priori classification systems for dividing establishments into hypothetically complementary groups with homogeneous patterns in freight production (FP) and freight trip production (FTP). Although an attractive and popular notion, the assumption of homogeneity within these a priori industrial classes is reductive in nature and has not yet been tested in the literature. This research examines this hypothesis and explores the possibility of a data-driven segmentation by examining the relationships between FP/FTP patterns and prevalent a priori classes; subsequently, it creates homogeneous ensembles of a posteriori segments through aggregation. This research labels, explains, and interprets these novel segments using the commodity value density of industrial classes. The alternative segmentation schemes are compared in their ability to predict FP and FTP, and it is found that: (i) industrial classification systems (NAICS, ISIC) perform significantly better than product classification systems (ASICC); (ii) a considerable portion of the variability in FTP does not depend on the employment predictor, due to the underlying influence of shipment size; (iii) an a posteriori segmentation scheme considering shipment size may represent an effective middle ground for developing both FP and FTP models in freight demand model systems. Adoption of these novel segments of the freight travel market has the potential to reduce the sample size requirements of freight demand model systems and minimize the financial necessities for future freight surveys.


2021 ◽  
Vol 26 ◽  
pp. 489-504
Author(s):  
Anne Anderson ◽  
Shobha Ramalingam

‘Global Projects’ and ‘Global Virtual Teams’ are revolutionizing the construction industry. An increasing number of multi-national engineering firms are adopting this business model due to the possible advantages of cost and time optimization. However, the literature identifies several challenges that project teams endure in temporarily organizing while transitioning through time and space, some of which include cross-cultural differences in teams and limited richness of the communication media. Perceiving virtual project execution as a multi-variable construct, organizational theorists and sociologists adopt a socio-technical approach to understand the dynamics of action embedded in the process and recommend implementation of pre-process, during-process or post-process intervention strategies to enable performance. In this paper, we address this research concern through an experimental study conducted across two global universities, National Institute of Construction Management and Research, Pune, India and Washington State University, USA. Around 24 students from each university in ten teams collaborated virtually for a period of 2.5 weeks to develop a 3-dimensional Revit model and a 4-dimensional BIM model in Autodesk Revit and Navisworks, respectively, for a multi-storey residential building. The study aimed to investigate the role of project teams in organizing and coordinating project tasks and, taking a socio-technical approach, explored the role of a BIM Execution Plan as a pre-process intervention strategy. Data collected through a qualitative survey after the experiment were analyzed using ethnographic coding techniques. Findings showed that the project and team challenges primarily stemmed from coordination issues and institutional differences. Members significantly mitigated the issues through a proactive approach and a priori planning. The BIM Execution Plan allowed members to instantly get involved with the tasks and plan the process, apart from being able to foresee the complexity. Teams emphasized the importance of implementing a detailed BIM Execution Plan during the planning phase for a collaborative and successful project outcome and further observed that a pre-process intervention strategy such as a BIM plan was the needed impetus for members to collaborate and coordinate project tasks.


2019 ◽  
Vol 28 (1) ◽  
pp. 33-48 ◽  
Author(s):  
Daniel A. López ◽  
Maria J. Rojas ◽  
Boris A. López ◽  
Oscar Espinoza

Purpose: The purpose of this study is to analyze the relationship between quality assurance, the traditional a priori approach, and a more recently developed empirical classification of universities, as a means of assessing whether the different classification systems fulfill their original purpose. The study analyzes Chilean university classifications because they have been used in setting up higher education public policies. Design/methodology/approach: The existing classifications of Chilean universities were identified in the literature. Researchers determined categories, criteria and/or indicators used, as well as their main purposes as described by the authors of the classifications. All the criteria and indicators identified were directly related to the quality of academic activities and to the results of the university accreditation processes. The institutional accreditation outcomes and variables were studied using univariate and multivariate statistical analysis. Findings: The a priori approach proved to be consistent with the results of institutional quality assurance, despite the variability in individual performances. The empirical systems, however, do not show any contribution to the improvement of public policies in higher education. The results clearly show that classifications based on performance do not necessarily ensure improvements in institutional quality. Originality/value: To the authors’ knowledge, this analysis is the first study of the relationship between university classification and quality assurance. The growing number of proposals for different empirical classifications in Chilean universities is evidence of institutional diversity only. However, the classification designs did not respond to purposes such as public policy improvements and other expected results from these instruments.


2021 ◽  
Vol 503 (3) ◽  
pp. 4446-4465
Author(s):  
Ting-Yun Cheng ◽  
Marc Huertas-Company ◽  
Christopher J Conselice ◽  
Alfonso Aragón-Salamanca ◽  
Brant E Robertson ◽  
...  

We explore unsupervised machine learning for galaxy morphology analyses using a combination of feature extraction with a vector-quantized variational autoencoder (VQ-VAE) and hierarchical clustering (HC). We propose a new methodology that includes: (1) consideration of the clustering performance simultaneously when learning features from images; (2) allowing for various distance thresholds within the HC algorithm; (3) using the galaxy orientation to determine the number of clusters. This set-up provides 27 clusters created with this unsupervised learning that we show are well separated based on galaxy shape and structure (e.g. Sérsic index, concentration, asymmetry, Gini coefficient). These resulting clusters also correlate well with physical properties such as the colour–magnitude diagram, and span the range of scaling relations such as mass versus size amongst the different machine-defined clusters. When we merge these multiple clusters into two large preliminary clusters to provide a binary classification, an accuracy of ~87 per cent is reached using an imbalanced data set, matching real galaxy distributions, which includes 22.7 per cent early-type galaxies and 77.3 per cent late-type galaxies. Comparing the given clusters with classic Hubble types (ellipticals, lenticulars, early spirals, late spirals, and irregulars), we show that there is an intrinsic vagueness in visual classification systems, in particular galaxies with transitional features such as lenticulars and early spirals. Based on this, the main result in this work is not how well our unsupervised method matches visual classifications and physical properties, but that the method provides an independent classification that may be more physically meaningful than any visually based ones.
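To make the role of the distance threshold in hierarchical clustering concrete, here is a toy sketch on one-dimensional features. It assumes single linkage, for which cutting the hierarchy at a given distance is the same as splitting sorted values wherever the gap exceeds that distance. The abstract does not state the linkage or the feature space used by the authors, so the function name, feature values and thresholds are all hypothetical.

```python
def single_linkage_1d(values, distance_threshold):
    """Group 1-D values so that neighbours closer than the threshold share a cluster.

    For single linkage in one dimension this matches cutting the
    agglomerative hierarchy at `distance_threshold`.
    """
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] <= distance_threshold:
            clusters[-1].append(value)   # close enough: extend current cluster
        else:
            clusters.append([value])     # gap too large: start a new cluster
    return clusters


# Hypothetical one-dimensional "features" standing in for learned image features.
features = [0.1, 0.2, 0.25, 1.0, 1.1, 3.0]
fine = single_linkage_1d(features, distance_threshold=0.2)
coarse = single_linkage_1d(features, distance_threshold=1.0)
```

A small threshold keeps three groups apart while a larger one merges the first two, which is the kind of trade-off that varying the distance threshold exposes.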

