Matching Document Pairs using Multi-Feature Semantic Fusion Based on Knowledge Graph

Abstract. Determining whether two documents are homologous or heterogeneous is an important and difficult step in information retrieval. Apart from manual verification, which requires considerable human effort, existing methods mainly focus on word-based document duplicate checking or on sentence-pair matching. Word-based duplicate checking cannot judge the similarity of two documents at the semantic level, and sentence-pair matching methods cannot effectively mine the semantic information of long texts, which are frequent in retrieval results. A concept-based Multi-Feature Semantic Fusion Model (MFSFM) is proposed. It uses multi-feature-enhanced semantics to construct a concept map that represents the document, and a multi-convolution mixed residual CNN module that introduces a local attention mechanism to improve sensitivity to conceptual boundary information. To evaluate the feasibility of the concept-map-based MFSFM, two multi-feature document datasets were built, each consisting of about 500 real scientific and technological project feasibility reports. Experimental results on these datasets show that the proposed MFSFM converges quickly and improves on the accuracy of the latest natural language matching methods.

Bayesian Trigonometric Support Vector Classifier

Neural Computation ◽

10.1162/089976603322297368 ◽

2003 ◽

Vol 15 (9) ◽

pp. 2227-2254 ◽

Cited By ~ 20

Author(s):

Wei Chu ◽

S. Sathiya Keerthi ◽

Chong Jin Ong

Keyword(s):

Loss Function ◽

Gaussian Processes ◽

Likelihood Function ◽

Support Vector ◽

Data Sets ◽

Model Adaptation ◽

Bayesian Techniques ◽

Benchmark Data ◽

Support Vector Classifier ◽

Set Up

This letter describes Bayesian techniques for support vector classification. In particular, we propose a novel differentiable loss function, called the trigonometric loss function, which has the desirable characteristic of natural normalization in the likelihood function, and then follow standard Gaussian process techniques to set up a Bayesian framework. In this framework, Bayesian inference is used to implement model adaptation while keeping the merits of the support vector classifier, such as sparseness and convex programming; this differs from standard Gaussian processes for classification. Moreover, we provide class probabilities when making predictions. Experimental results on benchmark data sets indicate the usefulness of this approach.
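
The "natural normalization" property can be checked numerically. The sketch below assumes the piecewise trigonometric likelihood P(y | f) = cos²(π/4 · (1 − yf)) for −1 < yf < 1, with P = 0 for yf ≤ −1 and P = 1 for yf ≥ 1, so that the loss −log P equals 2 log sec(π/4 · (1 − yf)). This form is our reading of the method and should be checked against the letter; the function names are hypothetical.

```python
import math


def trig_likelihood(y: int, f: float) -> float:
    """Assumed trigonometric likelihood P(y | f) for a label y in {+1, -1}."""
    delta = y * f  # functional margin
    if delta <= -1.0:
        return 0.0
    if delta >= 1.0:
        return 1.0
    return math.cos(math.pi / 4.0 * (1.0 - delta)) ** 2


def trig_loss(y: int, f: float) -> float:
    """Loss = -log P(y | f) = 2 log sec(pi/4 * (1 - y f)); infinite if y f <= -1."""
    p = trig_likelihood(y, f)
    return math.inf if p == 0.0 else -math.log(p)


# Natural normalization: the two class probabilities sum to one for any f.
for f in (-1.5, -0.3, 0.0, 0.5, 1.0, 2.0):
    total = trig_likelihood(+1, f) + trig_likelihood(-1, f)
    print(f"f = {f:+.1f}: P(+1|f) + P(-1|f) = {total:.6f}")
```

The sum equals one because cos²(π/4 − πf/4) + cos²(π/4 + πf/4) = 1 for every f, so no separate normalizing constant is needed, and the loss is exactly zero once yf ≥ 1, which is where SVM-style sparseness comes from.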

Development of Reliable NARX Models of Gas Turbine Cold, Warm and Hot Start-Up

Volume 9: Oil and Gas Applications; Supercritical CO2 Power Cycles; Wind Energy ◽

10.1115/gt2017-63332 ◽

2017 ◽

Cited By ~ 2

Author(s):

Hilal Bahlawan ◽

Mirko Morini ◽

Michele Pinelli ◽

Pier Ruggero Spina ◽

Mauro Venturini

Keyword(s):

Gas Turbine ◽

Training Data ◽

Series Data ◽

Data Sets ◽

Control Logic ◽

Start Up ◽

Hot Start ◽

Narx Models ◽

Set Up ◽

Rapid Transients

This paper documents the set-up and validation of nonlinear autoregressive exogenous (NARX) models of a heavy-duty single-shaft gas turbine, a General Electric PG 9351FA located in Italy. The training data are experimental time series of several different maneuvers recorded during cold, warm and hot start-up. The trained NARX models are used to predict other experimental data sets, and the model outputs are compared with the corresponding measured data. The paper thus addresses the challenge of setting up robust and reliable NARX models through a sound selection of training data sets and a sensitivity analysis on the number of neurons. Moreover, a new performance function for the training process is defined to give more weight to the most rapid transients. The final aim is a powerful, easy-to-build and very accurate simulation tool with good generalization capability that can be used for both control logic tuning and gas turbine diagnostics.
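
A NARX model predicts y(t) from past outputs y(t−1), …, y(t−n_y) and past exogenous inputs u(t−1), …, u(t−n_u). The sketch below builds those regressors and a weighted mean-squared-error performance function that emphasises rapid transients. The abstract does not give the authors' weighting, so the rule w = 1 + α|Δy| / max|Δy|, the lag orders and all function names are hypothetical.

```python
import numpy as np


def narx_regressors(y, u, ny, nu):
    """Stack NARX regressors [y(t-1..t-ny), u(t-1..t-nu)] with targets y(t)."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    start = max(ny, nu)
    rows = [
        np.concatenate([y[t - ny:t][::-1], u[t - nu:t][::-1]])  # most recent lag first
        for t in range(start, len(y))
    ]
    return np.array(rows), y[start:]


def transient_weights(y_target, alpha=4.0):
    """Hypothetical weights 1 + alpha * |dy| / max|dy|: larger during rapid transients."""
    y_target = np.asarray(y_target, dtype=float)
    dy = np.abs(np.diff(y_target, prepend=y_target[0]))
    peak = dy.max()
    return 1.0 + alpha * (dy / peak if peak > 0 else dy)


def weighted_mse(y_true, y_pred, w):
    """Weighted mean squared error used as the training performance function."""
    y_true, y_pred, w = (np.asarray(a, dtype=float) for a in (y_true, y_pred, w))
    return float(np.sum(w * (y_true - y_pred) ** 2) / np.sum(w))


# Toy start-up-like data: a step in the input u and a first-order response y.
u = np.r_[np.zeros(5), np.ones(15)]
y = np.r_[np.zeros(6), 1.0 - np.exp(-np.arange(14) / 3.0)]
X, target = narx_regressors(y, u, ny=2, nu=2)
w = transient_weights(target)
print(X.shape, np.round(w, 2))
```

In the paper a neural network maps each regressor row to y(t) and is trained against such a performance function; no network is trained here, and the toy data are synthetic.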

FINE REGISTRATION OF KILO-STATION NETWORKS - A MODERN PROCEDURE FOR TERRESTRIAL LASER SCANNING DATA SETS

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xli-b5-485-2016 ◽

2016 ◽

Vol XLI-B5 ◽

pp. 485-492

Author(s):

J.-F. Hullo

Keyword(s):

Laser Scanning ◽

Laser Scanner ◽

Data Sets ◽

Data Set ◽

Information Models ◽

3D Network ◽

Laser Scanners ◽

Building Information ◽

Set Up ◽

Fine Registration

We propose a complete methodology for the fine registration and referencing of kilo-station networks of terrestrial laser scanner data, which are currently used for many valuable purposes such as 3D as-built reconstruction of Building Information Models (BIM) or industrial as-built mock-ups. This comprehensive target-based process aims to achieve a global tolerance below a few centimetres across a 3D network of more than 1,000 laser stations spread over 10 floors, and is particularly valuable for 3D networks in congested indoor environments. In situ, the use of terrestrial laser scanners, the layout of the targets and the set-up of a topographic control network should comply with the expert methods specific to surveyors. Using parametric and reduced Gauss-Helmert models, the network is expressed as a set of functional constraints with a related stochastic model. During the post-processing phase, inspired by geodesy methods, a robust cost function is minimised. At the scale of such a data set, the complexity of the 3D network is beyond direct comprehension, so the surveyor, even an expert, must be supported in the analysis by digital and visual indicators. In addition to the standard indicators used for adjustment methods, including Baarda's reliability, we introduce spectral analysis tools from graph theory for identifying different types of errors or a lack of robustness of the system, as well as for ultimately documenting the quality of the registration.
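
As an illustration of a spectral graph indicator (not necessarily one of those used in the paper), consider a graph whose nodes are stations and whose edges are hypothetical links between stations that share tie-point targets. The algebraic connectivity, the second-smallest eigenvalue of the graph Laplacian, drops toward zero when part of the network is only weakly tied, and the sign pattern of its eigenvector (the Fiedler vector) suggests where the network splits. The link rule and function names below are assumptions for the sketch.

```python
import numpy as np


def laplacian(n_stations, links):
    """Combinatorial graph Laplacian L = D - A of an undirected station graph."""
    L = np.zeros((n_stations, n_stations))
    for i, j in links:
        L[i, j] -= 1.0
        L[j, i] -= 1.0
        L[i, i] += 1.0
        L[j, j] += 1.0
    return L


def fiedler(n_stations, links):
    """Algebraic connectivity (2nd-smallest eigenvalue) and its eigenvector."""
    eigvals, eigvecs = np.linalg.eigh(laplacian(n_stations, links))
    return eigvals[1], eigvecs[:, 1]


# Two well-connected groups of stations joined by a single link (2, 3).
links = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
lam2, vec = fiedler(6, links)
print(f"algebraic connectivity: {lam2:.3f}")
print("group of each station:", (vec > 0).astype(int))
```

A dense eigensolver is fine for a toy graph; for a network of over 1,000 stations a sparse Laplacian and an iterative eigensolver would be the natural choice.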

Measurements from mobile surface vehicles during LAPSE-RATE

10.5194/essd-2020-173 ◽

2020 ◽

Cited By ~ 1

Author(s):

Gijs de Boer ◽

Sean Waugh ◽

Alexander Erwin ◽

Steven Borenstein ◽

Cory Dixon ◽

...

Keyword(s):

Layer Structure ◽

Unmanned Aircraft ◽

Lapse Rate ◽

Data Sets ◽

San Luis Valley ◽

San Luis ◽

Remotely Piloted Aircraft ◽

In Situ Sensors ◽

Set Up ◽

The One

Abstract. Between 14 and 20 July 2018, small unmanned aircraft systems (sUAS) were deployed to the San Luis Valley of Colorado (USA) alongside surface-based remote and in situ sensors and radiosonde systems as part of the Lower Atmospheric Profiling Studies at Elevation – a Remotely-piloted Aircraft Team Experiment (LAPSE-RATE). The measurements collected as part of LAPSE-RATE targeted quantities that enhance our understanding of boundary layer structure, cloud and aerosol properties and surface-atmosphere exchange, and provide detailed information to support model evaluation and improvement. Additionally, an intensive intercomparison between the different unmanned aircraft platforms was carried out. The current manuscript describes the observations obtained using three different types of surface-based mobile observing vehicles: the University of Colorado Mobile UAS Research Collaboratory (MURC), the National Oceanic and Atmospheric Administration National Severe Storms Laboratory Mobile Mesonet, and two University of Nebraska Combined Mesonet and Tracker (CoMeT) vehicles. Over the one-week campaign, a total of 143 hours of data were collected using this combination of vehicles. The data from these coordinated activities provide detailed perspectives on the spatial variability of atmospheric state parameters (air temperature, humidity, pressure, and wind) throughout the northern half of the San Luis Valley. These data sets have been quality-checked and published to the Zenodo data archive under a community set up specifically for LAPSE-RATE (https://zenodo.org/communities/lapse-rate/) and are accessible at no cost to all registered users. The primary dataset DOIs are https://doi.org/10.5281/zenodo.3814765 (CU MURC measurements; de Boer et al., 2020d), https://doi.org/10.5281/zenodo.3738175 (NSSL MM measurements; Waugh, 2020) and https://doi.org/10.5281/zenodo.3838724 (UNL CoMeT measurements; Houston and Erwin, 2020).

Classification of Pneumonia Cell Images Using Improved ResNet50 Model

Traitement du signal ◽

10.18280/ts.380117 ◽

2021 ◽

Vol 38 (1) ◽

pp. 165-173

Author(s):

Ahmet Çınar ◽

Muhammed Yıldırım ◽

Yeşim Eroğlu

Keyword(s):

Error Rates ◽

Lung Imaging ◽

Imaging Method ◽

Data Sets ◽

Accuracy Rate ◽

Traditional Methods ◽

X Ray ◽

Qualified Personnel ◽

Patient Will

Pneumonia is an inflammation of the lung tissue caused by various agents, primarily bacteria. Early and accurate diagnosis is important in reducing the morbidity and mortality of the disease. The primary imaging method used for diagnosing pneumonia is the lung X-ray. Although lung imaging may show typical findings of pneumonia, the images may also be nonspecific. In addition, many health units may lack qualified personnel to perform this procedure, or diagnoses made by traditional methods may contain errors. Computer systems can therefore be used to reduce the error rates of traditional methods. Many methods have been developed to train on such data sets. In this article, a new model is developed based on the layers of ResNet50 and compared with the InceptionV3, AlexNet, GoogLeNet, ResNet50 and DenseNet201 architectures. The developed model achieved the highest accuracy, 97.22%, followed in order of accuracy by DenseNet201, ResNet50, InceptionV3, GoogLeNet and AlexNet. With the developed models, pneumonia can be diagnosed early and accurately, and the patient's treatment can be planned quickly.

Relationships between winter wheat yields and soil carbon under various tillage systems

Plant Soil and Environment ◽

10.17221/512/2012-pse ◽

2012 ◽

Vol 58 (No. 12) ◽

pp. 540-544 ◽

Cited By ~ 5

Author(s):

O. Mikanová ◽

T. Šimon ◽

M. Javůrek ◽

M. Vach

Keyword(s):

Winter Wheat ◽

Field Experiments ◽

Conservation Tillage ◽

Conventional Tillage ◽

Data Sets ◽

No Tillage ◽

Organic C ◽

Grain Yields ◽

Soil Microbial ◽

Set Up

Soil quality and fertility are associated with soil productivity, which in turn is connected to soil biological activity. Well-designed long-term field experiments that provide comprehensive data sets are the most suitable means of studying these effects. Four treatments (tillage methods) were set up: (1) conventional tillage (CT); (2) no tillage (NT); (3) minimum tillage + straw (MTS); and (4) no tillage + mulch (NTM). Our objective was to assess the relationships between soil microbial characteristics and winter wheat yields under these different conservation tillage techniques within a field experiment originally established in 1995. The differences in average grain yields between the variants over the period 2002–2009 were not statistically significant. Organic carbon in the topsoil was higher in the conservation tillage plots (NT, MTS and NTM) than in the conventional tillage plots. There was a statistically significant correlation (P ≤ 0.01) between grain yields and organic C content in the topsoil.

Research Policy and Review 29: The Chorley Committee and “Handling Geographic Information”

Environment and Planning A Economy and Space ◽

10.1068/a210571 ◽

1989 ◽

Vol 21 (5) ◽

pp. 571-585 ◽

Cited By ~ 6

Author(s):

D W Rhind ◽

H M Mounsey

Keyword(s):

Geographical Information Systems ◽

Research Policy ◽

Geographic Information ◽

Geographical Information ◽

Data Sets ◽

Ordnance Survey ◽

Information Technology Education ◽

Training Research ◽

Set Up ◽

The Uk

In 1985, the UK government set up a Committee of Enquiry into the Handling of Geographic Information by computer. This was chaired by Lord Chorley and reported in early 1987. It concerned itself with all information which is described in relation to space and which could hence be used either singly or in combination. The tasks undertaken by the Committee are described, as are its composition and method of operation, the major ‘discoveries’ it made, and the recommendations put forward to government. A total of sixty-four recommendations were made covering digital (especially Ordnance Survey) topographic mapping, the availability of geographically disaggregated data, the problems and benefits of linking different data sets together, the need to enhance user awareness of geographical information systems and information technology, education and training, research and development, and the appropriate role for government and machinery for coordination. Finally, the government's published response to the Chorley Report is examined, particularly with regard to the proposed Centre for Geographic Information. The subsequent moves towards a consortium to bring this about are described.

Measurements of diapycnal diffusivities in stratified fluids

Journal of Fluid Mechanics ◽

10.1017/s0022112001005080 ◽

2001 ◽

Vol 442 ◽

pp. 267-291 ◽

Cited By ~ 78

Author(s):

MICHAEL E. BARRY ◽

GREGORY N. IVEY ◽

KRAIG B. WINTERS ◽

JÖRG IMBERGER

Keyword(s):

Root Mean Square ◽

Buoyancy Frequency ◽

Data Sets ◽

Laboratory System ◽

Mean Square ◽

Wide Range ◽

Vertical Grid ◽

Order Of Magnitude ◽

Set Up ◽

Rate Of Dissipation

Linearly stratified salt solutions of different Prandtl number were subjected to turbulent stirring by a horizontally oscillating vertical grid in a closed laboratory system. The experimental set-up allowed independent direct measurement of a root mean square turbulent lengthscale $L_t$, the turbulent diffusivity for mass $K_\rho$, the rate of dissipation of turbulent kinetic energy $\varepsilon$, the buoyancy frequency $N$ and the viscosity $\nu$, as time- and volume-averaged quantities. The behaviour of both $L_t$ and $K_\rho$ was characterized over a wide range of the turbulence intensity measure $\varepsilon/(\nu N^2)$, and two regimes were identified. In the more energetic regime (Regime E, where $300 < \varepsilon/(\nu N^2) < 10^5$), $L_t$ was found to be a function of $\nu$, $\kappa$ and $N$, whilst $K_\rho$ was a function of $\nu$, $\kappa$ and $(\varepsilon/(\nu N^2))^{1/3}$, where $\kappa$ is the molecular diffusivity. From these expressions for $L_t$ and $K_\rho$, a scaling relation for the root mean square turbulent velocity scale $U_t$ was derived, and this relationship showed good agreement with direct measurements from other data sets. In the weaker turbulence regime (Regime W, where $10 < \varepsilon/(\nu N^2) < 300$), $K_\rho$ was a function of $\nu$, $\kappa$ and $\varepsilon/(\nu N^2)$. For $10 < \varepsilon/(\nu N^2) < 1000$, our directly measured diffusivities $K_\rho$ differ by approximately a factor of 2 from the diffusivity predicted by the model of Osborn (1980). For $\varepsilon/(\nu N^2) > 1000$, our measured diffusivities diverge from the model prediction; for example, at $\varepsilon/(\nu N^2) \approx 10^4$ there is at least an order of magnitude difference between the measured and predicted diffusivities.
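
The quantities compared above can be computed directly. The sketch below evaluates the turbulence intensity ε/(νN²), assigns the regime using the ranges quoted in the abstract, and evaluates the Osborn (1980) model K_ρ = Γε/N² with the commonly used mixing efficiency Γ = 0.2. The input values are illustrative only, and the function names are hypothetical; the measured scaling laws themselves are not reproduced because the abstract does not give their coefficients.

```python
def turbulence_intensity(eps, nu, N):
    """Turbulence intensity eps / (nu N^2), dimensionless."""
    return eps / (nu * N**2)


def regime(intensity):
    """Regime label using the ranges quoted in the abstract; None outside them."""
    if 300.0 < intensity < 1e5:
        return "E"  # more energetic regime
    if 10.0 < intensity < 300.0:
        return "W"  # weaker regime
    return None


def osborn_diffusivity(eps, N, gamma=0.2):
    """Osborn (1980) model K_rho = gamma * eps / N^2, gamma = mixing efficiency."""
    return gamma * eps / N**2


# Illustrative SI values: eps in W/kg, nu in m^2/s, N in 1/s.
eps, nu, N = 1e-3, 1e-6, 1.0
intensity = turbulence_intensity(eps, nu, N)
print(f"eps/(nu N^2) = {intensity:.0f}, regime {regime(intensity)}, "
      f"Osborn K_rho = {osborn_diffusivity(eps, N):.1e} m^2/s")
```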

Rosa Taxonomy and Hierarchy of Markers Defined by ACT STATIS

Zeitschrift für Naturforschung C ◽

10.1515/znc-1999-1-206 ◽

1999 ◽

Vol 54 (1-2) ◽

pp. 25-34 ◽

Cited By ~ 3

Author(s):

C. Grossi ◽

O. Raymond ◽

C. Sanlaville-Boisson ◽

M. Jay

Keyword(s):

Superoxide Dismutase ◽

Morphological Features ◽

Morphological Data ◽

Data Sets ◽

Common View ◽

Correlation Studies ◽

Set Up ◽

The Rose ◽

Rosa Species

The ACT STATIS method, a multi-table comparison, was applied to 62 Rosa species to be clustered into four sections (Carolinae, Cinnamomeae, Pimpinellifoliae and Synstylae). The data sets covered morphology (15 criteria), anthocyanin pattern (10 compounds), flavonol heteroside pattern (26 compounds) and superoxide dismutase (SOD) isozyme polymorphism (11 bands). The method proved very powerful for recognizing the rose sections and for setting up a marker hierarchy that places the flavonol heteroside pattern at the first level, followed by the morphological data, the SOD isozyme data and finally the anthocyanin pattern. Correlation studies between the markers underlined the relatively similar view provided by the flavonol patterns and the morphological features.
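
A minimal sketch of the classical STATIS steps may help: each table (same species as rows, its own variables as columns) becomes a normalized cross-product matrix, the RV coefficients between these matrices form the interstructure, and the leading eigenvector of that RV matrix gives table weights for a compromise. This is a generic variant, assuming unit-Frobenius-norm scaling and weights normalized to sum to one; the ACT STATIS software used in the paper may differ, and the abstract does not say how the marker hierarchy was read off. The random tables only mimic the column counts (15, 10, 26, 11) and the 62 rows.

```python
import numpy as np


def cross_product(X):
    """Column-centred cross-product matrix, scaled to unit Frobenius norm."""
    Xc = X - X.mean(axis=0)
    W = Xc @ Xc.T
    return W / np.linalg.norm(W)


def rv(W1, W2):
    """RV coefficient between two symmetric cross-product matrices."""
    return np.sum(W1 * W2) / np.sqrt(np.sum(W1 * W1) * np.sum(W2 * W2))


def statis(tables):
    """Interstructure (RV matrix), table weights and compromise of a STATIS analysis."""
    Ws = [cross_product(np.asarray(X, dtype=float)) for X in tables]
    R = np.array([[rv(Wa, Wb) for Wb in Ws] for Wa in Ws])
    _, vecs = np.linalg.eigh(R)
    u = vecs[:, -1]  # eigenvector of the largest eigenvalue
    u = u if u.sum() >= 0 else -u  # fix the arbitrary eigenvector sign
    alpha = u / u.sum()  # weights normalised to sum to one
    compromise = sum(a * W for a, W in zip(alpha, Ws))
    return R, alpha, compromise


# Random stand-in tables: the same 62 rows, different numbers of variables.
rng = np.random.default_rng(0)
tables = [rng.normal(size=(62, p)) for p in (15, 10, 26, 11)]
R, alpha, compromise = statis(tables)
print(np.round(R, 2))
print("table weights:", np.round(alpha, 3))
```

Tables whose configuration of species agrees most with the others receive larger weights, which is one plausible basis for ranking markers.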

Robust Federated Learning via Collaborative Machine Teaching

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5826 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4075-4082

Author(s):

Yufei Han ◽

Xiangliang Zhang

Keyword(s):

Teaching And Learning ◽

Service Providers ◽

Teaching Method ◽

Training Data ◽

Data Sets ◽

Quality Of Data ◽

Training Set ◽

In The Wild ◽

Model Training ◽

Set Up

For federated learning systems deployed in the wild, flawed data hosted on local agents are common. On the one hand, when a large fraction (e.g., over 60%) of the training data is corrupted by systematic sensor noise and environmental perturbations, the performance of federated model training can degrade significantly. On the other hand, it is prohibitively expensive for either clients or service providers to set up manual sanity checks to verify the quality of data instances. In this study, we address this challenge by proposing a collaborative and privacy-preserving machine teaching method. Specifically, we use a few trusted instances provided by teachers as benign examples in the teaching process. Our collaborative teaching approach jointly seeks the optimal tuning of the distributed training set, such that the model learned from the tuned training set predicts the labels of the trusted items correctly. The proposed method couples the processes of teaching and learning and thus directly produces a robust prediction model despite extremely pervasive systematic data corruption. The experimental study on real benchmark data sets demonstrates the validity of our method.
