Semi-supervised learning with summary statistics

2019 ◽  
Vol 17 (05) ◽  
pp. 837-851
Author(s):  
Huihui Qin ◽  
Xin Guo

Nowadays, the extensive collection and analysis of data is stimulating widespread privacy concerns, and is therefore increasing tensions between the potential sources of data and researchers. A privacy-friendly learning framework can help to ease these tensions and free up more data for research. We propose a new algorithm, LESS (Learning with Empirical feature-based Summary statistics from Semi-supervised data), which uses only summary statistics instead of raw data for regression learning. The selection of empirical features serves as a trade-off between prediction precision and the protection of privacy. We show that LESS achieves the minimax optimal rate of convergence in terms of the size of the labeled sample. LESS extends naturally to applications where data are held separately by different sources. Compared with the existing literature on distributed learning, LESS removes the restriction of a minimum sample size on single data sources.
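The abstract does not spell out the estimator, but the general mechanism — fitting a regularized regression from summary statistics rather than from raw data — can be sketched. In the toy below, the feature map (random Fourier features), the ridge penalty, and all names are illustrative assumptions, not the paper's exact construction; the point is that each source publishes only its feature Gram matrix and feature-label cross-product, and that sources of any size, however small, can contribute.

```python
import numpy as np

def summarize(phi_x, y):
    """Each data holder publishes only summary statistics of its
    local sample: the feature Gram matrix and the feature-label
    cross-product. Raw (phi_x, y) never leave the source."""
    return phi_x.T @ phi_x, phi_x.T @ y

def fit_from_summaries(summaries, lam):
    """Aggregate summaries from all sources and solve a ridge
    regression: (sum_k G_k + lam*I) w = sum_k b_k."""
    G = sum(g for g, _ in summaries)
    b = sum(b for _, b in summaries)
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), b)

# Illustrative run: two sources, one of them tiny (no minimum
# sample size per source), random Fourier features standing in
# for the paper's empirical features.
rng = np.random.default_rng(0)
W = rng.normal(size=(1, 30))            # 30 illustrative features
phi = lambda x: np.cos(x @ W)           # hypothetical feature map
data = [rng.normal(size=(n, 1)) for n in (50, 8)]
summaries = [summarize(phi(x), np.sin(3 * x).ravel()) for x in data]
w = fit_from_summaries(summaries, lam=1e-2)
```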

2021 ◽  
Vol 25 (4) ◽  
pp. 1013-1029
Author(s):  
Zeeshan Zeeshan ◽  
Qurat ul Ain ◽  
Uzair Aslam Bhatti ◽  
Waqar Hussain Memon ◽  
Sajid Ali ◽  
...  

With the growth of online business, recommendation algorithms are being researched extensively to make better use of existing information. Multi-criteria recommendation systems (MCRS) help end-users attain results of interest under several selection criteria, such as combinations of implicit and explicit interest indicators in the form of rankings on different matched dimensions. Current approaches typically rely on label correlation, assuming that the label correlations are shared by all objects. In real-world tasks, however, different sources of information have different features. Recommendation systems are more effective when they combine multiple decision criteria, either exploiting the correlation between features and item content (the content-based approach) or finding similarly rating users to obtain targeted results (collaborative filtering). To combine these two filtering strategies in a multi-criteria model, we propose fb-knn, a feature-based multi-criteria hybrid recommendation algorithm that recommends items by using their multi-criteria features and integrating them with correlated items found in similar datasets. Ranks are assigned to each decision, and weights are then computed for each decision from the standard deviation of items to obtain the nearest result. For evaluation, we tested the proposed algorithm on several datasets with multiple information features. The results demonstrate that fb-knn is efficient across different types of datasets.
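As a hedged illustration of the described pipeline — per-criterion weights derived from rating standard deviations, followed by a nearest-neighbour step over the weighted ratings — consider the sketch below. The tensor layout, the cosine similarity, and all names are assumptions for illustration, not the authors' exact fb-knn.

```python
import numpy as np

def fbknn_scores(R, target_user, k=3):
    """Illustrative feature-based multi-criteria kNN.
    R: (users x items x criteria) rating tensor.
    Per-criterion weights come from the standard deviation of
    ratings, as the abstract describes; details are assumptions."""
    crit_std = np.nanstd(R, axis=(0, 1))
    w = crit_std / crit_std.sum()              # criterion weights
    overall = np.nansum(R * w, axis=2)         # weighted overall rating
    u = overall[target_user]
    # cosine similarity between the target user and all others
    sims = overall @ u / (np.linalg.norm(overall, axis=1)
                          * np.linalg.norm(u) + 1e-12)
    sims[target_user] = -np.inf                # exclude self
    nbrs = np.argsort(sims)[-k:]               # k nearest users
    return overall[nbrs].mean(axis=0)          # predicted item scores

R = np.random.default_rng(1).uniform(1, 5, size=(20, 10, 4))
print(fbknn_scores(R, target_user=0))
```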


2019 ◽  
Vol 490 (3) ◽  
pp. 4107-4120
Author(s):  
J Bentley ◽  
C G Tinney ◽  
S Sharma ◽  
D Wright

We present criteria for the selection of M-dwarfs down to G < 14.5 using all-sky survey data, with a view to identifying potential M-dwarfs to be confirmed spectroscopically by the FunnelWeb survey. Two sets of criteria were developed. The first, based on absolute magnitude in the Gaia G passband with MG > 7.7, selects 76,392 stars, of which 81.0 per cent are expected to be M-dwarfs, at a completeness of >97 per cent. The second is based on colour and uses Gaia, WISE, and 2MASS all-sky photometry. These criteria identify 94,479 candidate M-dwarfs, of which between 29.4 per cent and 47.3 per cent are expected to be true M-dwarfs, and which contain 99.6 per cent of expected M-dwarfs. Both sets of criteria were developed using synthetic galaxy model predictions and a previously spectroscopically classified set of M- and K-dwarfs, to evaluate both M-dwarf completeness and false-positive detections (i.e. the non-M-dwarf contamination rate). Used in combination, the two sets of criteria demonstrate how each excludes different sources of contamination. We therefore developed a final set of criteria that combines absolute-magnitude and colour selection to identify 74,091 stars. All these sets of criteria select numbers of objects feasible for confirmation via massively multiplexed spectroscopic surveys like FunnelWeb.
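The colour cuts themselves are not listed in the abstract, so the sketch below applies only the published absolute-magnitude criterion, using the standard distance-modulus conversion from apparent G magnitude and Gaia parallax; the function names are ours.

```python
import numpy as np

def abs_g(g_mag, parallax_mas):
    """Absolute Gaia G magnitude via the distance modulus,
    with distance d[pc] = 1000 / parallax[mas]:
    M_G = G + 5*log10(parallax_mas) - 10."""
    return g_mag + 5 * np.log10(parallax_mas) - 10

def mdwarf_candidate(g_mag, parallax_mas):
    """Apply the survey limit (G < 14.5) and the paper's
    absolute-magnitude criterion (M_G > 7.7)."""
    return (g_mag < 14.5) & (abs_g(g_mag, parallax_mas) > 7.7)

# e.g. a star with G = 12.0 at 50 mas (20 pc) has M_G ~ 10.5
print(mdwarf_candidate(np.array([12.0]), np.array([50.0])))
```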


2019 ◽  
Vol 16 (05) ◽  
pp. 1950029
Author(s):  
Mohammed Abdul Rahman AlShehri ◽  
Shailendra Mishra

Software-defined network (SDN) controller selection is a key challenge for the network administrator. In SDN, the control plane is an isolated process operating on the control layer. The controller provides a universal view of the entire network and supports applications and services. The three parameters considered for controller selection are productivity, campus-network suitability, and open-source availability. In SDN it is vital to have a good controller for efficient processing of all requests made by the switches and for good network behaviour. To select the best controller against the specified parameters, decision logic has to be developed that allows comparison of the available controllers. Therefore, in this research we suggest a methodology that uses the analytic hierarchy process (AHP) to find the best controller. The approach has been studied and verified on the large organizational network of Al-Majmaah University, Saudi Arabia, and is found to be effective, increasing network performance significantly.
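A minimal sketch of the AHP step, assuming the standard principal-eigenvector weighting and Saaty's consistency check; the pairwise judgements below are hypothetical placeholders, not those elicited for the Al-Majmaah network.

```python
import numpy as np

def ahp_weights(pairwise):
    """Derive criterion weights from an AHP pairwise-comparison
    matrix via its principal eigenvector, and report the
    consistency ratio (CR) against Saaty's random index."""
    A = np.asarray(pairwise, dtype=float)
    vals, vecs = np.linalg.eig(A)
    i = np.argmax(vals.real)
    w = np.abs(vecs[:, i].real)
    w /= w.sum()
    n = A.shape[0]
    ci = (vals[i].real - n) / (n - 1)              # consistency index
    ri = {3: 0.58, 4: 0.90, 5: 1.12}.get(n, 1.0)   # Saaty's RI
    return w, ci / ri

# Hypothetical judgements over the three parameters:
# productivity vs. campus-network suitability vs. open source.
A = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
w, cr = ahp_weights(A)
print(w, cr)   # weights sum to 1; CR < 0.1 counts as consistent
```

Each candidate controller would then be scored per criterion and ranked by the weighted sum of its scores.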


2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Jia-Rou Liu ◽  
Po-Hsiu Kuo ◽  
Hung Hung

Large-p-small-n datasets are commonly encountered in modern biomedical studies. To detect differences between two groups, conventional methods fail to apply, due to the instability of variance estimates in the t-test and the high proportion of tied values in AUC (area under the receiver operating characteristic curve) estimates. The significance analysis of microarrays (SAM) may also be unsatisfactory, since its performance is sensitive to the tuning parameter, whose selection is not straightforward. In this work, we propose a robust rerank approach to overcome these difficulties. In particular, we obtain a rank-based statistic for each feature based on the concept of “rank-over-variable.” Techniques of “random subset” and “rerank” are then iteratively applied to rank features, and the leading features are selected for further study. The proposed rerank approach is especially applicable to large-p-small-n datasets. Moreover, it is insensitive to the selection of tuning parameters, which is an appealing property for practical implementation. Simulation studies and real-data analysis of pooling-based genome-wide association (GWA) studies demonstrate the usefulness of our method.
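A hedged reading of the procedure: a variance-free rank statistic per feature, stabilized by averaging each feature's rank over random subsamples. The exact statistic and subset scheme below are illustrative, not the paper's.

```python
import numpy as np

def rank_stat(x, labels):
    """Rank-based two-group statistic for one feature: difference
    in mean rank between groups (a Wilcoxon-type score that needs
    no variance estimate; ties broken arbitrarily for simplicity)."""
    r = np.argsort(np.argsort(x)) + 1.0
    return abs(r[labels == 1].mean() - r[labels == 0].mean())

def rerank(X, labels, n_iter=100, frac=0.7, seed=0):
    """Score each feature on many random subsamples and average
    its rank across iterations, so leading features are those
    that rank highly in a stable way, not by a one-off fluke."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    avg_rank = np.zeros(p)
    for _ in range(n_iter):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        stats = np.array([rank_stat(X[idx, j], labels[idx])
                          for j in range(p)])
        avg_rank += np.argsort(np.argsort(-stats)) + 1.0
    return np.argsort(avg_rank)   # features, most significant first

# large-p-small-n toy data: 200 features, 20 samples, 5 with signal
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 10)
X = rng.normal(size=(20, 200))
X[y == 1, :5] += 2.0
print(rerank(X, y)[:10])
```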


Author(s):  
M. A. Abbas ◽  
H. Setan ◽  
Z. Majid ◽  
A. K. Chong ◽  
L. Chong Luh ◽  
...  

Like other electronic instruments, a terrestrial laser scanner (TLS) is subject to various systematic errors arising from different sources. Self-calibration, a technique adopted from photogrammetry, is a method available to investigate these errors for TLS. According to photogrammetry principles, the selection of datum constraints can cause different types of parameter correlations. However, the network configurations applied in TLS and photogrammetry calibrations are quite different; this study therefore investigated the significance of the photogrammetric datum-constraint principle in TLS self-calibration. To ensure a thorough assessment, the datum-constraint analyses were carried out using three variant network configurations: 1) minimum number of scan stations; 2) minimum number of surfaces for target distribution; and 3) minimum number of point targets. Based on graphical and statistical analyses, the datum-constraint selection indicated that the parameter correlations obtained are substantially similar. In addition, the analysis demonstrated that network configuration is a crucial factor in reducing the correlation between the calculated parameters.
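The parameter correlations at issue are read off the least-squares adjustment in the usual way; a minimal sketch, with a toy design matrix standing in for a real calibration network:

```python
import numpy as np

def param_correlations(J, weights=None):
    """Correlation matrix of least-squares parameter estimates.
    J is the design (Jacobian) matrix of the self-calibration
    adjustment; Cov = (J^T P J)^-1, and correlations follow by
    normalizing with the parameter standard deviations."""
    P = np.eye(J.shape[0]) if weights is None else np.diag(weights)
    cov = np.linalg.inv(J.T @ P @ J)
    s = np.sqrt(np.diag(cov))
    return cov / np.outer(s, s)

# toy design matrix standing in for a TLS calibration network
J = np.random.default_rng(2).normal(size=(100, 6))
C = param_correlations(J)
print(np.round(C, 2))   # off-diagonal values near +/-1 flag trouble
```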


2021 ◽  
pp. 1-48
Author(s):  
Zuchao Li ◽  
Hai Zhao ◽  
Shexia He ◽  
Jiaxun Cai

Semantic role labeling (SRL) is dedicated to recognizing the semantic predicate-argument structure of a sentence. Previous studies with traditional models have shown that syntactic information can make remarkable contributions to SRL performance; however, the necessity of syntactic information has been challenged by a few recent neural SRL studies that demonstrate impressive performance without syntactic backbones, suggesting that syntax becomes much less important for neural semantic role labeling, especially when paired with deep neural networks and large-scale pre-trained language models. Despite this notion, the neural SRL field still lacks a systematic and full investigation of the relevance of syntactic information to SRL, covering both dependency and span SRL in both monolingual and multilingual settings. This paper intends to quantify the importance of syntactic information for neural SRL in the deep learning framework. We introduce three typical SRL frameworks (baselines) — sequence-based, tree-based, and graph-based — which are accompanied by two categories of exploiting syntactic information: syntax pruning-based and syntax feature-based. Experiments are conducted on the CoNLL-2005, 2009, and 2012 benchmarks for all available languages, and results show that neural SRL models can still benefit from syntactic information under certain conditions. Furthermore, we show the quantitative significance of syntax to neural SRL models, together with a thorough empirical survey using existing models.
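As one concrete instance of the syntax pruning-based category, here is a hedged sketch of argument-candidate pruning over a dependency tree (keep only words within k hops of the predicate); this is a simplification for illustration, not the paper's exact pruning rule.

```python
from collections import deque

def prune_candidates(heads, predicate, k=2):
    """Keep only words within k hops of the predicate in the
    (undirected) dependency tree. heads[i] is the head index of
    word i (-1 for root); returns surviving candidate indices."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if h >= 0:
            adj[i].append(h)
            adj[h].append(i)
    dist = {predicate: 0}
    q = deque([predicate])
    while q:                          # breadth-first search
        u = q.popleft()
        if dist[u] == k:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return sorted(dist)

# "The cat chased the mouse": toy parse heads, predicate = "chased"
heads = [1, 2, -1, 4, 2]
print(prune_candidates(heads, predicate=2, k=1))  # -> [1, 2, 4]
```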


Author(s):  
Andrey Sergeevich Kopyrin ◽  
Irina Leonidovna Makarova

The subject of this research is the process of collecting and preliminarily preparing data from heterogeneous sources. Economic information is heterogeneous and semi-structured or unstructured in nature. Due to the heterogeneity of the primary documents, as well as the human factor, the initial statistical data may contain a large amount of noise, as well as records whose automatic processing may be very difficult. This makes preprocessing of dynamic input data an important precondition for discovering meaningful patterns and domain knowledge, and makes the research topic relevant.

Data preprocessing is a series of unique tasks that has led to the emergence of various algorithms and heuristic methods for solving such tasks as merging, cleanup, and identification of variables. In this work, a preprocessing algorithm is formulated that brings together into a single database, and structures, time-series information from different sources. The key modification of the preprocessing method proposed by the authors is a technology of automated data integration.

The proposed technology involves the combined use of methods for constructing fuzzy time series and machine lexical matching on a thesaurus network, as well as a universal database built using the MIVAR concept. The preprocessing algorithm forms a single data model with the ability to transform the periodicity and semantics of the data set, and integrates data from various sources into a single information bank.
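A minimal sketch of the integration step under stated assumptions: pandas stands in for the universal database, a simple column rename stands in for the thesaurus-based lexical matching, and resampling handles the periodicity transformation.

```python
import pandas as pd

def integrate(sources, freq="MS"):
    """Align time series from heterogeneous sources onto one
    periodicity and merge them into a single table. The rename
    map is a stand-in for lexical matching of variable names."""
    aligned = []
    for df, name_map in sources:
        df = df.rename(columns=name_map)           # unify semantics
        df.index = pd.to_datetime(df.index)
        aligned.append(df.resample(freq).mean())   # unify periodicity
    return pd.concat(aligned, axis=1)              # single data bank

# two hypothetical sources with different names and frequencies
a = pd.DataFrame({"umsatz": range(90)},
                 index=pd.date_range("2020-01-01", periods=90, freq="D"))
b = pd.DataFrame({"rev": range(12)},
                 index=pd.date_range("2020-01-01", periods=12, freq="MS"))
bank = integrate([(a, {"umsatz": "revenue"}), (b, {"rev": "revenue_alt"})])
print(bank.head())
```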


Author(s):  
Salvador Miranda Lima ◽  
José Moreira

The emergence of the World Wide Web made massive amounts of data available. This data, created and disseminated from many different sources, is prepared and linked in a way that is well suited for display purposes, but automation, integration, interoperability or context-oriented search can hardly be implemented. Hence, the Semantic Web aims at promoting global information integration and semantic interoperability through the use of metadata, ontologies and inference mechanisms. This chapter presents the Semantic Model for Tourism (SeMoT), designed for building Semantic Web-enabled applications for the planning and management of touristic itineraries, taking into account the new requirements of more demanding and culturally evolved tourists. It includes an introduction to relevant tourism concepts, an overview of current trends in Semantic Web research, and a presentation of the architecture, main features and a selection of representative ontologies that compose SeMoT.
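To make "Semantic Web-enabled" concrete, a tiny hedged example using rdflib; the namespace and vocabulary below are hypothetical stand-ins, since the abstract does not list SeMoT's actual terms.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

# Hypothetical SeMoT-style vocabulary, for illustration only.
SEMOT = Namespace("http://example.org/semot#")

g = Graph()
g.bind("semot", SEMOT)
trip = SEMOT["itinerary42"]
g.add((trip, RDF.type, SEMOT.Itinerary))
g.add((trip, SEMOT.includesAttraction, SEMOT["PortoCathedral"]))
g.add((trip, SEMOT.durationDays, Literal(3)))

# Machine-readable triples: ready for linking, SPARQL querying
# and inference, rather than for mere display.
print(g.serialize(format="turtle"))
```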


2004 ◽  
Vol 49 (1) ◽  
pp. 53-59 ◽  
Author(s):  
I. Moreno-Andrade ◽  
G. Buitrón

Five different sources of inocula were studied to determine their influence on biodegradability tests. The inocula were characterized by determining granulometry, specific methanogenic activity, solids content, and volumetric sludge index. The fermentative, aceticlastic, hydrogenophilic, OPHA, and sulfate-reducing groups were also quantified by the most-probable-number technique. Anaerobic biodegradability tests were conducted with two different substrates, one easy to degrade (glucose) and a toxic one (phenol). For both substrates, the best performance in terms of percentage of biodegradation and lag time was obtained with the inoculum from a brewery-industry UASB reactor. The results can be explained in terms of the initial activity of the inoculum. The influence of the significant variations found in the specific methanogenic activity of the five inocula studied is discussed in terms of the microbial composition of the samples. The results emphasize the importance of selecting an appropriate source of inoculum in order to obtain reliable results.

