Bootstrapping Distributional Feature Vector Quality

2009 ◽  
Vol 35 (3) ◽  
pp. 435-461 ◽  
Author(s):  
Maayan Zhitomirsky-Geffet ◽  
Ido Dagan

This article presents a novel bootstrapping approach for improving the quality of feature vector weighting in distributional word similarity. The method was motivated by attempts to utilize distributional similarity for identifying the concrete semantic relationship of lexical entailment. Our analysis revealed that a major reason for the rather loose semantic similarity obtained by distributional similarity methods is insufficient quality of the word feature vectors, caused by deficient feature weighting. This observation led to the definition of a bootstrapping scheme which yields improved feature weights, and hence higher quality feature vectors. The underlying idea of our approach is that features which are common to similar words are also most characteristic for their meanings, and thus should be promoted. This idea is realized via a bootstrapping step applied to an initial standard approximation of the similarity space. The superior performance of the bootstrapping method was assessed in two different experiments, one based on direct human gold-standard annotation and the other based on an automatically created disambiguation dataset. These results are further supported by applying a novel quantitative measurement of the quality of feature weighting functions. Improved feature weighting also allows massive feature reduction, which indicates that the most characteristic features for a word are indeed concentrated at the top ranks of its vector. Finally, experiments with three prominent similarity measures and two feature weighting functions showed that the bootstrapping scheme is robust and is independent of the original functions over which it is applied.
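As a rough illustration of the bootstrapping idea, the sketch below re-weights a word-by-feature matrix so that features shared with a word's most similar neighbours are promoted. The matrix layout, the neighbourhood size `k`, and the exact promotion rule are illustrative assumptions; the paper's actual weighting function differs in its details.

```python
import numpy as np

def bootstrap_weights(V, k=10):
    """One bootstrapping pass over a word-by-feature weight matrix.

    V : (n_words, n_features) array of initial weights (e.g. PMI values).
    k : number of nearest neighbours whose shared features are promoted.
    """
    # Cosine similarities over the initial approximation of the similarity space.
    U = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    S = U @ U.T
    np.fill_diagonal(S, 0.0)

    W = np.zeros_like(V, dtype=float)
    for w in range(V.shape[0]):
        nbrs = np.argsort(S[w])[-k:]            # top-k most similar words
        support = (V[nbrs] > 0).T @ S[w, nbrs]  # similarity mass behind each feature
        W[w] = V[w] * support                   # promote features shared with neighbours
    return W
```

Feature reduction then amounts to keeping only the top-ranked entries of each row of `W`, which is consistent with the observation that the most characteristic features concentrate at the top ranks.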

2020 ◽  
Vol 10 (5) ◽  
pp. 1793
Author(s):  
Lina Du ◽  
Li Zhuo ◽  
Jiafeng Li ◽  
Jing Zhang ◽  
Xiaoguang Li ◽  
...  

DASH (Dynamic Adaptive Streaming over HTTP) is a widely adopted multimedia streaming standard that selects an appropriate video bitrate according to network conditions, client status, and other factors in order to improve the user's Quality of Experience (QoE). Because quantifying the user's QoE is itself a difficult problem, this paper studies the distortion introduced by video compression, network transmission, and other factors, and proposes a video QoE metric for dynamic adaptive streaming services. Three-Dimensional Convolutional Neural Networks (3D CNN) and Long Short-Term Memory (LSTM) networks are used together to extract deep spatial-temporal features that represent the content characteristics of the video. These are combined with other factors, such as video quality, video fluency, and the quality fluctuations caused by bitrate switching, to form the input feature vector. Ridge regression is adopted to establish a QoE metric that dynamically describes the relationship between the input feature vector and the Mean Opinion Score (MOS). Experimental results on different datasets demonstrate that the prediction accuracy of the proposed method surpasses state-of-the-art methods, indicating that the proposed QoE model can effectively guide the client's bitrate selection in dynamic adaptive streaming services.
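A minimal sketch of the final regression step, assuming the deep 3D CNN/LSTM features have already been extracted; the feature names, dimensions, and placeholder data below are illustrative, not the paper's.

```python
import numpy as np
from sklearn.linear_model import Ridge

def build_feature_vector(content_feats, video_quality, fluency, switch_magnitude):
    # Deep spatial-temporal content descriptors concatenated with
    # hand-crafted quality, fluency and bitrate-switching factors.
    return np.concatenate([content_feats, [video_quality, fluency, switch_magnitude]])

# Placeholder training data: one row per streaming session, MOS labels in [1, 5].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 131))        # e.g. 128 deep dims + 3 QoE factors
y = rng.uniform(1.0, 5.0, size=200)

model = Ridge(alpha=1.0).fit(X, y)     # ridge regression from features to MOS
predicted_mos = model.predict(X[:3])
```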


Author(s):  
Jagruti Ketan Save

Thousands of images are generated every day, which implies the need for an easy, fast, automated classifier to classify and organize them. Classification means selecting an appropriate class for a given image from a set of pre-defined classes. The main objective of this work is to explore feature vector generation using the Walsh transform for classification. In the first method, we apply the Walsh transform to the columns of an image to generate feature vectors. In the second method, a Walsh wavelet matrix is used for feature vector generation. In the third method, we apply vector quantization (VQ) to the feature vectors generated by the earlier methods, which gives better accuracy, faster computation, and lower storage requirements than those methods. Nearest-neighbor and nearest-mean classification algorithms are used to classify the input test image. The image database used for the experimentation contains 2000 images. These methods generate a large number of outputs for a single test image by considering four similarity measures, six sizes of feature vector, two ways of classification, four VQ techniques, three sizes of codebook, and five combinations of wavelet transform matrix generation. We observed an improvement in accuracy from 63.22% to 74% (with 55% training data) across the series of techniques.
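The column-wise transform of the first method might look like the sketch below, with a Hadamard matrix standing in for the sequency-ordered Walsh matrix; the image size constraint, coefficient count, and averaging step are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import hadamard

def walsh_column_features(img, n_coeffs=64):
    """Apply a Walsh-type transform to the columns of a grayscale image
    and keep low-order coefficients as the feature vector."""
    h, _ = img.shape                      # height must be a power of two
    H = hadamard(h)                       # Hadamard stand-in for the Walsh matrix
    coeffs = (H @ img.astype(float)) / h  # transform every column at once
    return coeffs[:n_coeffs].mean(axis=1) # average coefficients across columns
```

Vector quantization would then map such feature vectors onto a small codebook, which is where the third method gains its storage and speed advantages.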


1999 ◽  
Vol 5 (2) ◽  
pp. 157-170
Author(s):  
JEONG-MI CHO ◽  
JUNGYUN SEO ◽  
GIL CHANG KIM

This paper presents a system for automatic verb sense disambiguation in Korean using a small corpus and a Machine-Readable Dictionary (MRD). The system learns a set of typical uses, listed in the MRD usage examples for each sense of a polysemous verb in the MRD definitions, from verb-object co-occurrences acquired from the corpus. The paper addresses the problem of data sparseness in two ways. First, by extending word similarity measures from direct co-occurrences to co-occurrences of co-occurring words, word similarities can be computed even between words that do not directly co-occur, by comparing their co-occurring clusters. Second, IS-A relations of nouns are acquired from the MRD definitions, which makes it possible to roughly cluster the nouns by identifying IS-A relationships. Using these methods, two words may be considered similar even if they do not share any word elements. Experiments show that this method can learn from a very small training corpus, achieving over 86% correct disambiguation without any restriction on a word's senses.
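The extension from direct co-occurrences to co-occurrences of co-occurring words can be sketched as a second-order similarity; the matrix layout and normalisation below are one plausible reading, not the paper's exact measure.

```python
import numpy as np

def second_order_similarity(C, a, b):
    """C : (n_nouns, n_verbs) co-occurrence count matrix.
    Returns a similarity for nouns a and b that can be non-zero even
    when the two nouns share no verb, via similar co-occurrence clusters."""
    P = C / (C.sum(axis=1, keepdims=True) + 1e-12)  # co-occurrence profiles
    S1 = P @ P.T                                     # first-order similarities
    pa, pb = S1[a] @ P, S1[b] @ P                    # similarity-weighted profiles
    return float(pa @ pb / (np.linalg.norm(pa) * np.linalg.norm(pb) + 1e-12))
```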


2021 ◽  
Vol 1 ◽  
pp. 11-20
Author(s):  
Owen Freeman Gebler ◽  
Mark Goudswaard ◽  
Ben Hicks ◽  
David Jones ◽  
Aydin Nassehi ◽  
...  

Abstract Physical prototyping during early-stage design is typically an iterative process. Commonly, a single prototype is used throughout the process, with its form modified as the design evolves. If the form of the prototype is not captured at each iteration, understanding how specific design changes affect the satisfaction of requirements is challenging, particularly retrospectively.

In this paper, two different systems for digitising physical artefacts, structured light scanning (SLS) and photogrammetry (PG), are investigated as means of capturing iterations of physical prototypes. First, a series of test artefacts is presented and procedures for operating each system are developed. Next, the artefacts are digitised using both SLS and PG, and the resulting models are compared against a master model of each artefact. Results indicate that both systems can reconstruct the majority of each artefact's geometry to within 0.1 mm of the master; however, SLS demonstrated superior overall performance in terms of both completion time and model quality. Additionally, the quality of the PG models was far more dependent on the effort and expertise of the user than that of the SLS models.
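The geometric comparison reported here can be approximated as below, assuming the digitised and master models are already registered (e.g. via ICP) and sampled as point clouds; the function and variable names are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def fraction_within_tolerance(master_pts, scan_pts, tol_mm=0.1):
    """Fraction of digitised points lying within tol_mm of the master model."""
    tree = cKDTree(master_pts)        # spatial index over the master geometry
    dists, _ = tree.query(scan_pts)   # nearest-point deviation for every scan point
    return float(np.mean(dists < tol_mm))
```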


2016 ◽  
Vol 16 (6) ◽  
pp. 27-42 ◽  
Author(s):  
Minghan Yang ◽  
Xuedong Gao ◽  
Ling Li

Abstract Although the Clustering Algorithm Based on Sparse Feature Vector (CABOSFV) and its related algorithms are efficient for high-dimensional sparse data clustering, they have several shortcomings: parameters must be designated subjectively, and the clustering process is sensitive to the order of the data, which ultimately increases the time complexity and degrades the quality of the algorithm. This paper proposes a parameter adjustment method for Bidirectional CABOSFV for optimization purposes. By optimizing the Parameter Vector (PV) and Parameter Selection Vector (PSV) with clustering validity as the objective function, an improved Bidirectional CABOSFV algorithm using simulated annealing is proposed, which circumvents the need to determine initial parameters. Experiments on UCI data sets show that the proposed algorithm, which can perform multi-adjustment clustering, achieves higher accuracy than single-adjustment clustering, along with decreased time complexity through iterations.
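A minimal simulated-annealing loop for the parameter adjustment idea, with `validity` scoring the clustering a Parameter Vector induces and `neighbour` perturbing it; both callbacks and the cooling schedule are assumptions, since the abstract does not give the exact encoding of PV and PSV.

```python
import math
import random

def anneal_parameters(pv0, validity, neighbour, t0=1.0, cooling=0.95, steps=200):
    """Search for a Parameter Vector maximising a clustering-validity score."""
    pv, f = pv0, validity(pv0)
    best, f_best = pv, f
    t = t0
    for _ in range(steps):
        cand = neighbour(pv)
        fc = validity(cand)
        # Accept improvements, or worse moves with Boltzmann probability.
        if fc > f or random.random() < math.exp((fc - f) / max(t, 1e-9)):
            pv, f = cand, fc
            if f > f_best:
                best, f_best = pv, f
        t *= cooling                   # geometric cooling schedule
    return best, f_best
```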


1970 ◽  
pp. 33-36
Author(s):  
A. ANBURANI

The present investigation was carried out to study the effect of off-season soil management practices on the yield and quality of turmeric (Curcuma longa L.) cultivars. The experiment was laid out in a Factorial Randomized Block Design with ten treatments in three replications, comprising five off-season land management treatments, viz., fallow (S1), summer ploughing two times (S2), summer ploughing one time (S3), solarization with transparent polyethylene film of 0.05 mm thickness for 40 days (S4), and black polyethylene film for 40 days (S5). These were tested with two popular cultivars, viz., Curcuma longa-1 CL-1 (V1) and Curcuma longa-2 CL-2 (V2), collected from Erode and Chidambaram. Various yield components were recorded at the time of harvest and analysed. The yield-attributing characters, viz., number, length, girth and weight of mother, primary and secondary rhizomes, were recorded. The treatment with solarization using transparent polyethylene film of 0.05 mm thickness recorded the highest yield and yield-attributing characters compared to the other treatments. The same treatment also exhibited the highest fresh rhizome yield per plant, curing percentage and cured rhizome yield. Quality parameters such as curcumin, oleoresin and essential oil content also showed superior performance under this treatment.


2018 ◽  
Vol 29 (01) ◽  
pp. 1850003 ◽  
Author(s):  
Chuang Liu ◽  
Linan Fan ◽  
Zhou Liu ◽  
Xiang Dai ◽  
Jiamei Xu ◽  
...  

Community detection in complex networks is a key problem in network analysis. In this paper, a new membrane algorithm is proposed to solve community detection in complex networks. The proposed algorithm is based on membrane systems, which consist of objects, reaction rules, and a membrane structure. Each object represents a candidate partition of a complex network, and the quality of an object is evaluated by network modularity. The reaction rules include evolutionary rules and communication rules: evolutionary rules improve the quality of objects by evolving them with the differential evolution algorithm, while communication rules implement the exchange of information among membranes. Finally, the proposed algorithm is evaluated on synthetic networks, real-world networks with known partitions, and large-scale networks with unknown partitions. The experimental results indicate the superior performance of the proposed algorithm in comparison with the other algorithms tested.
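The evaluation step, scoring a candidate partition by modularity, can be sketched with networkx; the label encoding and the toy graph are illustrative, and the evolutionary and communication rules are omitted.

```python
import networkx as nx
from networkx.algorithms.community import modularity

def object_quality(G, labels):
    """Score one membrane-system object (a candidate partition) by modularity.
    labels maps each node to a community identifier."""
    groups = {}
    for node, community in labels.items():
        groups.setdefault(community, set()).add(node)
    return modularity(G, list(groups.values()))

G = nx.karate_club_graph()
labels = {n: G.nodes[n]["club"] for n in G}   # the two known factions
print(object_quality(G, labels))
```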


2021 ◽  
Author(s):  
Xuan Thao Nguyen ◽  
Shuo Yan Chou

Abstract Intuitionistic fuzzy sets (IFSs), defined by membership and non-membership functions, have many applications in managing uncertain information. Similarity measures of IFSs have been proposed to represent the similarity between different types of sensitive fuzzy information. However, some existing similarity measures do not satisfy the axioms of similarity, and in some cases they cannot be applied appropriately. In this study, we propose some novel similarity measures of IFSs constructed by combining the exponential function of the membership functions with the negative function of the non-membership functions. We also propose a new entropy measure as a stepping stone to calculate the weights of the criteria in the proposed multi-criteria decision making (MCDM) model. The similarity measures are used to rank the alternatives in the model. Finally, we use this MCDM model to evaluate the quality of software projects.
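One way to read the abstract's construction is sketched below: an exponential term in the membership gap combined with a term in the negated non-memberships, averaged over the universe; the exact formula in the paper may differ.

```python
import numpy as np

def ifs_similarity(mu_a, nu_a, mu_b, nu_b):
    """Illustrative similarity between two IFSs given as arrays of
    membership (mu) and non-membership (nu) degrees on a finite universe."""
    term_mu = np.exp(-np.abs(mu_a - mu_b))           # exponential of the membership gap
    term_nu = 1.0 - np.abs((1 - nu_a) - (1 - nu_b))  # negative-function term
    return float(np.mean((term_mu + term_nu) / 2.0))

# Identical sets score 1.0; diverging memberships score lower.
a_mu, a_nu = np.array([0.7, 0.2, 0.5]), np.array([0.2, 0.6, 0.3])
b_mu, b_nu = np.array([0.6, 0.3, 0.5]), np.array([0.3, 0.5, 0.4])
print(ifs_similarity(a_mu, a_nu, b_mu, b_nu))
```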

