test instance
Recently Published Documents


Total documents: 26 (five years: 14)
H-index: 4 (five years: 1)

2022 ◽ Vol 2022 ◽ pp. 1-11
Author(s): Yunsheng Song ◽ Xiaohan Kong ◽ Chao Zhang

Because it makes no assumptions about the underlying data distribution and generalizes well, the k-nearest neighbor (kNN) classification algorithm is widely used in face recognition, text classification, emotion analysis, and other fields. However, kNN must compute the similarity between the unlabeled instance and every training instance at prediction time, which makes it difficult to apply to large-scale data. To overcome this difficulty, a growing number of acceleration algorithms based on data partitioning have been proposed, but they lack a theoretical analysis of how data partitioning affects classification performance. This paper analyzes that effect theoretically using empirical risk minimization and proposes a large-scale kNN classification algorithm based on neighbor relationship preservation. The search for the nearest neighbors is converted to a constrained optimization problem. The paper then estimates the difference in the objective function value at the optimal solution with and without data partitioning. According to this estimate, minimizing the similarity between instances in different subsets can largely reduce the effect of data partitioning. The minibatch k-means clustering algorithm is chosen to partition the data for its effectiveness and efficiency. Finally, the nearest neighbors of the test instance are searched repeatedly in the set generated by successively merging candidate subsets until they no longer change, where the candidate subsets are selected based on the similarity between the test instance and the cluster centers. Experimental results on public datasets show that the proposed algorithm largely retains the same nearest neighbors as the original kNN classification algorithm, shows no significant difference in classification accuracy, and outperforms two state-of-the-art algorithms.
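The search procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain Lloyd k-means stands in for minibatch k-means, Euclidean distance stands in for the similarity measure, and the function names (`kmeans`, `knn_partitioned`, `knn_brute`) and the `init_centers` parameter are hypothetical.

```python
import math


def kmeans(points, init_centers, n_iter=10):
    """Plain Lloyd k-means (an assumed stand-in for minibatch k-means)."""
    centers = [tuple(c) for c in init_centers]
    dim = len(points[0])

    def assign():
        groups = [[] for _ in centers]
        for idx, p in enumerate(points):
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(idx)
        return groups

    for _ in range(n_iter):
        for c, members in enumerate(assign()):
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(
                    sum(points[i][d] for i in members) / len(members)
                    for d in range(dim)
                )
    return centers, assign()


def knn_brute(points, query, k):
    """Reference kNN: indices of the k closest points over all data."""
    return sorted(range(len(points)),
                  key=lambda i: math.dist(query, points[i]))[:k]


def knn_partitioned(points, centers, groups, query, k):
    """Merge subsets in order of center distance until the k neighbors
    stop changing between two consecutive merges."""
    order = sorted(range(len(centers)),
                   key=lambda c: math.dist(query, centers[c]))
    candidates, previous = [], None
    for c in order:
        if not groups[c]:
            continue  # an empty subset cannot change the neighbors
        candidates.extend(groups[c])
        current = sorted(candidates,
                         key=lambda i: math.dist(query, points[i]))[:k]
        if len(current) == k and current == previous:
            break
        previous = current
    return previous
```

The stopping rule is a heuristic: the result matches brute-force kNN only when the subsets merged so far already contain the true neighbors, which is the property the abstract says the partitioning is designed to preserve.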


2021 ◽ Vol 2021 ◽ pp. 1-16
Author(s): Wei Li ◽ Youmeng Luo ◽ Chao Tang ◽ Kaiqiang Zhang ◽ Xiaoyu Ma

Regression is an important problem in machine learning and has been widely applied in fields such as meteorology, transportation, and materials. Granular computing (GrC) is an effective approach to exploring human intelligent information processing and is well suited to knowledge discovery. Ensemble learning is easy to execute in parallel. Based on granular computing and ensemble learning, we convert the regression problem into an equivalent problem in granular space and propose boosted fuzzy granular regression trees (BFGRT) to predict a test instance. The idea of BFGRT is as follows. First, a clustering algorithm with automatic optimization of clustering centers is presented. Next, based on this clustering algorithm, we employ MapReduce to perform fuzzy granulation of the data in parallel. Then, we design new operators and metrics on fuzzy granules to build a fuzzy granular rule base. Finally, a fuzzy granular regression tree (FGRT) in the fuzzy granular space is presented. On this basis, BFGRT is built by combining multiple FGRTs in parallel via random attribute sampling and MapReduce. Theory and experiments show that BFGRT is accurate, efficient, and robust.


Author(s): Akba Zoheer Mohammed

Service recovery (SR) is among the most essential approaches to improving the resilience of modern distribution systems. After the fault location is identified and isolated, a proper SR plan must be determined to resupply out-of-service areas. Two heuristic approaches are proposed to find an efficient and fast solution in modern power distribution systems. To solve the service recovery problem in distribution systems, switch selection indices for sectionalizing switches, defined by an analytic scheme, are presented together with a practical optimized heuristic graph-based method. The formulation of the problem includes four objective functions: maximizing the total load restored, minimizing the number of switching operations, maximizing the highest-priority restored load, and minimizing load shedding. An evaluation of switch indices is used for multiple tie switches in the system to find the optimal solution and minimize the total number of switching operations. A new graph-based scheme is used to search for the best sectionalizing switch and reduce the voltage drop. The accuracy and validity of the method are tested on two standard electrical distribution systems. The proposed methods are applied to an IEEE standard bus test instance.


2021 ◽ Vol 9 ◽ pp. 691-706
Author(s): Ofer Sabo ◽ Yanai Elazar ◽ Yoav Goldberg ◽ Ido Dagan

We explore few-shot learning (FSL) for relation classification (RC). Focusing on the realistic scenario of FSL, in which a test instance might not belong to any of the target categories (none-of-the-above, [NOTA]), we first revisit the recent popular dataset structure for FSL, pointing out its unrealistic data distribution. To remedy this, we propose a novel methodology for deriving more realistic few-shot test data from available datasets for supervised RC, and apply it to the TACRED dataset. This yields a new challenging benchmark for FSL-RC, on which state-of-the-art models show poor performance. Next, we analyze classification schemes within the popular embedding-based nearest-neighbor approach for FSL, with respect to the constraints they impose on the embedding space. Motivated by this analysis, we propose a novel classification scheme in which the NOTA category is represented as learned vectors, shown empirically to be an appealing option for FSL.
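The idea of representing the NOTA category as vectors in the embedding space can be sketched as follows. This is an illustration under assumptions, not the authors' model: the embeddings and NOTA vectors are hand-set rather than learned, each class is summarized by the mean of its support embeddings, similarity is a dot product, and the names `prototype` and `classify_with_nota` are hypothetical.

```python
def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def prototype(vectors):
    """Mean of the support embeddings for one class (an assumed choice)."""
    n = len(vectors)
    return tuple(sum(v[d] for v in vectors) / n
                 for d in range(len(vectors[0])))


def classify_with_nota(query, support, nota_vectors):
    """Return the best-scoring class, or "NOTA" when some NOTA vector
    scores at least as high as every target class."""
    scores = {label: dot(query, prototype(vecs))
              for label, vecs in support.items()}
    nota_score = max(dot(query, v) for v in nota_vectors)
    best = max(scores, key=scores.get)
    return best if scores[best] > nota_score else "NOTA"
```

In a trained system the NOTA vectors would be parameters optimized jointly with the encoder; here they only show how the decision rule treats NOTA as one more candidate in the nearest-neighbor comparison.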


2021 ◽ Vol 37 ◽ pp. 01023
Author(s): Preeti Tamrakar ◽ S. P. Syed Ibrahim

Lazy learning associative classification (LLAC) is an algorithm that yields better results than traditional associative classification systems. In LLAC, processing of the training data is delayed until a test instance is received, whereas in eager learning the system begins processing the training data before receiving queries. The traditional method assumes that all items within a transaction are the same, which is not always true. This paper proposes a new framework, lazy learning associative classification with WkNN (LLAC_WkNN), which combines the weighted kNN method with LLAC. Applying LLAC to the dataset yields a subset of rules, and the weighted kNN (WkNN) algorithm is then applied to this subset to predict the class label of the unseen test case. This improves the accuracy of the classifier. However, WkNN also gives outliers more weight. Applying Dual Distance Weight to LLAC, yielding LLAC_DWkNN, resolves this limitation of WkNN. LLAC_DWkNN gives less weight to outliers, which further improves the accuracy of the classifier. The algorithm has been applied to different datasets, and the experimental results demonstrate that the proposed method is efficient compared with traditional and other existing systems.
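A distance-weighted kNN vote, the general technique this abstract builds on, might look like the sketch below. The inverse-distance weight `1 / (d + eps)` is an assumed, commonly used choice; it is not the paper's WkNN or Dual Distance Weight formula, and the function name `weighted_knn` is hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn(train, query, k, eps=1e-9):
    """Classify `query` by a distance-weighted vote of its k nearest
    neighbors. `train` is a list of (features, label) pairs."""
    neighbors = sorted(train,
                       key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        # Assumed weighting: closer neighbors get larger votes.
        votes[label] += 1.0 / (math.dist(features, query) + eps)
    return max(votes, key=votes.get)
```

With `k` equal to the whole training set, an unweighted majority vote would always return the most frequent label, while the weighted vote can still favor a smaller class whose members lie close to the query.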


Author(s): Gloria Lola Quispe ◽ Maria Fernanda Rodríguez ◽ José Daniel Ontiveros

Metaheuristics are non-deterministic algorithms. Metaheuristic strategies are related to design. This chapter presents an introduction to metaheuristics, covering their theoretical study and the foundations for their use. It also describes and compares the ant colony-based algorithms ant system (AS), ant colony system (ACS), and max-min ant system (MMAS). These algorithms deliver solutions to complex problems, generally highly combinatorial ones, for which the best solution cannot be found in reasonable time. An experimental analysis of the results of the ant colony optimization (ACO) algorithms is also carried out. The algorithms are evaluated by comparing them on instances from the TSPLIB test instance library. The chapter therefore examines the travelling salesman problem (TSP) in depth and carries out a comparative analysis of the different algorithms to see which one performs best.
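As a rough illustration of the simplest of these variants, the sketch below implements a basic ant system for a symmetric TSP given as city coordinates. The parameter values (`alpha`, `beta`, `rho`, `q`, the ant and iteration counts) are arbitrary assumptions rather than settings from the chapter, the cities are assumed to be distinct points, and no TSPLIB parsing is included.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))


def ant_system(coords, n_ants=10, n_iter=50, alpha=1.0, beta=2.0,
               rho=0.5, q=1.0, seed=0):
    """Basic ant system: probabilistic tour construction followed by
    pheromone evaporation and deposit. Assumes distinct city coordinates."""
    rng = random.Random(seed)
    n = len(coords)
    dist = [[math.dist(a, b) for b in coords] for a in coords]
    pher = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")
    for _iteration in range(n_iter):
        tours = []
        for _ant in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # Attractiveness: pheromone^alpha * (1 / distance)^beta.
                weights = [pher[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                nxt = rng.choices(cand, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporate, then let every ant deposit pheromone on its edges.
        for i in range(n):
            for j in range(n):
                pher[i][j] *= 1.0 - rho
        for tour, length in tours:
            for a, b in zip(tour, tour[1:] + tour[:1]):
                pher[a][b] += q / length
                pher[b][a] += q / length
    return best_tour, best_len
```

ACS and MMAS differ mainly in how pheromone is updated and bounded, for example by letting only the best ant deposit pheromone or by clamping trail values to a fixed range.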


Author(s): V.N. Manjunath Aradhya ◽ Mufti Mahmud ◽ Basant Agarwal ◽ D.S. Guru ◽ M. Shamim Kaiser

Coronavirus disease (COVID-19) had infected more than 10 million people around the globe and killed at least 500,000 by the end of June 2020. As the disease continues to evolve, scientists and researchers around the world are trying to find the most effective way to combat it. Chest X-rays are a widely available modality for immediate care in diagnosing COVID-19. Precise detection and diagnosis of COVID-19 from chest X-rays would be practical in the current situation. This paper proposes a one-shot cluster-based approach for the accurate detection of COVID-19 from chest X-rays. The main objective of one-shot learning (OSL) is to mimic the way humans learn in order to make classifications or predictions on a wide range of similar but novel problems. The core constraint of this type of task is that the algorithm should decide on the class of a test instance after seeing just one example. For this purpose, we experimented with the widely known Generalized Regression Neural Network and Probabilistic Neural Network. Experiments conducted with publicly available chest X-ray images demonstrate that the method can detect COVID-19 with high precision. The results outperform many existing convolutional neural network-based methods proposed in the literature.
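A probabilistic neural network is, at its core, a Gaussian Parzen-window classifier, which is why it can operate with very few stored examples per class. The sketch below shows that decision rule on plain feature vectors. It is an assumption-laden illustration rather than the paper's pipeline: image feature extraction and clustering are omitted, the smoothing parameter `sigma` is arbitrary, and the name `pnn_predict` is hypothetical.

```python
import math
from collections import defaultdict


def pnn_predict(train, query, sigma=1.0):
    """Score each class by the mean Gaussian kernel between `query` and
    that class's stored examples, then return (best_label, scores).
    `train` is a list of (features, label) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for features, label in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(features, query))
        sums[label] += math.exp(-sq_dist / (2.0 * sigma ** 2))
        counts[label] += 1
    scores = {label: sums[label] / counts[label] for label in sums}
    return max(scores, key=scores.get), scores
```

With a single stored example per class, the rule reduces to choosing the class whose example is closest to the query, since the Gaussian kernel decreases monotonically with distance.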


2020 ◽ Vol 2020 ◽ pp. 1-12
Author(s): Ming Wei ◽ Tao Liu ◽ Bo Sun ◽ Binbin Jing

This study proposed a mathematical model for designing a feeder transit service to improve the service quality and accessibility of transportation hubs (such as airports and rail stations). The proposed model featured an integrated framework that simultaneously guided passengers to their nearest stops to get on and off the bus, designed routes to transport passengers from the selected pick-up stops to the transportation hubs, and calculated the routes' departure frequencies. In particular, the maximum walking distance, the upper and lower limits of route frequencies, and the load factor rate of each route were fully accounted for. The main objective of the proposed model was to simultaneously minimize the total walking, riding, and waiting time of all passengers. As this study explored an NP-hard problem, a two-stage genetic algorithm combined with the Dijkstra search method was developed to yield metaoptimal solutions to the model within an acceptable time. Finally, a test instance in Chongqing City, China, demonstrated that the proposed model was an effective tool for generating a pedestrian, route, and operation plan; it reduced the total travel time compared with the traditional model.


Author(s): Simon Fong

Similarity measures are essential for solving many pattern recognition problems such as classification, clustering, and retrieval. Various distance and similarity measures are applicable to comparing two probability density functions. Data comparison is widely used today and is very important. Comparing two objects is a common task for people in all walks of life, who often want or need to find the similarity between two different objects or the difference between two similar objects. Different data may share some similarity in given attribute(s). To compare two datasets based on their attributes using classification algorithms, the attributes must be selected by rules; such a system is known as a rule-based reasoning system or expert system, which classifies a given test instance into a particular outcome using the learned rules. The test instance carries multiple attributes, which are usually the values of diagnostic tests. In this article, we propose a classifier ensemble-based method for comparing two datasets, or one dataset with different features. Ensemble data mining learning methods are applied for rule generation, and a multi-criterion evaluation approach is used to select reliable rules from the results of the ensemble methods. The efficacy of the proposed methodology is illustrated via an example of two disease datasets; strictly speaking, it is a combined dataset with the same instances and normal attributes but differing in the class. This article introduces a fuzzy rule-based classification method called FURIA and uses FURIA rules to obtain the relationship between the two datasets and find the similarity between them.

