Research on the Confidence Regression Based on KNN Algorithm

2015 ◽  
Vol 713-715 ◽  
pp. 1877-1881
Author(s):  
Fang Chun Jiang ◽  
Sheng Feng Tian

Confidence regression is an important research area within confidence machine learning. This paper uses the KNN algorithm to evaluate the error of regression results and, on that basis, separates predictions into an accept region and a reject region, thereby achieving confidence regression. By setting a specific error value, the approach achieves controllable confidence regression. It has been tested on the bodyfat data set and other data sets, and the experimental results show that the approach is feasible.
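The accept/reject idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the error is estimated, so everything here is an assumption: the function name `knn_predict_with_reject`, the parameter `max_error`, and the use of the mean absolute deviation of the neighbours' targets as the error estimate are all hypothetical and are not taken from the paper.

```python
import math
from statistics import mean


def knn_predict_with_reject(train_X, train_y, query, k=3, max_error=1.0):
    """Return (prediction, estimated_error, accepted) for one query point.

    Hypothetical sketch: the error estimate is the mean absolute deviation
    of the k nearest targets around their mean (an assumed proxy, not the
    paper's method).
    """
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    neighbours = [train_y[i] for i in order[:k]]

    # Plain KNN regression: average the neighbours' targets.
    prediction = mean(neighbours)

    # Assumed error proxy: how much the neighbours disagree with each other.
    est_error = mean([abs(y - prediction) for y in neighbours])

    # Accept only if the estimated error is within the user-set bound.
    return prediction, est_error, est_error <= max_error


# Toy data: targets agree near x = 1 and disagree strongly near x = 11.
train_X = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
train_y = [0.0, 1.0, 2.0, 50.0, 10.0, 90.0]

print(knn_predict_with_reject(train_X, train_y, (1.0,)))   # accepted
print(knn_predict_with_reject(train_X, train_y, (11.0,)))  # rejected
```

Lowering `max_error` moves more queries into the reject region, which is one simple way to make the confidence of the accepted predictions controllable.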

2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al.

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness (MMR) algorithm, a top-down hierarchical clustering algorithm that can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome this shortcoming, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.


2021 ◽  
Vol 16 ◽  
Author(s):  
Yuqing Qian ◽  
Hao Meng ◽  
Weizhong Lu ◽  
Zhijun Liao ◽  
Yijie Ding ◽  
...  

Background: The identification of DNA binding proteins (DBP) is an important research field. Experiment-based methods for detecting DBP are time-consuming and labor-intensive. Objective: To address large-scale DBP identification, several machine learning methods have been proposed. However, these methods have insufficient predictive accuracy. Our aim is to develop a sequence-based machine learning model to predict DBP. Methods: We extract six types of features (NMBAC, GE, MCD, PSSM-AB, PSSM-DWT, and PsePSSM) from protein sequences. We use Multiple Kernel Learning based on the Hilbert-Schmidt Independence Criterion (MKL-HSIC) to estimate the optimal kernel. Then, we construct a hypergraph model to describe the relationship between labeled and unlabeled samples. Finally, a Laplacian Support Vector Machine (LapSVM) is employed to train the predictive model. Our method is tested on the PDB186, PDB1075, PDB2272 and PDB14189 data sets. Result: Compared with other methods, our model achieves the best results on the benchmark data sets. Conclusion: Accuracies of 87.1% and 74.2% are achieved on PDB186 (independent test of PDB1075) and PDB2272 (independent test of PDB14189), respectively.


2020 ◽  
Vol 8 (5) ◽  
pp. 3353-3360

Android is the most popular operating system, with over 2.5 billion devices across the globe. Unfortunately, this popularity has made the devices, and the services they enable, vulnerable to numerous security threats. As a result, significant research is being done in the field of Android malware detection using machine learning algorithms. Our current work focuses on the possible use of machine learning techniques for detecting malware on Android devices. The proposed EKMPRFG is applied to the classification of Android malware after a preprocessing phase. This phase involves a hybrid feature selection model that uses the proposed Standard Deviation of Standard Deviation of Ranks (SDSDR) and several built-in feature selection algorithms, such as Correlation based Feature Selection (CFS), Classifier SubsetEval, Consistency SubsetEval, and Filtered SubsetEval, followed by Principal Component Analysis (PCA) for dimensionality reduction. The experimental results obtained on two data sets indicate that EKMPRFG outperforms existing works in terms of prediction accuracy and weighted F-measure.


2014 ◽  
Author(s):  
Gordon Akudibillah ◽  
Sonja E.M. Boas ◽  
Benoit M. Carreres ◽  
Marchien Dallinga ◽  
Aalt-Jan van Dijk ◽  
...  

This preprint is the outcome of the “Training Workshop Interdisciplinary Life Sciences”, held in October 2013 at the Lorentz Center, Leiden, The Netherlands. The motivation for organizing this event stems from the following considerations. The enormous progress in laboratory techniques and facilities has made huge amounts of data available at all levels of complexity (molecules, cells, tissues, organs, organisms, populations, ecosystems). Data at the cellular level in particular reveal details of life processes that we were unaware of until recently. However, it has become clear that huge amounts of data alone do not automatically lead to understanding. The data explosion in the Life Sciences teaches one lesson: life processes are of a highly intricate and integrative nature. To really understand the dynamic processes in living organisms, one must integrate experimental data sets into quantitative and predictive models. Only then may one hope to grasp the functioning of these complex systems and convert information into understanding. In physics, for instance, this strong interaction between experiment and theory has been common practice for centuries, culminating in the 20th century being called the “Century of Physics”. In contrast to physics, the complex nature of the Life Sciences forces us to work in an interdisciplinary fashion. The necessary expertise is available, but scattered over many scientific disciplines. Only the combined efforts of biologists, chemists, mathematicians, physicists, engineers, and informaticians will lead to progress in tackling the huge challenge of understanding the complexity of life. Researchers in the Life Sciences often focus on a rather narrow research field. However, the majority of the upcoming generation of researchers in the Life Sciences should be trained to expand their skills so that they can tackle complex, multi-dimensional systems.
The knowledge they have to incorporate in their research will stem from a diverse range of disciplines, so they should be trained to integrate a broad range of modelling approaches in order to deduce quantitative, predictive and often multi-scale models from highly diverse data sets. Present curricula in the Life Sciences hardly offer this kind of training yet. This workshop intends to start filling this gap. Three teams worked on the following open problems: 1) modeling the influence of temperature on the regulation of flowering time in Arabidopsis thaliana; 2) validation of computational models of angiogenesis against experimental data; 3) reconstructing the gene network that regulates branching in tomato. This preprint bundles the reports of the three teams.


2021 ◽  
Vol 17 (1) ◽  
pp. 45-53
Author(s):  
Le Hong Trang ◽  
Tran Duong Huy ◽  
Anh Ngoc Le

Purpose: Pricing on online booking systems is a difficult task for the host. The systems usually set prices lower than the premises and their quality generally warrant, which benefits only the system by making it easier to attract customers to the service. The price of a new accommodation is often set based on location, number of beds, type of house and so on. The main problem is to predict the most reasonable price for the host. This paper aims to study the use of machine learning and sentiment analysis for predicting prices on online booking systems. Design/methodology/approach: First, an empirical study of some well-known classification models is performed for the problem. The authors then propose applying k-means, a clustering technique, together with Gradient Boost and XGBoost models to improve prediction performance. Experiments are conducted on real Airbnb data sets collected in London. Findings: Experimental results are presented and compared, showing that the authors’ method outperforms an updated method. Originality/value: The authors use k-means and sampling together with Gradient Boost and XGBoost models to improve prediction performance.


2012 ◽  
Vol 524-527 ◽  
pp. 2026-2030
Author(s):  
Marek Šolc ◽  
Štefan Markulik ◽  
Eva Grambalová

When addressing issues related to the technology or quality of refractory products, experimental test results are among the supporting documents. These more or less extensive data sets characterize, with some precision, the observed phenomenon, e.g. a physical or chemical quantity. From this perspective, the role of statistical data processing is to condense, as far as possible, the sometimes extremely abundant but poorly arranged set of experimental data and to determine the "seriousness" of this data set. When processing the data, it should be noted that these characteristics describe not the entire observed variable, but only a selected part of it.


2021 ◽  
pp. 016555152098550
Author(s):  
Alaettin Uçan ◽  
Murat Dörterler ◽  
Ebru Akçapınar Sezer

Emotion classification is a research field that aims to detect the emotions in a text using machine learning methods. In traditional machine learning (TML) methods, feature engineering processes cause the loss of some meaningful information, and classification performance is negatively affected. In addition, the success of modelling using deep learning (DL) approaches depends on the sample size. More samples are needed for Turkish due to the unique characteristics of the language. However, emotion classification data sets in Turkish are quite limited. In this study, the pretrained language model approach was used to create a stronger emotion classification model for Turkish. Well-known pretrained language models were fine-tuned for this purpose. The performances of these fine-tuned models for Turkish emotion classification were comprehensively compared with the performances of TML and DL methods in experimental studies. The proposed approach provides state-of-the-art performance for Turkish emotion classification.


2014 ◽  
Vol 1079-1080 ◽  
pp. 851-855
Author(s):  
Fang Chun Jiang ◽  
Sheng Feng Tian

Manageable confidence machine learning is an important approach to putting confidence machines into practice. This paper builds a two-class confidence classifier: it uses a two-class classifier as a tool, converts the classifier's learning results, and manages confidence by setting threshold values. The research achieves a manageable overall classification accuracy as well as manageable accuracies for the positive and negative classes. The method was tested on 5 experimental data sets on cardiopathy and diabetes and achieved good results.
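Threshold-based confidence management of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: it assumes the two-class classifier outputs a score in [0, 1], and the names `classify_with_thresholds`, `accepted_accuracy`, `t_neg` and `t_pos` are hypothetical.

```python
def classify_with_thresholds(scores, t_neg=0.3, t_pos=0.7):
    """Map classifier scores to 1 (positive), 0 (negative) or None (rejected).

    Assumption: each score lies in [0, 1] and higher means "more positive".
    Scores strictly between t_neg and t_pos are treated as low confidence.
    """
    labels = []
    for s in scores:
        if s >= t_pos:
            labels.append(1)
        elif s <= t_neg:
            labels.append(0)
        else:
            labels.append(None)  # not confident enough: refuse to classify
    return labels


def accepted_accuracy(labels, truth):
    """Accuracy over accepted samples only; None if everything was rejected."""
    pairs = [(l, t) for l, t in zip(labels, truth) if l is not None]
    if not pairs:
        return None
    return sum(l == t for l, t in pairs) / len(pairs)


# Toy scores and ground truth (made up for illustration).
scores = [0.95, 0.80, 0.55, 0.45, 0.20, 0.05]
truth = [1, 1, 0, 1, 0, 0]

strict = classify_with_thresholds(scores, t_neg=0.3, t_pos=0.7)
loose = classify_with_thresholds(scores, t_neg=0.5, t_pos=0.5)
print(strict, accepted_accuracy(strict, truth))
print(loose, accepted_accuracy(loose, truth))
```

Widening the gap between `t_neg` and `t_pos` rejects more borderline samples and tends to raise accuracy on the accepted ones. Moving each threshold separately gives independent control over the positive and negative classes.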


2021 ◽  
Author(s):  
Ehsan Kharazmi ◽  
Zhicheng Wang ◽  
Dixia Fan ◽  
Samuel Rudy ◽  
Themis Sapsis ◽  
...  

Assessing the fatigue damage in marine risers due to vortex-induced vibrations (VIV) serves as a comprehensive example of using machine learning methods to derive assessment models of complex systems. A complete characterization of the response of such complex systems is usually unavailable despite massive experimental data and computation results. Machine learning algorithms can use multi-fidelity data sets from multiple sources, including real-time sensor data from the field, systematic experimental data, and simulation data. Here we develop a three-pronged approach to demonstrate how machine learning tools can be employed to develop data-driven models for accurate and efficient fatigue damage predictions for marine risers subject to VIV.

