Challenge Faced in K-Means Clustering for the Choice of Automatic K-Value for Segmentation of Black Sigatoka Disease in Banana Leaves

Clustering is defined as grouping similar items. Three broad types of machine learning techniques are supervised, unsupervised and semi-supervised. In unsupervised techniques, no class labels are given for the input data. Clustering is a type of unsupervised learning technique. Recently, clustering has been applied in many fields such as medicine, agriculture, biology, computing, finance and robotics. Black sigatoka is a fungal disease that commonly occurs in banana plants. The research currently focuses on segmenting the diseased area from the non-diseased area. The segmentation classes are trained with Trainable Weka Segmentation, and we also perform segmentation using the k-means algorithm. In this paper we propose a novel approach for extracting the black sigatoka diseased area on banana leaves from images, using pixel color values and grouping the pixels into their respective clusters. This is a combined segmentation and clustering algorithm. The approach is proposed to overcome the shortfall of k-means clustering when the value of k is selected automatically using silhouette values. With this approach, it is easy to cluster and segment at the same time. The segmented images produced by this algorithm can be used in disease classification tasks.
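
The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of the baseline it discusses: k-means on RGB pixel values, with k chosen automatically as the candidate that maximizes the mean silhouette score. The toy pixel list, the candidate range `k_min..k_max` and all function names are assumptions for illustration, not the authors' implementation.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means on color tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k pixels sampled at random as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance in RGB space.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean color of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the old center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; rank it lowest
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, k_min=2, k_max=6):
    """Return (k, labels) for the candidate k with the highest mean silhouette."""
    best = None
    for k in range(k_min, min(k_max, len(points) - 1) + 1):
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]


# Hypothetical RGB pixels: four green (healthy) and four dark (lesion) samples.
pixels = [(40, 160, 50), (45, 170, 55), (38, 150, 48), (42, 165, 52),
          (30, 25, 20), (35, 30, 22), (28, 20, 18), (33, 27, 21)]
best_k, labels = choose_k(pixels)
print(best_k, labels)
```

This brute-force silhouette is O(n²) per candidate k, so on real leaf images one would subsample pixels or use an optimized library implementation.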

Database ◽  
2021 ◽  
Vol 2021 ◽  
Author(s):  
Shaikh Farhad Hossain ◽  
Ming Huang ◽  
Naoaki Ono ◽  
Aki Morita ◽  
Shigehiko Kanaya ◽  
...  

Abstract A biomarker is a measurable indicator of a disease or an abnormal state of the body, and it plays an important role in disease diagnosis, prognosis and treatment. Biomarkers have become a significant topic due to their versatile use in medicine and in rapidly detecting the presence or severity of some diseases. The volume of biomarker data is increasing rapidly, and the identified data are scattered. To provide comprehensive information, this fast-growing data needs to be recorded on a single platform, yet no comprehensive, open-source, freely available online biomarker database exists. To fill this gap, we have developed a human biomarker database as part of the KNApSAcK family databases, which contain a vast quantity of information on the relationships between biomarkers and diseases. We have classified the diseases into 18 disease classes, mostly following the National Center for Biotechnology Information definitions. Apart from developing this database, we also performed disease classification separately using protein and metabolite biomarkers, based on the network clustering algorithm DPClusO and on hierarchical clustering. Finally, we drew conclusions about the relationships among the disease classes. The human biomarker database can be accessed online, and the inter-disease relationships may be helpful in understanding the molecular mechanisms of diseases. To our knowledge, this is one of the first approaches to classify diseases based on biomarkers. Database URL: http://www.knapsackfamily.com/Biomarker/top.php
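
As a rough illustration of grouping diseases by shared biomarkers, the sketch below runs average-linkage agglomerative (hierarchical) clustering on Jaccard distances between biomarker sets and stops at a chosen number of groups. The disease names and marker identifiers are made-up toy data, and the Jaccard distance and average linkage are assumptions; DPClusO, the network clustering algorithm named in the abstract, is not reproduced here.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| between two biomarker sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(biomarker_sets, n_clusters):
    """Agglomerative clustering with average linkage, stopped at n_clusters groups."""
    clusters = [[name] for name in biomarker_sets]

    def linkage(pair):
        # Mean pairwise Jaccard distance between the members of two clusters.
        left, right = pair
        dists = [jaccard_distance(biomarker_sets[a], biomarker_sets[b])
                 for a in left for b in right]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        left, right = min(combinations(clusters, 2), key=linkage)
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(left + right)
    return clusters


# Hypothetical disease -> biomarker sets (toy identifiers, not real data).
biomarkers = {
    "disease_A": {"m1", "m2", "m3"},
    "disease_B": {"m1", "m2", "m4"},
    "disease_C": {"m5", "m6", "m7"},
    "disease_D": {"m5", "m6", "m8"},
}
print(average_linkage(biomarkers, 2))
```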


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Adèle Weber Zendrera ◽  
Nataliya Sokolovska ◽  
Hédi A. Soula

Abstract In this manuscript, we propose a novel approach to assess relationships between environment and metabolic networks. We used a comprehensive dataset of more than 5000 prokaryotic species from which we derived the metabolic networks. From the reconstructed graphs we compute the scope, which is the set of all metabolites that can potentially be synthesized, and all reactions that can potentially be activated, when the network is provided with external metabolites. Using machine learning techniques, we show that the scope is an excellent predictor of taxonomic and environmental variables, namely growth temperature, oxygen tolerance and habitat. In the literature, metabolites and pathways are rarely used to discriminate species. We use the scope's underlying structure (metabolites and pathways) to construct the predictive models, which gives additional information on the metabolic pathways that are important for discriminating species, information that is often absent from other metabolic network properties. For example, in the particular case of growth temperature, glutathione biosynthesis pathways are specific to species growing in cold environments, whereas tungsten metabolism is specific to species in warm environments, as has been suggested in the current literature. From a machine learning perspective, the scope reduces the dimension of our data and can thus be considered an interpretable graph embedding.
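
The scope can be computed by network expansion: starting from the seed (externally provided) metabolites, repeatedly fire every reaction whose substrates are all available and add its products, until nothing new appears. The sketch below assumes reactions are given as (substrates, products) pairs of sets, treats them as irreversible and ignores cofactor handling; the toy network is invented for illustration.

```python
def compute_scope(reactions, seeds):
    """Return (reachable metabolites, indices of reactions that can fire) from the seeds."""
    available = set(seeds)
    fired = set()
    changed = True
    while changed:  # iterate until a full pass adds nothing new
        changed = False
        for idx, (substrates, products) in enumerate(reactions):
            if idx not in fired and set(substrates) <= available:
                fired.add(idx)
                available |= set(products)
                changed = True
    return available, fired


# Hypothetical toy network.
reactions = [
    ({"A", "B"}, {"C"}),  # r0: A + B -> C
    ({"C"}, {"D"}),       # r1: C -> D
    ({"D", "E"}, {"F"}),  # r2: needs E, which is neither a seed nor produced
]
metabolites, fired = compute_scope(reactions, {"A", "B"})
print(sorted(metabolites), sorted(fired))  # ['A', 'B', 'C', 'D'] [0, 1]
```

Adding `"E"` to the seeds would let r2 fire as well, which shows how the scope depends on the environment the network is exposed to.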


2021 ◽  
Vol 2021 (1) ◽  
Author(s):  
Oz Amram ◽  
Cristina Mantilla Suarez

Abstract There has been substantial progress in applying machine learning techniques to classification problems in collider and jet physics. But as these techniques grow in sophistication, they are becoming more sensitive to subtle features of jets that may not be well modeled in simulation. Therefore, relying on simulations for training will lead to sub-optimal performance in data, but the lack of true class labels makes it difficult to train on real data. To address this challenge we introduce a new approach, called Tag N’ Train (TNT), that can be applied to unlabeled data in which each event has two distinct sub-objects. The technique uses a weak classifier for one of the objects to tag signal-rich and background-rich samples. These samples are then used to train a stronger classifier for the other object. We demonstrate the power of this method by applying it to a dijet resonance search. By starting with autoencoders trained directly on data as the weak classifiers, we use TNT to train substantially improved classifiers. We show that Tag N’ Train can be a powerful tool in model-agnostic searches and discuss other potential applications.
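
The tagging idea can be illustrated on toy data. Each event has two objects; a weak score on object 1 splits the events into signal-rich and background-rich samples, and those noisy labels are then used to fit a classifier on object 2. Everything here is an assumption for illustration: the Gaussian toy features, the 20% tagging cuts, and the nearest-mean threshold classifier, which stand in for the autoencoders and neural networks used in the paper.

```python
import random


def make_events(n, signal_fraction, seed=1):
    """Toy events (object-1 feature, object-2 feature, hidden true label)."""
    rng = random.Random(seed)
    events = []
    for _ in range(n):
        is_signal = rng.random() < signal_fraction
        shift = 1.0 if is_signal else 0.0
        # Object 1 is only weakly separated; object 2 is well separated.
        events.append((rng.gauss(0.5 * shift, 1.0), rng.gauss(2.0 * shift, 0.5), int(is_signal)))
    return events


def tag_n_train(events, tag_fraction=0.2):
    """Tag with object 1, then fit a nearest-mean threshold on object 2.

    The true label (index 2) is never used here.
    """
    ranked = sorted(events, key=lambda e: e[0])  # weak classifier score = object-1 feature
    cut = int(len(ranked) * tag_fraction)
    background_rich, signal_rich = ranked[:cut], ranked[-cut:]
    mean_sig = sum(e[1] for e in signal_rich) / cut
    mean_bkg = sum(e[1] for e in background_rich) / cut
    return (mean_sig + mean_bkg) / 2  # midpoint between the two tagged-sample means


events = make_events(5000, signal_fraction=0.3)
threshold = tag_n_train(events)
accuracy = sum((e[1] > threshold) == bool(e[2]) for e in events) / len(events)
print(f"object-2 threshold {threshold:.2f}, accuracy vs hidden labels {accuracy:.2f}")
```

The hidden labels are used only at the end to check the result; the object-2 classifier itself is trained without them.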


2021 ◽  
Vol 5 (1) ◽  
pp. 38
Author(s):  
Chiara Giola ◽  
Piero Danti ◽  
Sandro Magnani

In the age of AI, companies strive to extract benefits from data. In the first steps of data analysis, a difficult dilemma scientists have to face is defining the ’right’ quantity of data needed for a given task. In energy management in particular, one of the most thriving applications of AI is optimizing the consumption of energy plant generators. When designing a strategy to improve the generators’ schedule, an essential piece of information is the future energy load requested by the plant. This topic, referred to in the literature as load forecasting, has lately gained great popularity; in this paper the authors highlight the problem of estimating the right amount of data for training prediction algorithms and propose a suitable methodology. The core of this methodology is the learning curve, a powerful tool for tracking an algorithm’s performance as the training-set size varies. First, a brief review of the state of the art and a short analysis of eligible machine learning techniques are offered. Next, the hypotheses and constraints of the work are explained, presenting the dataset and the goal of the analysis. Finally, the methodology is described and the results are discussed.
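
A learning curve records validation error as a function of training-set size. The sketch below assumes a synthetic hourly load series, a least-squares model that predicts each hour from the previous hour's load (a stand-in for whichever forecasting algorithm is being sized), and mean absolute error as the metric; none of these choices come from the paper.

```python
import math
import random


def synthetic_load(hours, seed=0):
    """Hypothetical hourly load: a daily cycle plus Gaussian noise."""
    rng = random.Random(seed)
    return [100 + 20 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 3) for t in range(hours)]


def fit_lag1(series):
    """Least-squares fit of load[t] = a + b * load[t - 1]."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def mae(model, series):
    """Mean absolute one-step-ahead error of the lag-1 model on a series."""
    a, b = model
    errors = [abs(series[t] - (a + b * series[t - 1])) for t in range(1, len(series))]
    return sum(errors) / len(errors)


def learning_curve(series, train_days, valid_days, sizes):
    """Validation MAE for models trained on the most recent d days, for each d in sizes."""
    split = 24 * train_days
    pool, validation = series[:split], series[split:split + 24 * valid_days]
    return [(d, mae(fit_lag1(pool[-24 * d:]), validation)) for d in sizes]


curve = learning_curve(synthetic_load(24 * 60), train_days=50, valid_days=10,
                       sizes=[2, 5, 10, 20, 50])
for days, err in curve:
    print(f"{days:>2} days of training data -> validation MAE {err:.2f}")
```

A plateau in such a curve is the usual sign that additional training data brings little benefit for the model at hand.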


2021 ◽  
Author(s):  
J. Annrose ◽  
N. Herald Anantha Rufus ◽  
C. R. Edwin Selva Rex ◽  
D. Godwin Immanuel

Abstract Bean, botanically called Phaseolus vulgaris L., belongs to the Fabaceae family. In bean disease identification, unnecessary economic losses occur due to delayed treatment, incorrect treatment and lack of knowledge. Existing deep learning and machine learning techniques face several issues, such as high computational complexity, high cost of training data, long execution time, noise, high feature dimensionality, lower accuracy and low speed. To tackle these problems, we propose a hybrid deep learning model with an Archimedes optimization algorithm (HDL-AOA) for bean disease classification. In this work there are five bean classes: one healthy class and four classes indicating different diseases, namely bean halo blight, Pythium diseases, Rhizoctonia root rot and anthracnose, acquired from the Soybean (Large) Data Set. The hybrid deep learning technique combines wavelet packet decomposition (WPD) and long short-term memory (LSTM). First, WPD decomposes the input images into four sub-series, and an LSTM network is developed for each sub-series. During bean disease classification, the Archimedes optimization algorithm (AOA) enhances the classification accuracy of the individual LSTM networks. The HDL-AOA model is implemented in MATLAB. The proposed model achieves a lower MAPE than other existing methods. Finally, the proposed HDL-AOA model achieves excellent classification results on different evaluation measures such as accuracy, specificity, sensitivity, precision, recall and F-score.
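
Decomposition into four sub-series corresponds to a two-level wavelet packet tree, in which both the approximation and the detail branches are split again. The sketch below assumes the Haar wavelet and a 1D input whose length is divisible by 4 (an image would first be flattened or processed row by row); the paper does not state which wavelet it uses, and the LSTM and AOA stages are not shown.

```python
import math


def haar_step(signal):
    """One Haar analysis step: (approximation, detail), each half the input length."""
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_packet_level2(signal):
    """Split both level-1 branches again, giving the four level-2 packet nodes."""
    if len(signal) % 4:
        raise ValueError("signal length must be divisible by 4")
    a, d = haar_step(signal)
    aa, ad = haar_step(a)  # approximation branch split again
    da, dd = haar_step(d)  # detail branch split again (the "packet" part)
    return {"aa": aa, "ad": ad, "da": da, "dd": dd}


bands = wavelet_packet_level2([4, 6, 10, 12, 8, 6, 5, 5])
for name, values in bands.items():
    print(name, [round(v, 3) for v in values])
```

Because the Haar transform is orthonormal, the four sub-series together preserve the signal's energy (sum of squares), so no information is lost before they are passed to separate models.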


Author(s):  
Shigang Wang ◽  
Shuai Peng ◽  
Jiawen He

Point clouds from oral scans of dentures contain a large amount of data and many redundant points. To address incomplete feature preservation when processing such point cloud data, and the holes that appear in relatively flat regions, a feature-preserving point cloud simplification algorithm is proposed. First, the algorithm uses a kd-tree to build the spatial topology of the point cloud and to search the k-neighborhood of each sampling point. On this basis, it calculates each point's curvature, the angle between normal vectors, the distance from the point to its neighborhood centroid, and the standard deviation and average of the distances from the point to its neighbors; the detailed features of the point cloud are then extracted through multi-feature extraction and threshold determination. For the non-feature region, the point cloud is spatially divided with an octree to obtain the K value and the initial cluster centers for the K-means clustering algorithm, and the simplified result of the non-feature region is obtained after further subdivision. Finally, the extracted detail features and the simplified non-feature region are merged to obtain the final simplification result. The experimental results show that the algorithm better retains the characteristic information of the point cloud model and effectively avoids holes during simplification. The simplified results have better smoothness, simplicity and precision and are of high practical value.
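
One way to read the octree step is that, after subdividing the bounding box of the non-feature points to a fixed depth, the number of non-empty cells gives K and each cell's centroid serves as an initial K-means center. The sketch below assumes a uniform subdivision to a single depth (equivalent to the leaves of a full octree at that depth) and keeps the original point nearest each final center. The depth parameter, the representative-point rule and the helper names are assumptions; the feature-extraction step and the paper's further subdivision are not shown.

```python
import math
from collections import defaultdict


def octree_cells(points, depth):
    """Bucket 3D points into the (2**depth)**3 equal cells of their bounding box."""
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    n = 2 ** depth
    cells = defaultdict(list)
    for p in points:
        key = tuple(
            min(int((p[i] - lo[i]) / (hi[i] - lo[i]) * n), n - 1) if hi[i] > lo[i] else 0
            for i in range(3)
        )
        cells[key].append(p)
    return cells


def centroid(pts):
    return tuple(sum(axis) / len(pts) for axis in zip(*pts))


def simplify_flat_region(points, depth=1, iters=20):
    """Hypothetical helper: K and initial centers come from the non-empty octree cells."""
    centers = [centroid(pts) for pts in octree_cells(points, depth).values()]
    k = len(centers)  # K = number of non-empty cells
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        centers = [centroid(groups[c]) if groups[c] else centers[c] for c in range(k)]
    # Keep the original point closest to each final center (duplicates dropped).
    kept = [min(points, key=lambda p: math.dist(p, center)) for center in centers]
    return list(dict.fromkeys(kept))


# Hypothetical flat patches (z = 0) standing in for non-feature regions.
patch = [(x * 0.1, y * 0.1, 0.0) for x in range(5) for y in range(5)]
patch += [(2 + x * 0.1, 2 + y * 0.1, 0.0) for x in range(5) for y in range(5)]
kept = simplify_flat_region(patch, depth=1)
print(len(patch), "->", len(kept))
```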

