Learning to see the wood for the trees: machine learning, decision trees, and the classification of isolated theropod teeth

Palaeontology ◽  
2020 ◽  
Author(s):  
Simon Wills ◽  
Charlie J. Underwood ◽  
Paul M. Barrett
Author(s):  
Charles X. Ling ◽  
John J. Parry ◽  
Handong Wang

Nearest Neighbour (NN) learning algorithms utilize a distance function to determine the classification of testing examples. The attribute weights in the distance function should be set appropriately. We study situations where a simple approach of setting attribute weights using decision trees does not work well, and design three improvements. We test these new methods thoroughly using artificially generated datasets and datasets from the machine learning repository.
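
As a rough illustration of the idea in the abstract above, the following sketch shows a nearest-neighbour classifier whose distance function takes per-attribute weights. The weights are hand-picked placeholders and are not derived from a decision tree as the paper proposes. The names `weighted_distance` and `nn_classify` and the toy data are hypothetical.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def nn_classify(query, examples, weights):
    """Return the label of the training example closest to `query`.

    `examples` is a list of (attribute_vector, label) pairs.
    """
    _, label = min(examples, key=lambda ex: weighted_distance(query, ex[0], weights))
    return label

# Toy data (assumed): the first attribute determines the class,
# the second attribute is irrelevant noise.
train = [((0.0, 9.0), "A"), ((1.0, 0.0), "B")]
query = (0.1, 0.5)

# Equal weights let the noisy attribute dominate the distance.
unweighted = nn_classify(query, train, weights=(1.0, 1.0))
# Zeroing the weight of the noisy attribute changes the nearest neighbour.
weighted = nn_classify(query, train, weights=(1.0, 0.0))
```

With equal weights the query is assigned to the neighbour that is close only on the noisy attribute; down-weighting that attribute changes the prediction, which is why the choice of weights matters.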


2019 ◽  
Author(s):  
Wilson Castro ◽  
Jimy Oblitas ◽  
Miguel De-la-Torre ◽  
Carlos Cotrina ◽  
Karen Bazán ◽  
...  

The classification of fresh fruits according to their ripeness is typically a subjective and tedious task; consequently, there is growing interest in the use of non-contact techniques such as those based on computer vision and machine learning. In this paper, we propose the use of non-intrusive techniques for the classification of Cape gooseberry fruits. The proposal is based on the use of machine learning techniques combined with different color spaces. Given the success of techniques such as artificial neural networks, support vector machines, decision trees, and K-nearest neighbors in addressing classification problems, we decided to use these approaches in this research work. A sample of 926 Cape gooseberry fruits was obtained, and fruits were classified manually according to their level of ripeness into seven different classes. Images of each fruit were acquired in the RGB format through a system developed for this purpose. These images were preprocessed, filtered and segmented until the fruits were identified. For each piece of fruit, the median color parameter values in the RGB space were obtained, and these results were subsequently transformed into the HSV and L*a*b* color spaces. The values of each piece of fruit in the three color spaces and their corresponding degrees of ripeness were arranged for use in the creation, testing, and comparison of the developed classification models. The classification of gooseberry fruits by ripening level was found to be sensitive to both the color space used and the classification technique, e.g., the models based on decision trees are the most accurate, and the models based on the L*a*b* color space obtain the best mean accuracy. However, the model that best classifies the Cape gooseberry fruits based on ripeness level is that resulting from the combination of the SVM technique and the RGB color space.
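
The feature-extraction step described above (per-fruit median color in RGB, then conversion to another color space) might look roughly like the sketch below. It uses only the Python standard library and covers the RGB-to-HSV conversion only; the L*a*b* conversion depends on a chosen reference white and is omitted. The pixel values and the function names `median_rgb` and `rgb_to_hsv_features` are hypothetical and do not come from the paper.

```python
import colorsys
from statistics import median

def median_rgb(pixels):
    """Per-channel median of a list of (R, G, B) pixels with 8-bit values."""
    reds, greens, blues = zip(*pixels)
    return median(reds), median(greens), median(blues)

def rgb_to_hsv_features(rgb):
    """Convert an 8-bit RGB triple to HSV, each component scaled to [0, 1]."""
    r, g, b = (channel / 255.0 for channel in rgb)
    return colorsys.rgb_to_hsv(r, g, b)

# Assumed pixels from one segmented fruit region (illustrative values only).
fruit_pixels = [(250, 160, 10), (255, 165, 0), (240, 170, 20)]

rgb = median_rgb(fruit_pixels)
hsv = rgb_to_hsv_features(rgb)
```

Each fruit would then contribute one feature vector per color space, paired with its manually assigned ripeness class, as input to the classifiers being compared.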


Author(s):  
Oyelakin A. M ◽  
Alimi O. M ◽  
Mustapha I. O ◽  
Ajiboye I. K

Phishing attacks have been used in different ways to harvest the confidential information of unsuspecting internet users. To stem the tide of phishing-based attacks, several machine learning techniques have been proposed in the past. However, fewer studies have investigated single and ensemble machine learning-based models for the classification of phishing attacks. This study carried out a performance analysis of selected single and ensemble machine learning (ML) classifiers in phishing classification. The focus is to investigate how these algorithms behave in the classification of phishing attacks in the chosen dataset. Logistic Regression and Decision Trees were chosen as single learning classifiers, while simple voting techniques and Random Forest were used as the ensemble machine learning algorithms. Accuracy, precision, recall and F1-score were used as performance metrics. The Logistic Regression algorithm recorded 0.86 for accuracy, 0.89 for precision, 0.87 for recall and 0.81 for F1-score. Similarly, the Decision Trees classifier achieved 0.87 for accuracy, 0.83 for precision, 0.88 for recall and 0.81 for F1-score. The voting ensemble achieved 0.92 for accuracy, 0.90 for precision, 0.92 for recall and 0.92 for F1-score. The Random Forest algorithm recorded 0.98, 0.97, 0.98 and 0.97 for accuracy, precision, recall and F1-score respectively. In the experimental analyses, the Random Forest algorithm outperformed the simple averaging classifier and the two single algorithms used for phishing URL detection. The study established that the ensemble techniques used in the experiments are more efficient for phishing URL identification than the single classifiers.
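
For readers unfamiliar with the four metrics and with simple (hard) voting, the sketch below computes them for a binary task using the standard definitions. The labels and the three classifiers' predictions are invented toy data, not results from the study, and the helper names `binary_metrics` and `majority_vote` are hypothetical.

```python
from collections import Counter

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def majority_vote(predictions):
    """Hard voting: for each sample, pick the label most classifiers chose."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Toy labels (1 = phishing, 0 = legitimate) and three imperfect classifiers.
y_true = [1, 1, 1, 0, 0, 0]
clf_a = [1, 1, 0, 0, 0, 1]
clf_b = [1, 0, 1, 0, 1, 0]
clf_c = [1, 1, 1, 1, 0, 0]

voted = majority_vote([clf_a, clf_b, clf_c])
single_scores = binary_metrics(y_true, clf_a)
ensemble_scores = binary_metrics(y_true, voted)
```

In this toy case each classifier errs on different samples, so the vote corrects the individual mistakes; real ensembles gain less when their members' errors are correlated.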


Author(s):  
Carlos Cotrina ◽  
Karen Bazán ◽  
Jimy Oblitas ◽  
Himer Avila-George ◽  
Wilson Castro

The classification of fresh fruits according to their ripeness is typically a subjective and tedious task; consequently, there is growing interest in the use of non-contact techniques such as those based on computer vision and machine learning. In this paper, we propose the use of non-intrusive techniques for the classification of Cape gooseberry fruits. The proposal is based on the use of machine learning techniques combined with different color spaces. Given the success of techniques such as artificial neural networks, support vector machines, decision trees, and K-nearest neighbors in addressing classification problems, we decided to use these approaches in this research work. A sample of 926 Cape gooseberry fruits was obtained, and fruits were classified manually according to their level of ripeness into seven different classes. Images of each fruit were acquired in the RGB format through a system developed for this purpose. These images were preprocessed, filtered and segmented until the fruits were identified. For each piece of fruit, the median color parameter values in the RGB space were obtained, and these results were subsequently transformed into the HSV and L*a*b* color spaces. The values of each piece of fruit in the three color spaces and their corresponding degrees of ripeness were arranged for use in the creation, testing, and comparison of the developed classification models. The classification of gooseberry fruits by ripening level was found to be sensitive to both the color space used and the classification technique, e.g., the models based on decision trees are the most accurate, and the models based on the L*a*b* color space obtain the best mean accuracy. However, the model that best classifies the Cape gooseberry fruits based on ripeness level is that resulting from the combination of the SVM technique and the RGB color space.


Author(s):  
Padmavathi .S ◽  
M. Chidambaram

Text classification has become more significant in managing and organizing text data due to the tremendous growth of online information. It classifies documents into a fixed number of predefined categories. The rule-based approach and the machine learning approach are the two ways of performing text classification. In the rule-based approach, documents are classified based on manually defined rules. In the machine learning-based approach, the classification rules, or classifier, are defined automatically using example documents. It has higher recall and is a quicker process. This paper presents an investigation of text classification using different machine learning techniques.


Author(s):  
Hyeuk Kim

Unsupervised learning in machine learning divides data into several groups. Observations in the same group have similar characteristics, and observations in different groups have different characteristics. In this paper, we group data by partitioning around medoids, which has some advantages over k-means clustering. We apply it to baseball players in the Korea Baseball League. We also apply principal component analysis to the data and draw a graph using two components as the axes. We interpret the meaning of the clustering graphically through this procedure. The combination of partitioning around medoids and principal component analysis can be applied to any other data, and the approach makes it easy to identify the characteristics of each group.
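
To make partitioning around medoids concrete, here is a minimal sketch of its swap step on toy 2-D points. It is a simplification: it starts from the first k points rather than using the usual BUILD initialization, and it omits the principal component analysis step entirely. The data and the names `total_cost` and `pam` are hypothetical and unrelated to the baseball dataset.

```python
import math
from itertools import product

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid (by index)."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k):
    """Simplified PAM: start from the first k points, then greedily swap.

    A medoid is swapped with a non-medoid whenever that lowers the total
    cost; the loop stops when no single swap improves the clustering.
    """
    medoids = list(range(k))
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i, candidate in product(range(k), range(len(points))):
            if candidate in medoids:
                continue
            trial = medoids[:i] + [candidate] + medoids[i + 1:]
            cost = total_cost(points, trial)
            if cost < best:
                medoids, best, improved = trial, cost, True
    return sorted(medoids), best

# Two well-separated toy groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = pam(points, k=2)
```

Unlike k-means centroids, the medoids are always actual observations, which is one reason the method is less sensitive to outliers and easier to interpret.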

