Improved argumentative paragraphs detection in academic theses supported with unit segmentation

2021 ◽  
pp. 1-11
Author(s):  
Jesús Miguel García-Gorrostieta ◽  
Aurelio López-López ◽  
Samuel González-López ◽  
Adrián Pastor López-Monroy

Academic theses writing is a complex task that requires the author to be skilled in argumentation. The goal of the academic author is to communicate clear ideas and to convince the reader of the presented claims. However, few students are good arguers, and this is a skill that takes time to master. In this paper, we present an exploration of lexical features used to model automatic detection of argumentative paragraphs using machine learning techniques. We present a novel proposal, which combines the information in the complete paragraph with the detection of argumentative segments in order to achieve improved results for the detection of argumentative paragraphs. We propose two approaches; a more descriptive one, which uses the decision tree classifier with indicators and lexical features; and another more efficient, which uses an SVM classifier with lexical features and a Document Occurrence Representation (DOR). Both approaches consider the detection of argumentative segments to ensure that a paragraph detected as argumentative has indeed segments with argumentation. We achieved encouraging results for both approaches.

Author(s):  
Cristián Castillo-Olea ◽  
Begonya Garcia-Zapirain Soto ◽  
Clemente Zuñiga

The article presents a study based on timeline data analysis of the level of sarcopenia in older patients in Baja California, Mexico. Information was examined at the beginning of the study (first event), three months later (second event), and six months later (third event). Sarcopenia is defined as the loss of muscle mass quality and strength. The study was conducted with 166 patients. A total of 65% were women and 35% were men. The mean age of the enrolled patients was 77.24 years. The research included 99 variables that consider medical history, pharmacology, psychological tests, comorbidity (Charlson), functional capacity (Barthel and Lawton), undernourishment (mini nutritional assessment (MNA) validated test), as well as biochemical and socio-demographic data. Our aim was to evaluate the prevalence of the level of sarcopenia in a population of chronically ill patients assessed at the Tijuana General Hospital. We used machine learning techniques to assess and identify the determining variables to focus on the patients’ evolution. The following classifiers were used: Support Vector Machines, Linear Support Vector Machines, Radial Basis Function, Gaussian process, Decision Tree, Random Forest, multilayer perceptron, AdaBoost, Gaussian Naive Bayes, and Quadratic Discriminant Analysis. In order of importance, we found that the following variables determine the level of sarcopenia: Age, Systolic arterial hypertension, mini nutritional assessment (MNA), Number of chronic diseases, and Sodium. They are therefore considered relevant in the decision-making process of choosing treatment or prevention. Analysis of the relationship between the presence of the variables and the classifiers used to measure sarcopenia revealed that the Decision Tree classifier, with the Age, Systolic arterial hypertension, MNA, Number of chronic diseases, and Sodium variables, showed a precision of 0.864, accuracy of 0.831, and an F1 score of 0.900 in the first and second events. Precision of 0.867, accuracy of 0.825, and an F1 score of 0.867 were obtained in event three with the same variables. We can therefore conclude that the Decision Tree classifier yields the best results for the assessment of the determining variables and suggests that the study population’s sarcopenia did not change from moderate to severe.


Deriving the methodologies to detect heart issues at an earlier stage and intimating the patient to improve their health. To resolve this problem, we will use Machine Learning techniques to predict the incidence at an earlier stage. We have a tendency to use sure parameters like age, sex, height, weight, case history, smoking and alcohol consumption and test like pressure ,cholesterol, diabetes, ECG, ECHO for prediction. In machine learning there are many algorithms which will be used to solve this issue. The algorithms include K-Nearest Neighbour, Support vector classifier, decision tree classifier, logistic regression and Random Forest classifier. Using these parameters and algorithms we need to predict whether or not the patient has heart disease or not and recommend the patient to improve his/her health.


The online discussion forums and blogs are very vibrant platforms for cancer patients to express their views in the form of stories. These stories sometimes become a source of inspiration for some patients who are anxious in searching the similar cases. This paper proposes a method using natural language processing and machine learning to analyze unstructured texts accumulated from patient’s reviews and stories. The proposed methodology aims to identify behavior, emotions, side-effects, decisions and demographics associated with the cancer victims. The pre-processing phase of our work involves extraction of web text followed by text-cleaning where some special characters and symbols are omitted, and finally tagging the texts using NLTK’s (Natural Language Toolkit) POS (Parts of Speech) Tagger. The post-processing phase performs training of seven machine learning classifiers (refer Table 6). The Decision Tree classifier shows the higher precision (0.83) among the other classifiers while, the Area under the operating Characteristics (AUC) for Support Vector Machine (SVM) classifier is highest (0.98).


Author(s):  
S. Prasanthi ◽  
S.Durga Bhavani ◽  
T. Sobha Rani ◽  
Raju S. Bapi

Vast majority of successful drugs or inhibitors achieve their activity by binding to, and modifying the activity of a protein leading to the concept of druggability. A target protein is druggable if it has the potential to bind the drug-like molecules. Hence kinase inhibitors need to be studied to understand the specificity of a kinase inhibitor in choosing a particular kinase target. In this paper we focus on human kinase drug target sequences since kinases are known to be potential drug targets. Also we do a preliminary analysis of kinase inhibitors in order to study the problem in the protein-ligand space in future. The identification of druggable kinases is treated as a classification problem in which druggable kinases are taken as positive data set and non-druggable kinases are chosen as negative data set. The classification problem is addressed using machine learning techniques like support vector machine (SVM) and decision tree (DT) and using sequence-specific features. One of the challenges of this classification problem is due to the unbalanced data with only 48 druggable kinases available against 509 non-drugggable kinases present at Uniprot. The accuracy of the decision tree classifier obtained is 57.65 which is not satisfactory. A two-tier architecture of decision trees is carefully designed such that recognition on the non-druggable dataset also gets improved. Thus the overall model is shown to achieve a final performance accuracy of 88.37. To the best of our knowledge, kinase druggability prediction using machine learning approaches has not been reported in literature.


2013 ◽  
pp. 937-947
Author(s):  
S. Prasanthi ◽  
S.Durga Bhavani ◽  
T. Sobha Rani ◽  
Raju S. Bapi

Vast majority of successful drugs or inhibitors achieve their activity by binding to, and modifying the activity of a protein leading to the concept of druggability. A target protein is druggable if it has the potential to bind the drug-like molecules. Hence kinase inhibitors need to be studied to understand the specificity of a kinase inhibitor in choosing a particular kinase target. In this paper we focus on human kinase drug target sequences since kinases are known to be potential drug targets. Also we do a preliminary analysis of kinase inhibitors in order to study the problem in the protein-ligand space in future. The identification of druggable kinases is treated as a classification problem in which druggable kinases are taken as positive data set and non-druggable kinases are chosen as negative data set. The classification problem is addressed using machine learning techniques like support vector machine (SVM) and decision tree (DT) and using sequence-specific features. One of the challenges of this classification problem is due to the unbalanced data with only 48 druggable kinases available against 509 non-drugggable kinases present at Uniprot. The accuracy of the decision tree classifier obtained is 57.65 which is not satisfactory. A two-tier architecture of decision trees is carefully designed such that recognition on the non-druggable dataset also gets improved. Thus the overall model is shown to achieve a final performance accuracy of 88.37. To the best of our knowledge, kinase druggability prediction using machine learning approaches has not been reported in literature.


2016 ◽  
Vol 2016 ◽  
pp. 1-21 ◽  
Author(s):  
Taimur Bakhshi ◽  
Bogdan Ghita

Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification requiring additional packet-level information, host behaviour analysis, and specialized hardware limiting their practical adoption. This paper aims to overcome these challenges by proposing two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application throughk-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected tok-means clustering to determine unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0 based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases increasing to 96.67% with adaptive boosting. The classifier specificity factor which accounted for differentiating content specific from supplementary flows ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.


Author(s):  
Angela More

Abstract: Data analytics play vital roles in diagnosis and treatment in the health care sector. To enable practitioner decisionmaking, huge volumes of data should be processed with machine learning techniques to produce tools for prediction and classification Breast Cancer reports 1 million cases per year. We have proposed a prediction model, which is specifically designed for prediction of Breast Cancer using Machine learning algorithms Decision tree classifier, Naïve Bayes, SVM and KNearest Neighbour algorithms. The model predicts the type of tumour, the tumour can be benign (noncancerous) or malignant (cancerous) . The model uses supervised learning which is a machine learning concept where we provide dependent and independent columns to machine. It uses classification technique which predicts the type of tumour. Keywords: Cancer, Machine learning, Prediction, Data Visualization, SVM, Naïve Bayes, Classification.


2019 ◽  
Vol 8 (4) ◽  
pp. 10316-10320

Nowadays, heart disease has become a major disease among the people irrespective of the age. We are seeing this even in children dying due to the heart disease. If we can predict this even before they die, there may be huge chances of surviving. Everybody has various qualities of beat rate (pulse rate) and circulatory strain (blood pressure). We are living in a period of data. Due to the rise in the technology, the amount of data that is generated is increasing daily. Some terabytes of data are being produced and stored. For example, the huge amount of data about the patients is produced in the hospitals such as chest pain, heart rate, blood pressure, pulse rate etc. If we can get this data and apply some machine learning techniques, we can reduce the probability of people dying. In this paper we have done survey using different classification and grouping strategies, for example, KNN, Decision tree classifier, Gaussian Naïve Bayes, Support vector machine, Linear regression, Logistic regression, Random forest classifier, Random forest regression, linear descriptive analysis. We have taken the 14 attributes that are present in the dataset as an input and applying on the dataset which is taken from the UCI repository to develop and accurate model of predicting the heart disease contains colossal (huge) therapeutic (medical) information. In the proposed research, the exhibition of the conclusion model is acquired by using utilizing classification strategies. In this paper proposed an accuracy model to predict whether a person has coronary disease or not. This is implemented by comparing the accuracies of different machine-learning strategies such as KNN, Decision tree classifier, Gaussian Naïve Bayes, SVM, Logistic regression, Random forest classifier, Linear regression, Random forest regression, linear descriptive analysis


2020 ◽  
Vol 17 (9) ◽  
pp. 4055-4060
Author(s):  
L. Girish ◽  
Sridhar K. N. Rao

Virtualized data centers bring lot of benefits with respect to the reducing the high usage of physical hardware. But nowadays, as the usage of cloud infrastructures are rapidly increasing in all the fields to provide proper services on demand. In cloud data center, achieving efficient resource sharing between virtual machine and physical machines are very important. To achieve efficient resource sharing performance degradation of virtual machine and quantifying the sensitivity of virtual machine must be modeled, predicted correctly. In this work we use machine learning techniques like decision tree, K nearest neighbor and logistic regression to calculate the sensitivity of virtual machine. The dataset used for the experiment was collected using collected from open stack cloud environment. We execute two scenarios in this experiment to evaluate performance of the three mentioned classifiers based on precision, recall, sensitivity and specificity. We achieved good results using decision tree classifier with precision 88.8%, recall 80% and accuracy of 97.30%.


2021 ◽  
Vol 102 ◽  
pp. 04004
Author(s):  
Jesse Jeremiah Tanimu ◽  
Mohamed Hamada ◽  
Mohammed Hassan ◽  
Saratu Yusuf Ilu

With the advent of new technologies in the medical field, huge amounts of cancerous data have been collected and are readily accessible to the medical research community. Over the years, researchers have employed advanced data mining and machine learning techniques to develop better models that can analyze datasets to extract the conceived patterns, ideas, and hidden knowledge. The mined information can be used as a support in decision making for diagnostic processes. These techniques, while being able to predict future outcomes of certain diseases effectively, can discover and identify patterns and relationships between them from complex datasets. In this research, a predictive model for predicting the outcome of patients’ cervical cancer results has been developed, given risk patterns from individual medical records and preliminary screening tests. This work presents a Decision tree (DT) classification algorithm and shows the advantage of feature selection approaches in the prediction of cervical cancer using recursive feature elimination technique for dimensionality reduction for improving the accuracy, sensitivity, and specificity of the model. The dataset employed here suffers from missing values and is highly imbalanced. Therefore, a combination of under and oversampling techniques called SMOTETomek was employed. A comparative analysis of the proposed model has been performed to show the effectiveness of feature selection and class imbalance based on the classifier’s accuracy, sensitivity, and specificity. The DT with the selected features and SMOTETomek has better results with an accuracy of 98%, sensitivity of 100%, and specificity of 97%. Decision Tree classifier is shown to have excellent performance in handling classification assignment when the features are reduced, and the problem of imbalance class is addressed.


Sign in / Sign up

Export Citation Format

Share Document