Text Mining for Multiclass Research Paper Categorization

A research paper is a rich source of academic and innovative writing on a particular topic, and research papers are unstructured in nature. Document categorization is the classification of documents into predefined classes. Categorizing research papers into different domains is arduous for a user because extracting meaningful and relevant words from a paper is a challenging task. To extract the important information, we use bag-of-words and TF-IDF representations to process the data. Preprocessing includes string tokenization and stop-word removal. The processed data is then classified with an SVM classifier. Because there are four predefined classes, a one-vs-rest (1-v-r) scheme is used for multiclass classification. The system reaches a performance of 88% with 800 training and 200 testing documents, and the model performs better when more training data is available. The aim of this work is to categorize the documents and assign each a tag from a predefined set. It also evaluates the performance of the model under different percentage splits between training and testing documents.
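The pipeline described in this abstract (tokenization, stop-word removal, TF-IDF weighting, one-vs-rest classification over four classes) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the stop-word list, the eight toy documents, the class names, and the hyperparameters are all assumptions, and a linear hinge-loss classifier trained by simple subgradient descent stands in for a library SVM.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "is", "to", "with"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def fit_idf(token_lists):
    """Inverse document frequency, idf(t) = ln(N / df(t))."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log(n_docs / df) for t, df in doc_freq.items()}

def vectorize(tokens, vocab, idf):
    """TF-IDF vector: raw term count times idf, one entry per vocabulary term."""
    counts = Counter(tokens)
    return [counts[t] * idf[t] for t in vocab]

def score(model, x):
    """Signed distance-like score w.x + b of a linear model."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_binary(xs, ys, epochs=50, lr=0.1, lam=0.01):
    """Linear classifier on the hinge loss with L2 decay (stand-in for an SVM)."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * score((w, b), x)
            w = [wi * (1 - lr * lam) for wi in w]   # regularization step
            if margin < 1:                           # hinge loss is active
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def train_one_vs_rest(xs, labels):
    """One binary model per class: that class is +1, every other class is -1."""
    return {c: train_binary(xs, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    """Pick the class whose binary model gives the highest score."""
    return max(models, key=lambda c: score(models[c], x))

# Hypothetical toy corpus with four predefined classes.
docs = [
    "neural network training", "deep neural model",
    "protein gene expression", "gene sequence analysis",
    "quantum particle physics", "particle collider energy",
    "market price economics", "price inflation market",
]
labels = ["ml", "ml", "bio", "bio", "physics", "physics", "econ", "econ"]

token_lists = [tokenize(d) for d in docs]
idf = fit_idf(token_lists)
vocab = sorted(idf)
xs = [vectorize(tokens, vocab, idf) for tokens in token_lists]
models = train_one_vs_rest(xs, labels)
predictions = [predict(models, x) for x in xs]
print(predictions)
```

With four classes, the one-vs-rest scheme trains four binary models, and the predicted tag is the class with the largest score. A real evaluation would hold out test documents, as the abstract describes, rather than predicting on the training set.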

2019, Vol 26 (11), pp. 1286-1296
Author(s): Li Tong, Hang Wu, May D Wang

Abstract. Objective: This article presents a novel method of semisupervised learning using convolutional autoencoders for optical endomicroscopic images. Optical endomicroscopy (OE) is a newly emerged biomedical imaging modality that can support real-time clinical decisions for the grade of dysplasia. To enable real-time decision making, computer-aided diagnosis (CAD) is essential for its high speed and objectivity. However, traditional supervised CAD requires a large amount of training data. Compared with the limited number of labeled images, we can collect a larger number of unlabeled images. To utilize these unlabeled images, we have developed a Convolutional AutoEncoder based Semi-supervised Network (CAESNet) for improving the classification performance. Materials and Methods: We applied our method to an OE dataset collected from patients undergoing endoscope-based confocal laser endomicroscopy procedures for Barrett’s esophagus at Emory Hospital, which consists of 429 labeled images and 2826 unlabeled images. Our CAESNet consists of an encoder with 5 convolutional layers, a decoder with 5 transposed convolutional layers, and a classification network with 2 fully connected layers and a softmax layer. In the unsupervised stage, we first update the encoder and decoder with both labeled and unlabeled images to learn an efficient feature representation. In the supervised stage, we further update the encoder and the classification network with only labeled images for multiclass classification of the OE images. Results: Our proposed semisupervised method CAESNet achieves the best average performance for multiclass classification of OE images, which surpasses the performance of supervised methods including standard convolutional networks and convolutional autoencoder network. Conclusions: Our semisupervised CAESNet can efficiently utilize the unlabeled OE images, which improves the diagnosis and decision making for patients with Barrett’s esophagus.
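The abstract pairs an encoder of 5 convolutional layers with a decoder of 5 transposed convolutional layers. The following sketch uses the standard output-size formulas to show how such a decoder can restore the input resolution. The 256-pixel input and the kernel, stride, and padding values are assumptions chosen for illustration; the abstract does not state them.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after one transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

def trace(size, layer, n_layers=5):
    """Apply the same layer n_layers times and record every intermediate size."""
    sizes = [size]
    for _ in range(n_layers):
        sizes.append(layer(sizes[-1]))
    return sizes

# Hypothetical 256x256 input; 5 layers each way, as in the abstract.
encoder_sizes = trace(256, conv_out)
decoder_sizes = trace(encoder_sizes[-1], deconv_out)
print(encoder_sizes)  # sizes seen by the encoder, input first
print(decoder_sizes)  # sizes seen by the decoder, bottleneck first
```

Under these assumed settings each encoder layer halves the spatial size and each decoder layer doubles it, so the reconstruction has the same shape as the input and a pixel-wise reconstruction loss can be computed in the unsupervised stage.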


2021, Vol 8 (11), pp. 325-331
Author(s): Eko Hariyanto, Sri Wahyuni, Supina Batubara

The main problem addressed in this study is the large number of students who fail to complete their studies, which harms universities, because monitoring students as a preventive measure is difficult. This research is therefore important so that higher-education institutions can detect early (classify) students who are likely not to complete their studies on time or who will drop out (DO). Higher-education (PT) institutions, through related parties such as academic advisers, academic bureaus, and others, can then take early preventive action by offering the best solution to the problems students face. This research aims to determine a training data model consisting of academic and non-academic factors (including information extracted from social media). This model is then used as a basis for classifying students as likely to "graduate on time", "graduate not on time", or "DO". The approach is quantitative, using text mining algorithms to extract knowledge/information from social media, which is then used as training data, and data mining algorithms to classify the potential completion of student studies. The mandatory output targeted in the first year is a publication in a Scopus Q4 international journal, and in the second year a publication in a Scopus Q3 international journal. Additional output targets in the first and second years are publications in international journals indexed by reputable indexers, textbooks with ISBNs, and copyrights. The technology readiness level (TKT) of this study is up to level 2, the formulation of technological concepts and applications, here for classifying the potential completion of student studies using data mining. Keywords: student lost, knowledge/information extraction, data classification, text mining, data mining.


Author(s): M. Jeyanthi, C. Velayutham

Brain-computer interfaces (BCI) play a vital role in research in science and technology development. Classification is a data mining technique used to predict group membership for data instances. Analyzing BCI data is challenging because feature extraction and classification of these data are more difficult than those applied to raw data. In this paper, we extract statistical Haralick features from the raw EEG data. The features are then normalized, and binning is used to improve the accuracy of the predictive models by reducing noise and eliminating some irrelevant attributes. Classification is then performed on a BCI dataset using several techniques: Naïve Bayes, the k-nearest neighbor classifier, and the SVM classifier. Finally, we propose the SVM classification algorithm for the BCI dataset.
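The normalization and binning steps mentioned in this abstract can be illustrated as follows. This is a minimal sketch under assumptions: the abstract does not say which normalization or binning scheme was used, so min-max scaling and equal-width bins are chosen here, and the feature values are made up.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(normalized, n_bins):
    """Map values in [0, 1] to integer bin indices 0 .. n_bins - 1."""
    # The top edge (1.0) is clamped into the last bin.
    return [min(int(v * n_bins), n_bins - 1) for v in normalized]

# Hypothetical values of one extracted feature across four EEG trials.
feature = [2.0, 4.0, 6.0, 10.0]
normalized = min_max_normalize(feature)
binned = equal_width_bins(normalized, n_bins=4)
print(normalized)
print(binned)
```

Binning replaces small fluctuations inside a bin with a single index, which is one way it can reduce noise before the features reach a classifier.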


Sensors, 2021, Vol 21 (7), pp. 2503
Author(s): Taro Suzuki, Yoshiharu Amano

This paper proposes a method for detecting non-line-of-sight (NLOS) multipath, which causes large positioning errors in a global navigation satellite system (GNSS). We use GNSS signal correlation output, which is the most primitive GNSS signal processing output, to detect NLOS multipath based on machine learning. The shape of the multi-correlator outputs is distorted by NLOS multipath, and features of this shape are used to discriminate NLOS multipath. We implement two supervised learning methods, a support vector machine (SVM) and a neural network (NN), and compare their performance. We also propose an automated method of collecting training data of LOS and NLOS signals for machine learning. The evaluation of the proposed NLOS detection method in an urban environment confirmed that NN was better than SVM, and 97.7% of NLOS signals were correctly discriminated.
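To make the idea of a distorted correlator shape concrete, here is a toy sketch. It assumes an idealized triangular code autocorrelation, five correlator offsets, a reflected signal modeled as a delayed in-phase copy, and a hypothetical early/late asymmetry feature. None of these details come from the paper, whose actual features are not given in the abstract.

```python
def ideal_correlation(offset_chips):
    """Idealized triangular autocorrelation: 1 - |offset| within one chip, else 0."""
    return max(0.0, 1.0 - abs(offset_chips))

def multicorrelator(offsets, los_amp, reflected_amp, reflected_delay):
    """Correlator outputs for a direct path plus one delayed reflection."""
    return [los_amp * ideal_correlation(o)
            + reflected_amp * ideal_correlation(o - reflected_delay)
            for o in offsets]

def asymmetry(offsets, outputs):
    """Hypothetical shape feature: late-side minus early-side normalized energy."""
    peak = max(outputs)
    normalized = [v / peak for v in outputs]
    late = sum(v for o, v in zip(offsets, normalized) if o > 0)
    early = sum(v for o, v in zip(offsets, normalized) if o < 0)
    return late - early

# Correlator offsets in chips relative to the tracked code phase (assumed).
offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]

# LOS: direct path only. NLOS: direct path blocked, reflection delayed 0.5 chip.
los = multicorrelator(offsets, los_amp=1.0, reflected_amp=0.0, reflected_delay=0.0)
nlos = multicorrelator(offsets, los_amp=0.0, reflected_amp=0.6, reflected_delay=0.5)

los_feature = asymmetry(offsets, los)
nlos_feature = asymmetry(offsets, nlos)
print(los_feature, nlos_feature)
```

In this toy model the LOS shape is symmetric about the prompt correlator while the NLOS shape is skewed toward late offsets. Shape features of this general kind could then be fed to a classifier such as the SVM or NN compared in the abstract.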


Author(s): Christian Horn, Oscar Ivarsson, Cecilia Lindhé, Rich Potter, Ashely Green, ...

Abstract: Rock art carvings, which are best described as petroglyphs, were produced by removing parts of the rock surface to create a negative relief. This tradition was particularly strong during the Nordic Bronze Age (1700–550 BC) in southern Scandinavia with over 20,000 boats and thousands of humans, animals, wagons, etc. This vivid and highly engaging material provides quantitative data of high potential to understand Bronze Age social structures and ideologies. The ability to provide the technically best possible documentation and to automate identification and classification of images would help to take full advantage of the research potential of petroglyphs in southern Scandinavia and elsewhere. We therefore attempted to train a model that locates and classifies image objects using a faster region-based convolutional neural network (Faster-RCNN), based on data produced by a novel method that improves the visualization of the content of 3D documentation. A newly created layer of 3D rock art documentation provides the best data currently available and has reduced inscribed bias compared to older methods. Several models were trained on input images annotated with bounding boxes produced with different parameters to find the best solution. The data included 4305 individual images in 408 scans of rock art sites. To enhance the models and enrich the training data, we used data augmentation and transfer learning. The successful models perform exceptionally well on boats and circles, as well as with human figures and wheels. This work was an interdisciplinary undertaking which led to important reflections about archaeology, digital humanities, and artificial intelligence. The reflections and the success represented by the trained models open novel avenues for future research on rock art.
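When images annotated with bounding boxes are augmented, the boxes must be transformed together with the pixels. The sketch below shows this for a horizontal flip. The choice of flip is an assumption made for illustration, since the abstract does not list which augmentations were used, and the tiny image and box are made up.

```python
def hflip_image(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def hflip_box(box, image_width):
    """Mirror a box (x_min, y_min, x_max, y_max) given in pixel-edge coordinates."""
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge, and vice versa.
    return (image_width - x_max, y_min, image_width - x_min, y_max)

# Hypothetical 2x4 image whose carved pixels (value 1) fill the first column.
image = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
box = (0, 0, 1, 2)  # covers column 0, rows 0 and 1

flipped_image = hflip_image(image)
flipped_box = hflip_box(box, image_width=4)
print(flipped_image)
print(flipped_box)
```

After the flip the carved pixels sit in the last column and the transformed box covers exactly that column, so the annotation stays aligned with the image content.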

