Plagiarism Checker in a Final Project (Tugas Akhir) Data Management System

2021 ◽  
Vol 7 (2) ◽  
pp. 192-201
Author(s):  
Made Hanindia Prami Swari ◽  
Chrystia Aji Putra ◽  
I Putu Susila Handika

Plagiarism is a growing concern, particularly in higher education. The library of the Faculty of Computer Science (Fasilkom) at UPN “Veteran” Jawa Timur maintains a student thesis repository called E-Read, and students about to start their thesis often turn to the library for topic ideas and literature. To minimize the chance of plagiarism, the E-Read system was extended with the plagiarism checker feature built in this study. The plagiarism checker was developed through a structured process using the Waterfall method, consisting of requirements analysis, design, implementation, and system testing. The Jaro-Winkler algorithm was chosen to detect the similarity between a prospective thesis abstract and the abstracts already stored in the E-Read database, because previous studies reported that it offers good accuracy and fast computation. Testing the system's accuracy against expert judgment showed that the plagiarism checker achieved 40% accuracy, with 33.3% precision, 100% recall, and an F-Measure of 35%. In addition, analysis of the Jaro-Winkler computation showed that the algorithm focuses on finding matching characters within a specified test range; in other words, it has no mechanism for checking word meaning, which a document plagiarism checker arguably needs. With this feature, students can run a preliminary check on the abstract of their prospective thesis to minimize plagiarism.
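
The Jaro-Winkler measure used in the study compares character matches within a sliding window and boosts the score for a shared prefix. A minimal self-contained sketch (an illustrative reimplementation, not the paper's code):

```python
def jaro(s1, s2):
    """Jaro similarity: fraction of matching characters, penalized
    by transpositions among the matches."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match if equal and within this sliding window
    window = max(max(len1, len2) // 2 - 1, 0)
    match1, match2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters appearing in a different order
    t, j = 0, 0
    for i in range(len1):
        if match1[i]:
            while not match2[j]:
                j += 1
            if s1[i] != s2[j]:
                t += 1
            j += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro score boosted by the length of the common prefix (max 4)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return score + prefix * p * (1 - score)
```

A checker along these lines would apply the measure to normalized abstract text against each stored abstract and flag pairs scoring above a chosen threshold.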

2018 ◽  
Vol 9 (2) ◽  
pp. 97-105
Author(s):  
Richard Firdaus Oeyliawan ◽  
Dennis Gunawan

A library is a facility that provides information and knowledge resources and supports readers academically in finding the information they need. Because of the large number of books a library holds, readers often have difficulty finding the ones they want. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as its library catalogue. SLiMS has many features that help readers, but it still lacks a recommendation feature to help readers find books relevant to a specific book they choose. The application was developed using the Vector Space Model to represent each document as a vector, and its recommendations are based on the similarity of the books' descriptions. In the testing phase, using a one-language sample of relevant books, an F-Measure of 55% was obtained with a cosine-similarity threshold of 0.1. The books' descriptions and the variety of languages affect the F-Measure obtained. Index Terms—Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model
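
The pipeline described here (TF-IDF weighting in a Vector Space Model, then cosine similarity against a threshold) can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code, and it omits the Porter stemming step they also apply:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for tokenized documents."""
    n = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term->weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

A recommender of this kind would vectorize all book descriptions once, then return the books whose cosine similarity to the chosen book exceeds the threshold (0.1 in the study).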


2018 ◽  
Vol 9 (1) ◽  
pp. 9-17
Author(s):  
Marcel Bonar Kristanda ◽  
Seng Hansun ◽  
Albert Albert

A library catalog is a documented list of all of a library's collections. Unfortunately, searching for a book in the catalog of Universitas Multimedia Nusantara's library information system often fails to return results relevant to the user's query. This research aims to design and build a library catalog application on the Android platform that improves the relevance of database search results using the Rocchio Relevance Feedback method, together with a user experience measurement. The user experience analysis showed a good response, with a score of 91.18% across all factors, and the relevance evaluation yielded 71.43% precision, 100% recall, and an 83.33% F-Measure. The difference in relevant results between the Senayan Library Information System (SLiMS) and the new Android application was 36.11%. The Android application therefore proved to return relevant results based on relevance ranking. Index Terms—Rocchio, Relevance Feedback, Search, Books, Application, Android, Library.
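
The Rocchio Relevance Feedback method referenced above moves the query vector toward documents the user marks relevant and away from non-relevant ones. A minimal sketch with the classic default weights (alpha=1.0, beta=0.75, gamma=0.15; the study's actual parameter values are not stated in the abstract):

```python
from collections import defaultdict

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: q' = a*q + b*centroid(relevant) - g*centroid(nonrelevant).
    All vectors are dicts mapping term -> weight."""
    new_q = defaultdict(float)
    for t, w in query.items():
        new_q[t] += alpha * w
    for doc in relevant:
        for t, w in doc.items():
            new_q[t] += beta * w / len(relevant)
    for doc in nonrelevant:
        for t, w in doc.items():
            new_q[t] -= gamma * w / len(nonrelevant)
    # Negative weights are clipped, as is standard practice
    return {t: w for t, w in new_q.items() if w > 0}
```

Re-running the search with the updated query vector is what pushes relevant catalog entries up the ranking.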


2019 ◽  
Author(s):  
Chin Lin ◽  
Yu-Sheng Lou ◽  
Chia-Cheng Lee ◽  
Chia-Jung Hsu ◽  
Ding-Chung Wu ◽  
...  

BACKGROUND An artificial intelligence-based algorithm has shown a powerful ability to code the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) in discharge notes. However, its performance still requires improvement compared with human experts, and the major disadvantage of the previous algorithm is its lack of understanding of medical terminology. OBJECTIVE We propose several methods based on the human learning process and conduct a series of experiments to validate the improvements they bring. METHODS We compared two data sources for training the word-embedding model, English Wikipedia and PubMed journal abstracts, and tested fixed, changeable, and double-channel embedding tables; some additional tricks were also applied to improve accuracy. We used these methods to identify the three-chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. For training, 94,483 labeled discharge notes from June 1, 2015 to June 30, 2017 were used from the Tri-Service General Hospital in Taipei, Taiwan. To evaluate performance, 24,762 discharge notes from July 1, 2017 to December 31, 2017 from the same hospital were used, and 74,324 additional discharge notes collected from seven other hospitals were also tested. The F-measure is the major global measure of effectiveness. RESULTS In understanding medical terminology, the PubMed embedding model (Pearson correlation = 0.60/0.57) performed better than the Wikipedia embedding model (Pearson correlation = 0.35/0.31). In ICD-10-CM coding accuracy, the changeable model, used with both the PubMed and the Wikipedia embeddings, achieved the highest testing mean F-measure (0.7311 at Tri-Service General Hospital and 0.6639 at the seven other hospitals, respectively). Moreover, a proposed method called hybrid sampling, an augmentation trick to keep the algorithm from keying on negative terms, was found to further improve model performance. CONCLUSIONS The proposed model architecture and training method, named ICD10Net, is the first expert-level model practically applied to daily work. The model can also be applied to unstructured information extraction from free-text medical writing. We have developed a web app to demonstrate our work (https://linchin.ndmctsgh.edu.tw/app/ICD10/).
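
The mean F-measure used as the global effectiveness measure in studies like this one is the per-label harmonic mean of precision and recall, averaged over labels. A minimal sketch:

```python
def f_measure(tp, fp, fn):
    """F1 score from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mean_f_measure(per_label_counts):
    """Macro-averaged F-measure over labels, given (tp, fp, fn) per label."""
    scores = [f_measure(tp, fp, fn) for tp, fp, fn in per_label_counts]
    return sum(scores) / len(scores)
```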


2020 ◽  
Author(s):  
Shintaro Tsuji ◽  
Andrew Wen ◽  
Naoki Takahashi ◽  
Hongjian Zhang ◽  
Katsuhiko Ogasawara ◽  
...  

BACKGROUND Named entity recognition (NER) plays an important role in extracting descriptive features when mining free-text radiology reports. However, the performance of existing NER tools is limited because the entities they can recognize depend on dictionary lookup; in particular, recognizing compound terms is complicated because they come in a wide variety of patterns. OBJECTIVE The objective of this study is to develop and evaluate an NER tool that handles compound terms, using RadLex, for mining free-text radiology reports. METHODS We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary, GPD). We manually annotated 400 radiology reports for compound terms (CTs) in noun phrases and used them as the gold standard for the performance evaluation (precision, recall, and F-measure). Additionally, we created a compound-term-enhanced dictionary (CtED) by analyzing false negatives (FNs) and false positives (FPs), and applied it to another 100 radiology reports for validation. We also evaluated the stem terms of compound terms by defining two measures: the occurrence ratio (OR) and the matching ratio (MR). RESULTS The F-measure of cTAKES+RadLex+GPD was 32.2% (precision 92.1%, recall 19.6%), and that of the pipeline combined with the CtED was 67.1% (precision 98.1%, recall 51.0%). The OR indicated that the stem terms "effusion", "node", "tube", and "disease" were used frequently, but the dictionary still failed to capture many CTs. The MR showed that 71.9% of stem terms matched ontology entries, and RadLex improved the MR by about 22% over the cTAKES default dictionary. Together, the OR and MR revealed that stem terms have the potential to help generate synonymous phrases using ontologies. CONCLUSIONS We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that the CtED and stem term analysis have the potential to improve dictionary-based NER performance and expand vocabularies.
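
Dictionary-lookup NER of the kind discussed can be sketched as a greedy longest-match scan, so that a compound term like "pleural effusion" wins over its single-token stem. This is a deliberate simplification of what a cTAKES pipeline does, with a hypothetical toy dictionary:

```python
def dictionary_ner(tokens, dictionary):
    """Greedy longest-match lookup over a token list: prefer multi-token
    (compound) dictionary entries over single-token ones."""
    max_len = max(len(entry) for entry in dictionary)
    entities = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, then shrink
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in dictionary:
                match = candidate
                break
        if match:
            entities.append(" ".join(match))
            i += len(match)
        else:
            i += 1
    return entities
```

A CtED-style enhancement amounts to adding the compound spellings observed in the false negatives to `dictionary`, which raises recall without touching the matching logic.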


Healthcare ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 234 ◽  
Author(s):  
Hyun Yoo ◽  
Soyoung Han ◽  
Kyungyong Chung

Recently, massive amounts of bioinformation big data have been collected by sensor-based IoT devices, and the collected data are classified into different types of health big data using various techniques. Personalized analysis is the basis for judging the risk factors of an individual's cardiovascular disorders in real time. The objective of this paper is to provide a model for personalized heart condition classification that combines a fast, effective preprocessing technique with a deep neural network in order to process biosensor input data accumulated in real time. The model learns the input data and develops an approximation function, helping users recognize risk situations. For the analysis of the pulse frequency, a fast Fourier transform is applied during preprocessing, and data reduction is performed using the frequency-by-frequency ratios of the extracted power spectrum. To analyze the meaning of the preprocessed data, a neural network algorithm is applied; in particular, a deep neural network, which stacks multiple layers of nodes trained with gradient descent, is used to analyze and evaluate the linear data. The completed model was trained on ECG signals collected in advance and classified into normal, control, and noise groups; thereafter, ECG signals input in real time through the trained system were classified into the same three groups. To evaluate the proposed model, this study used the reduction ratio of the data-processing cost and the F-measure. With the fast Fourier transform and cumulative frequency percentages, the ECG data were reduced to 1/32 of their original size, and analysis of the deep neural network's F-measure showed that the model had 83.83% accuracy. Given these results, the modified deep neural network technique can reduce the size of big data in terms of computing work, and it is an effective way to reduce operation time.
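
The preprocessing step described above (extract the power spectrum, then keep only each frequency's share of the total power) can be sketched as follows. A real pipeline would use an FFT library; the naive O(n^2) DFT here just shows the idea, and the signal length is illustrative:

```python
import cmath

def power_spectrum(signal):
    """Naive DFT power spectrum; keeps only the non-redundant half
    of the bins for a real-valued signal."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2):
        s = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
        spectrum.append(abs(s) ** 2)
    return spectrum

def frequency_ratios(spectrum):
    """Reduce the spectrum to each bin's ratio of total power."""
    total = sum(spectrum)
    return [p / total for p in spectrum] if total else spectrum
```

The ratio representation is what enables the data reduction: a fixed, small number of per-bin ratios replaces the raw ECG samples as the network's input.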


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
D Huang ◽  
Z Zhang ◽  
K Lin ◽  
Z Zuo ◽  
Q Chen ◽  
...  

Abstract Background Atrial fibrillation (AF) is a major public health problem with significant adverse outcomes, and catheter ablation is a widely adopted treatment. The CABANA trial showed that catheter ablation reduced AF recurrence to a greater extent than medications; however, some patients who undergo the procedure still experience relapse. Here, we present an innovative way to identify this subgroup using an artificial intelligence (AI)-assisted coronary sinus electrogram. Hypothesis Our hypothesis is that credible features in the electrogram can be extracted by AI for prediction, so that rigorous drug administration, close follow-up, or a potential second procedure can be offered to these patients. Methods 67 patients with non-valvular persistent AF undergoing circumferential pulmonary vein isolation at two independent hospitals (SPH and ZSH) were enrolled in this study, 23 of whom experienced recurrence 6 months after the procedure. Before ablation started, we collected standard 2.5-second fragments of the coronary sinus electrogram from the ENSITE NAVX (SPH) and Carto (ZSH) systems. A total of 1429 fragments were obtained, and a transfer-learning-based ResNet model was employed. Fragments from ZSH were used for training and those from SPH for validation of the deep convolutional neural network (DCNN). Model performance was evaluated by accuracy, recall, precision, F-Measure, and AUC. Results The prediction accuracy of the DCNN within a single center reached 96%, while accuracy across different ablation systems reached 74.3%. The algorithm yielded an AUC, recall, precision, and F-Measure of 0.76, 86.1%, 95.9%, and 0.78, respectively, which shows satisfactory classification results and extensibility across cardiology centers and brands of electroanatomic mapping instruments. Conclusions Our work reveals a potential intrinsic correlation between coronary sinus electrical activity and AF recurrence using a DCNN-based model. Moreover, the DCNN model we developed shows great promise for relapse prediction in personalized post-procedural management. Funding Acknowledgement Type of funding source: Foundation. Main funding source(s): The National Natural Science Foundation of China


2021 ◽  
pp. 1-13
Author(s):  
Richa ◽  
Punam Bedi

A Recommender System (RS) is an information-filtering approach that helps the information-overburdened user in the decision-making process by suggesting items that might interest him or her. When presenting recommendations to the user, the accuracy of the presented list has always been a concern for researchers. In recent years, however, the focus has shifted to including unexpected and novel items in the list along with accurate ones. To increase user acceptance, it is important to present potentially interesting items that are not obvious and that differ from the items the end user has already rated. In this work, we propose a model that generates serendipitous item recommendations while also addressing accuracy and sparsity issues. The literature suggests that several components contribute to serendipitous recommendation; in this paper, a fuzzy-inference-based approach is used to compute serendipity because the definitions of these components overlap. Moreover, to mitigate accuracy and sparsity issues in the recommendation process, cross-domain and trust-based approaches are incorporated. A prototype of the system was developed for the tourism domain, and its performance is measured using mean absolute error (MAE), root mean square error (RMSE), unexpectedness, precision, recall, and F-measure.
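
Fuzzy inference suits overlapping component definitions because each component is a degree of membership rather than a hard category. A minimal sketch with triangular membership functions and a Mamdani-style min rule; the membership sets, parameter values, and component names below are hypothetical, not taken from the paper:

```python
def triangular(x, a, b, c):
    """Triangular membership: rises from a to the peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def serendipity_degree(unexpectedness, relevance):
    """Mamdani-style rule: an item is serendipitous to the degree that it
    is BOTH fairly unexpected AND fairly relevant (illustrative sets)."""
    unexp_high = triangular(unexpectedness, 0.4, 0.8, 1.2)
    rel_high = triangular(relevance, 0.3, 0.7, 1.1)
    return min(unexp_high, rel_high)
```

Because membership grades blend smoothly, an item that is only moderately unexpected but highly relevant still receives a nonzero serendipity degree, which is exactly the behavior hard thresholds cannot provide.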


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Hakan Gunduz

Abstract In this study, the hourly directions of eight banking stocks in Borsa Istanbul were predicted using linear, deep-learning (LSTM), and ensemble-learning (LightGBM) models. These models were trained with four different feature sets, and their performance was evaluated in terms of accuracy and F-measure. While the first experiments directly used each stock's own features as model inputs, the second experiments used stock features reduced through Variational AutoEncoders (VAE). In the last experiments, in order to grasp the effects of the other banking stocks on individual stock performance, the features belonging to the other stocks were also given as inputs to our models. Other stocks' features were combined with both the own (named allstock_own) and the VAE-reduced (named allstock_VAE) stock features, and the expanded feature sets were reduced by Recursive Feature Elimination. The highest success rate, 0.685, was achieved with allstock_own and the LSTM-with-attention model, while the combination of allstock_VAE and the LSTM-with-attention model obtained an accuracy of 0.675. Although the classification results achieved with the two feature types were close, allstock_VAE achieved them using nearly 16.67% fewer features than allstock_own. When all experimental results were examined, the models trained with allstock_own and allstock_VAE achieved higher accuracy than those using individual stock features, and the results obtained with the VAE-reduced stock features were similar to those obtained with the own stock features.


2021 ◽  
pp. 088307382110195
Author(s):  
Sabrina Pan ◽  
Alan Wu ◽  
Mark Weiner ◽  
Zachary M Grinspan

Introduction: Computable phenotypes allow identification of well-defined patient cohorts from electronic health record data. Little is known about the accuracy of diagnostic codes for important clinical concepts in pediatric epilepsy, such as (1) risk factors like neonatal hypoxic-ischemic encephalopathy; (2) clinical concepts like treatment resistance; and (3) syndromes like juvenile myoclonic epilepsy. We developed computable phenotypes for these examples and evaluated their performance using electronic health record data at one center. Methods: We identified gold-standard cohorts for neonatal hypoxic-ischemic encephalopathy, pediatric treatment-resistant epilepsy, and juvenile myoclonic epilepsy via existing registries and review of clinical notes. From the electronic health record, we extracted diagnostic and procedure codes for all children with a diagnosis of epilepsy and seizures. We used these codes to develop computable phenotypes and evaluated them by sensitivity, positive predictive value, and the F-measure. Results: For neonatal hypoxic-ischemic encephalopathy, the best-performing computable phenotype (HIE ICD-9/10 and [brain magnetic resonance imaging (MRI) or electroencephalography (EEG) within 120 days of life] and absence of commonly miscoded conditions) had high sensitivity (95.7%, 95% confidence interval [CI] 85-99), positive predictive value (100%, 95% CI 95-100), and F-measure (0.98). For treatment-resistant epilepsy, the best-performing computable phenotype (3 or more antiseizure medicines in the last 2 years or treatment-resistant ICD-10) had a sensitivity of 86.9% (95% CI 79-93), positive predictive value of 69.6% (95% CI 60-79), and F-measure of 0.77. For juvenile myoclonic epilepsy, the best-performing computable phenotype (JME ICD-10) had poor sensitivity (52%, 95% CI 43-60) but high positive predictive value (90.4%, 95% CI 81-96); the F-measure was 0.66. Conclusion: The variable accuracy of our computable phenotypes (hypoxic-ischemic encephalopathy high, treatment resistance medium, and juvenile myoclonic epilepsy low) demonstrates the heterogeneity of success in using administrative data to identify cohorts important for pediatric epilepsy research.
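
A computable phenotype of the kind evaluated here is, operationally, a boolean predicate over a patient's coded data. The sketch below mirrors the structure of the best-performing HIE definition from the abstract; the record layout and the code sets are illustrative assumptions, not the study's actual lists:

```python
# Illustrative code sets; the study's actual lists are not published here
HIE_CODES = {"768.7", "P91.60", "P91.61", "P91.62", "P91.63"}
EXCLUDED_CODES = {"G93.1"}  # stand-in for "commonly miscoded conditions"

def hie_phenotype(patient):
    """HIE code AND (brain MRI or EEG within 120 days of life)
    AND no commonly miscoded condition (hypothetical record layout)."""
    has_hie_code = bool(patient["codes"] & HIE_CODES)
    early_study = any(proc["type"] in {"brain_mri", "eeg"}
                      and proc["age_days"] <= 120
                      for proc in patient["procedures"])
    miscoded = bool(patient["codes"] & EXCLUDED_CODES)
    return has_hie_code and early_study and not miscoded
```

Evaluating such a predicate against a gold-standard cohort then yields the sensitivity, positive predictive value, and F-measure reported in the Results.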


Plants ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 95
Author(s):  
Heba Kurdi ◽  
Amal Al-Aldawsari ◽  
Isra Al-Turaiki ◽  
Abdulrahman S. Aldawood

In the past 30 years, the red palm weevil (RPW), Rhynchophorus ferrugineus (Olivier), a pest that is highly destructive to all types of palms, has rapidly spread worldwide. Detecting an RPW infestation is highly challenging, however, because symptoms are not visible until the death of the palm tree is inevitable. In addition, the use of automated RPW identification tools to predict infestation is complicated by a lack of RPW datasets. In this study, we assessed the capability of 10 state-of-the-art data mining classification algorithms, namely Naive Bayes (NB), KSTAR, AdaBoost, bagging, PART, the J48 decision tree, multilayer perceptron (MLP), support vector machine (SVM), random forest, and logistic regression, to use plant-size and temperature measurements collected from individual trees to predict RPW infestation in its early stages, before significant damage is done to the tree. The performance of the classification algorithms was evaluated in terms of accuracy, precision, recall, and F-measure on a real RPW dataset. The experimental results showed that RPW infestation can be predicted with an accuracy of up to 93%, precision above 87%, recall of 100%, and F-measure greater than 93% using data mining. Additionally, we found that temperature and circumference are the most important features for predicting RPW infestation. However, we strongly call for collecting and aggregating more RPW datasets to run further experiments that validate these results and provide more conclusive findings.

