SPCS: A Spatial and Pattern Combined Smoothing Method of Spatial Transcriptomic Expression

2021
Author(s): Yusong Liu, Tongxin Wang, Ben Duggan, Kun Huang, Jie Zhang, ...

The recently developed spatial transcriptomics (ST) technique has made it possible to view spatial transcriptional heterogeneity in a high-throughput manner. It is based on highly multiplexed sequence analysis and uses barcodes to assign the sequenced reads to their respective tissue locations. However, this type of sequencing suffers from high noise and drop-out events, which makes smoothing a necessary step before downstream analysis. Traditional smoothing methods used for the similar single-cell RNA sequencing (scRNA-seq) data are one-factor methods that can only utilize associations in transcriptome space. Because they do not account for associations in Euclidean space, i.e., distances between tissue locations on the ST slide, these one-factor methods cannot take full advantage of the information in ST data. In this study, we present a novel two-factor smoothing technique, Spatial and Pattern Combined Smoothing (SPCS), which employs the k-nearest neighbor technique to utilize associations from both transcriptome and Euclidean space in ST data. When SPCS is applied to 10 ST slides from pancreatic ductal adenocarcinoma (PDAC), the smoothed slides show better separability, partition accuracy, and biological interpretability than slides smoothed by pre-existing one-factor methods. The source code of SPCS is available on GitHub (https://github.com/Usos/SPCS).
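To illustrate the general two-factor idea (this is not the published SPCS algorithm, whose exact weighting is not given in the abstract), the sketch below smooths each spot by blending its own expression profile with the mean of its k nearest neighbors in transcriptome space and the mean of its k nearest neighbors on the slide. The function names and the parameters `k_pattern`, `k_spatial`, `alpha` and `p` are hypothetical.

```python
import numpy as np


def knn_indices(points: np.ndarray, k: int) -> np.ndarray:
    """Return, for each row, the indices of its k nearest other rows (Euclidean distance)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a spot is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def two_factor_smooth(expr, coords, k_pattern=4, k_spatial=4, alpha=0.5, p=0.5):
    """Hypothetical two-factor kNN smoothing of a spots x genes matrix.

    expr   : (n_spots, n_genes) expression matrix
    coords : (n_spots, 2) spot positions on the slide
    alpha  : weight of transcriptome (pattern) neighbors vs. spatial neighbors
    p      : weight of the neighborhood average vs. the spot's own profile
    """
    expr = np.asarray(expr, dtype=float)
    coords = np.asarray(coords, dtype=float)
    pattern_nb = knn_indices(expr, k_pattern)    # neighbors in transcriptome space
    spatial_nb = knn_indices(coords, k_spatial)  # neighbors in Euclidean (slide) space
    pattern_mean = expr[pattern_nb].mean(axis=1)
    spatial_mean = expr[spatial_nb].mean(axis=1)
    neighborhood = alpha * pattern_mean + (1 - alpha) * spatial_mean
    return (1 - p) * expr + p * neighborhood
```

Setting `alpha=1` reduces this to a one-factor (transcriptome-only) smoother, which is the kind of baseline the abstract compares against; a drop-out spot (all zeros) receives nonzero values from its neighbors.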

2018, Vol 2 (1), pp. 49-60
Author(s): Devi Andrian, Agus Mohamad Soleh, Hari Wijayanto

The Graduate School of IPB (SPs-IPB) has been established for a long time and is believed to produce high-quality, highly competitive graduates. However, existing records show that a small number of students did not graduate, either because they resigned or dropped out (DO). This can be addressed by selecting prospective students based on their profile and undergraduate (S1) educational background, for example by applying K-Nearest Neighbor (KNN) classification. The response variable is the prospective student's study outcome (graduated or not graduated), and the explanatory variables are the prospective students' profiles and educational backgrounds. The data are imbalanced: the not-graduated class is much smaller than the graduated class, which can reduce classification accuracy on the minority class (sensitivity). The imbalance was therefore handled with resampling methods: Random Over Sampling (ROS), Random Under Sampling (RUS), and Random Over-Under Sampling (ROUS). Comparing KNN classification results for k = 1 to 6 showed that handling the data imbalance yields higher sensitivity than not handling it, although accuracy and specificity decrease. The highest sensitivity was obtained with KNN at k = 1 combined with RUS, with mean and median sensitivity values of 0.89 and 0.90, respectively.
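Random under-sampling (RUS) and the sensitivity and specificity metrics mentioned above can be sketched as follows. This is an illustrative implementation, not the authors' code, and the function names are hypothetical. RUS discards randomly chosen rows of the larger class until every class has as many rows as the smallest one.

```python
import numpy as np


def random_under_sample(X, y, seed=0):
    """Randomly drop rows of the larger classes until all classes match the smallest class size."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]


def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity = TP / (TP + FN) on the positive (minority) class; specificity = TN / (TN + FP)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos, neg = y_true == positive, y_true != positive
    sensitivity = np.mean(y_pred[pos] == positive)
    specificity = np.mean(y_pred[neg] != positive)
    return sensitivity, specificity
```

Because RUS throws away majority-class rows, the classifier sees relatively more minority examples, which is consistent with the reported trade-off of higher sensitivity at the cost of accuracy and specificity.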


2020, Vol 5 (2), pp. 265-270
Author(s): Agus Budiyantara, Irwansyah Irwansyah, Egi Prengki, Pandi Ahmad Pratama, Ninuk Wiliani

Private universities (PTS) compete intensely on performance in producing quality graduates. In addition, the large number of universities in Indonesia, both public (PTN) and private (PTS), intensifies competition between them. Universities therefore strive to improve quality and provide the best education for their service recipients, namely students. One problem is that some students graduate late, which hinders the institution's progress. University management needs predictions of on-time graduation to design policies for the early prevention of dropout (DO) cases. This study aims to identify the academic factors that influence study duration and to build the best prediction model using data mining techniques. Eleven attributes are used for classification: NPM, Gender, Age, Department, Class, Occupation, Semester 1 Achievement Index, Semester 2 Achievement Index, Semester 3 Achievement Index, Semester 4 Achievement Index, and Information (the outcome attribute). Evaluation and validation in RapidMiner gave an accuracy of 98.04% for the Decision Tree (C4.5) method in the third test, 96.00% for Naïve Bayes in the fourth test, and 90.00% for K-Nearest Neighbor (K-NN) in the second test.
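C4.5, the decision-tree algorithm that performed best here, chooses split attributes by gain ratio: the information gain of a split divided by the entropy of the partition itself (split information). A small sketch for categorical attributes follows; it is illustrative only and not the RapidMiner implementation used in the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute, labels):
    """C4.5 gain ratio of splitting `labels` on the categorical values in `attribute`."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy after the split, weighted by subset size
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(list(attribute))  # entropy of the partition itself
    return info_gain / split_info if split_info > 0 else 0.0
```

Dividing by split information penalizes attributes with many distinct values (an identifier such as NPM would otherwise look perfectly informative), which is the main difference from the plain information gain used by ID3.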


2021, Vol 3 (2), pp. 140-148
Author(s): Hermanto Hermanto

College failure, on-time graduation, and the factors that cause them remain an interesting research topic (C. Marquez-Vera, C. Romero and S. Ventura, 2011). This study compares three data mining classification algorithms, Naive Bayes, Decision Tree, and K-Nearest Neighbor, for predicting students' graduation and dropout risk, with the aims of improving the quality of higher education and identifying the most accurate algorithm for graduation and dropout prediction. The best algorithm for predicting graduation and dropout is the decision tree, with a best accuracy of 99.15% at a training data ratio of 30%. Keywords: Data Mining; Naive Bayes Algorithm; Decision Tree; K-Nearest Neighbor; Predict Graduation; Drop Out.
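Of the three classifiers compared, Naive Bayes is the simplest to write down: it picks the class c that maximizes P(c) multiplied by the product of P(x_i | c) over all attributes, assuming the attributes are conditionally independent given the class. Below is a minimal categorical version with Laplace (add-one) smoothing; the class name is hypothetical and this is not the study's implementation.

```python
from collections import Counter, defaultdict
from math import log


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # counts[(attribute_index, class)][value] = number of training rows
        self.counts = defaultdict(Counter)
        self.values = defaultdict(set)  # distinct values seen per attribute
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def predict_one(self, row):
        best_class, best_score = None, float("-inf")
        for c, n_c in self.class_counts.items():
            score = log(n_c / self.n)  # log prior P(c)
            for i, value in enumerate(row):
                k = len(self.values[i])
                # smoothed log P(value | c); sums of logs avoid underflow
                score += log((self.counts[(i, c)][value] + 1) / (n_c + k))
            if score > best_score:
                best_class, best_score = c, score
        return best_class
```

Working in log space keeps the product of many small probabilities from underflowing, and add-one smoothing prevents a single unseen attribute value from zeroing out a class.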


Author(s): M. Jeyanthi, C. Velayutham

In science and technology development, brain-computer interfaces (BCI) play a vital role in research. Classification is a data mining technique used to predict group membership for data instances. Analyzing BCI data is challenging because feature extraction and classification are more difficult for these data than for raw data. In this paper, we extract statistical Haralick features from the raw EEG data. The features are then normalized, binning is applied to improve the accuracy of the predictive models by reducing noise and eliminating some irrelevant attributes, and classification is performed on the BCI dataset using different techniques: Naïve Bayes, the k-nearest neighbor classifier, and the SVM classifier. Finally, we propose the SVM classification algorithm for the BCI dataset.
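The preprocessing steps named here, normalization followed by binning, can be illustrated with min-max scaling to [0, 1] and equal-width binning. The paper does not state which normalization or binning scheme it used, so both choices and the `n_bins` parameter are assumptions.

```python
import numpy as np


def min_max_normalize(features):
    """Scale each column of a samples x features matrix to [0, 1]."""
    features = np.asarray(features, dtype=float)
    lo, hi = features.min(axis=0), features.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (features - lo) / span


def equal_width_bin(normalized, n_bins=4):
    """Map values in [0, 1] to integer bin labels 0..n_bins-1 using equal-width intervals."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]  # interior bin edges only
    return np.digitize(normalized, edges)
```

Binning replaces small fluctuations within an interval by a single label, which is the noise-reduction effect the abstract attributes to it; fewer bins smooth more aggressively.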


2020, Vol 17 (1), pp. 319-328
Author(s): Ade Muchlis Maulana Anwar, Prihastuti Harsani, Aries Maesya

Population data are individual or aggregate structured data resulting from Population Registration and Civil Registration activities. A birth certificate is a Civil Registration deed produced by recording the birth of a baby; the reported birth is registered on the Family Card and given a Population Identification Number (NIK), which serves as the basis for obtaining other community services. Of the 570,637 integrated birth certificate reports in the 2018 Population Administration Information System (SIAK), 503,946 were reported late and only 66,691 were not. Clustering is a method that groups similar data together and separates dissimilar data into different groups. K-Nearest Neighbor classifies objects based on the training data closest to the test data. K-means divides a set of objects into groups by looking at each group's midpoint (centroid). In data mining preprocessing, the data are cleaned by filling missing values with the most frequent value, and attributes are selected using the information gain method. Using the k-nearest neighbor method to predict reporting delays and the k-means method to group priority service areas on 10,000 birth certificate records from 2019, the models performed reasonably well: the predictions reached an accuracy of 74.00%, and k-means with K = 2 produced a Davies-Bouldin index of 1,179.
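The Davies-Bouldin index used to assess the k-means result averages, over all clusters, the worst ratio (S_i + S_j) / M_ij, where S_i is the mean distance of cluster i's points to its centroid and M_ij is the distance between centroids i and j; lower values indicate more compact, better-separated clusters. A direct NumPy sketch of that definition, taking the data and precomputed cluster labels (for example from k-means):

```python
import numpy as np


def davies_bouldin(X, labels):
    """Davies-Bouldin index of a clustering; lower is better."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    clusters = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # S_i: average Euclidean distance of cluster members to their centroid
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(clusters)
    ])
    k = len(clusters)
    worst = np.zeros(k)
    for i in range(k):
        # For each cluster, keep its least favorable pairing with another cluster
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(k) if j != i
        ]
        worst[i] = max(ratios)
    return worst.mean()
```

Because the index is a ratio of within-cluster scatter to between-centroid distance, it is unitless and typically of order 1 for reasonable clusterings.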


Author(s): S. Vijaya Rani, G. N. K. Suresh Babu

Illegal hackers penetrate the servers and networks of corporations and financial institutions to gain money and extract vital information. Hacking ranges from a single computing system to many systems. Attackers gain access by sending malicious packets into the network through viruses, worms, Trojan horses, etc. They scan a network with various tools and collect information about the network and its hosts. Hence it is essential to detect attacks as they enter a network. Methods available for intrusion detection include Naive Bayes, Decision Tree, Support Vector Machine, K-Nearest Neighbor, and Artificial Neural Networks. A neural network consists of interconnected processing units that can store information and make it available for use. It acts like the human brain, acquiring knowledge from the environment through a training and learning process. Many algorithms are available for this learning process. This work analyzes malicious packets and predicts the error rate in detecting injured packets using artificial neural network algorithms.
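The paper does not specify its network architecture or learning algorithm. As a minimal stand-in, the sketch below trains a single sigmoid neuron (equivalent to logistic regression, the simplest neural network) by batch gradient descent and reports the error rate, i.e., the fraction of packets misclassified. Packet feature extraction is out of scope: `X` is assumed to be a numeric feature matrix with label 1 for malicious packets, and all names and hyperparameters are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_neuron(X, y, lr=0.5, epochs=1000):
    """Fit the weights and bias of one sigmoid neuron by gradient descent on cross-entropy loss."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)  # predicted probability that each packet is malicious
        grad = p - y            # derivative of cross-entropy w.r.t. the pre-activation
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def error_rate(X, y, w, b, threshold=0.5):
    """Fraction of samples whose thresholded prediction differs from the true label."""
    pred = (sigmoid(np.asarray(X, dtype=float) @ w + b) >= threshold).astype(int)
    return float(np.mean(pred != np.asarray(y)))
```

A multi-layer network adds hidden layers and backpropagates the same `p - y` error signal through them; in practice the error rate should be measured on packets held out from training.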


2015, Vol 1 (4), pp. 270
Author(s): Muhammad Syukri Mustafa, I. Wayan Simpen

This research aims to predict whether new students are likely to complete their studies on time by applying data mining with the K-Nearest Neighbor (KNN) algorithm to historical records. The application developed in this study classifies students using a variety of attributes, including national examination (Ujian Nasional, UN) scores, school of origin/region, gender, parents' occupation and income, and number of siblings. KNN then predicts, based on the proximity of a new student's data to the existing historical data, whether that student is likely to complete the study on time. Testing the KNN algorithm with sample data of alumni who graduated from 2004 to 2010 as the existing cases and alumni who graduated in 2011 as the new cases yielded an accuracy of 83.36%.
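KNN as described here predicts a new student's outcome from the most similar historical cases. The following is a minimal sketch with Euclidean distance and majority vote; it assumes the attributes have already been encoded as numbers (the abstract does not say how categorical attributes such as school of origin were encoded), and the function names are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(train_X, train_y, query_X, k=3):
    """Predict a label for each query row by majority vote among its k nearest training rows."""
    train_X = np.asarray(train_X, dtype=float)
    query_X = np.asarray(query_X, dtype=float)
    predictions = []
    for q in query_X:
        dist = np.linalg.norm(train_X - q, axis=1)  # Euclidean distance to every old case
        nearest = np.argsort(dist)[:k]
        votes = Counter(train_y[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


def accuracy(y_true, y_pred):
    """Share of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because distances mix attributes with different scales (exam scores vs. number of siblings), attributes are usually normalized before running KNN; otherwise the attribute with the largest range dominates the neighbor search.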

