Affinity Scores: An Individual-centric Fingerprinting Framework for Neuropsychiatric Disorders

2021 ◽  
Author(s):  
Cassandra M J Wannan ◽  
Christos Pantelis ◽  
Antonia Merritt ◽  
Bruce Tonge ◽  
Warda T Syeda

Background: Population-centric frameworks of biomarker identification for psychiatric disorders focus primarily on comparing averages between groups and assume that diagnostic groups are (1) mutually-exclusive, and (2) homogeneous. There is a paucity of individual-centric approaches capable of identifying individual-specific fingerprints across multiple domains. To address this, we propose a novel framework, combining a range of biopsychosocial markers, including brain structure, cognition, and clinical markers, into higher-level fingerprints, capable of capturing intra-illness heterogeneity and inter-illness overlap. Methods: A multivariate framework was implemented to identify individualised patterns of brain structure, cognition and clinical markers based on affinity to other participants in the database. First, individual-level affinity scores defined a neighbourhood for each participant across each measure based on variable-specific hop sizes. Next, diagnostic verification and classification algorithms were implemented based on multivariate affinity score profiles. To perform affinity-based classification, data were divided into training and test samples, and 5-fold nested cross-validation was performed on the training data. Affinity-based classification was compared to weighted K-nearest neighbours (KNN) classification. K-means clustering was used to create clusters based on multivariate affinity score profiles. The framework was applied to the Australian Schizophrenia Research Bank (ASRB) dataset. Results: Individualised affinity scores provided a fingerprint of brain structure, cognition, and clinical markers, which described the affinity of an individual to the representative groups in the dataset. Diagnostic verification capability was moderate to high depending on the choice of multivariate affinity metric.
Affinity score-based classification achieved a high degree of accuracy in the training, nested cross-validation and prediction steps, and outperformed KNN classification in the training and test datasets. Conclusion: Affinity scores demonstrate utility in two key ways: (1) early and accurate diagnosis of neuropsychiatric disorders, whereby an individual can be grouped within the diagnostic category/ies that best match their fingerprint, and (2) identification of biopsychosocial factors that most strongly characterise individuals/disorders, and which may be most amenable to intervention.
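The neighbourhood idea behind the affinity scores can be sketched in a few lines: for one measure, a participant's neighbourhood is everyone within a variable-specific hop size, and the affinity to a diagnostic group is the fraction of neighbours drawn from that group. The data, labels, and hop size below are toy illustrations, not the ASRB variables or the authors' implementation.

```python
def affinity_scores(values, labels, index, hop):
    """Affinity of participant `index` to each diagnostic group,
    based on the neighbourhood within `hop` of their value."""
    centre = values[index]
    neighbours = [lab for i, (v, lab) in enumerate(zip(values, labels))
                  if i != index and abs(v - centre) <= hop]
    if not neighbours:
        return {}
    return {g: neighbours.count(g) / len(neighbours)
            for g in set(neighbours)}

# Toy data: one cognition score for six participants in two groups.
scores = [0.9, 1.0, 1.1, 2.0, 2.1, 2.2]
groups = ["SZ", "SZ", "SZ", "HC", "HC", "HC"]
print(affinity_scores(scores, groups, 0, hop=0.3))  # -> {'SZ': 1.0}
```

Repeating this per variable yields the multivariate affinity profile that the verification and classification steps operate on.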

2021 ◽  
Vol 4 ◽  
Author(s):  
Michael Platzer ◽  
Thomas Reutterer

AI-based data synthesis has seen rapid progress over the last several years and is increasingly recognized for its promise to enable privacy-respecting high-fidelity data sharing. This is reflected by the growing availability of both commercial and open-sourced software solutions for synthesizing private data. However, despite these recent advances, adequately evaluating the quality of generated synthetic datasets is still an open challenge. We aim to close this gap and introduce a novel holdout-based empirical assessment framework for quantifying the fidelity as well as the privacy risk of synthetic data solutions for mixed-type tabular data. Measuring fidelity is based on statistical distances of lower-dimensional marginal distributions, which provide a model-free and easy-to-communicate empirical metric for the representativeness of a synthetic dataset. Privacy risk is assessed by calculating the individual-level distances to the closest record with respect to the training data. By showing that the synthetic samples are just as close to the training as to the holdout data, we yield strong evidence that the synthesizer indeed learned to generalize patterns and is independent of individual training records. We empirically demonstrate the presented framework for seven distinct synthetic data solutions across four mixed-type datasets and then compare these to traditional data perturbation techniques. Both a Python-based implementation of the proposed metrics and the demonstration study setup are made available open-source. The results highlight the need to systematically assess the fidelity as well as the privacy of this emerging class of synthetic data generators.
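The distance-to-closest-record (DCR) privacy check described above can be sketched as follows: compare each synthetic record's distance to its closest training record against its distance to the closest holdout record. If the two distributions match, the generator is not copying individual training rows. The records and values below are illustrative, not from the paper's study.

```python
import math

def dcr(record, reference):
    """Distance to the closest record in `reference`."""
    return min(math.dist(record, ref) for ref in reference)

train     = [(0.0, 0.0), (1.0, 1.0)]
holdout   = [(0.5, 0.5), (2.0, 2.0)]
synthetic = [(0.4, 0.4), (1.2, 0.9)]

dcr_train   = [dcr(s, train) for s in synthetic]
dcr_holdout = [dcr(s, holdout) for s in synthetic]
# A synthesizer that memorised training rows would show dcr_train
# systematically smaller than dcr_holdout; similar distributions
# suggest generalization rather than copying.
```

In practice the comparison is made over the full distributions of both DCR sets (e.g. their quantiles), not single records.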


2021 ◽  
Author(s):  
Lianteng Song ◽  
◽  
Zhonghua Liu ◽  
Chaoliu Li ◽  
Congqian Ning ◽  
...  

Geomechanical properties are essential for safe drilling, successful completion, and exploration of both conventional and unconventional reservoirs, e.g. deep shale gas and shale oil. Typically, these properties can be calculated from sonic logs. However, in shale reservoirs, it is time-consuming and challenging to obtain reliable logging data due to borehole complexity and lack of information, which often results in log deficiency and a high recovery cost for incomplete datasets. In this work, we propose the bidirectional long short-term memory (BiLSTM) network, a supervised neural network algorithm widely used in sequential data-based prediction, to estimate geomechanical parameters. Prediction from log data can be conducted from two different aspects: 1) single-well prediction, where the log data from a single well are divided into training data and testing data for cross-validation; 2) cross-well prediction, where a group of wells from the same geographical region is divided into a training set and a testing set for cross-validation. The logs used in this work were collected from 11 wells in the Jimusaer Shale and include gamma ray, bulk density, resistivity, etc. We employed five machine learning algorithms for comparison, among which BiLSTM showed the best performance, with an R-squared of more than 90% and an RMSE of less than 10. The predicted results can be directly used to calculate geomechanical properties, whose accuracy is also improved in contrast to conventional methods.
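Before any sequence model can be trained, depth-ordered log curves must be framed into fixed-length windows, and the single-well scheme splits those windows into training and testing portions. The sketch below illustrates that preprocessing step only (the BiLSTM itself is omitted); curve values, window length, and split fraction are made-up examples.

```python
def frame_sequences(log, window):
    """Sliding fixed-length windows over a depth-ordered log curve,
    the shape of input a sequence model such as a BiLSTM consumes."""
    return [log[i:i + window] for i in range(len(log) - window + 1)]

def single_well_split(samples, train_frac=0.8):
    """Split one well's windows into training and testing portions."""
    cut = int(len(samples) * train_frac)
    return samples[:cut], samples[cut:]

gamma_ray = [60, 62, 65, 70, 72, 71, 68, 66, 64, 61]  # toy GR curve
windows = frame_sequences(gamma_ray, window=4)
train, test = single_well_split(windows)
print(len(windows), len(train), len(test))  # -> 7 5 2
```

Cross-well prediction would instead assign whole wells to the training or testing set, so that no windows from a test well leak into training.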


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S952-S952
Author(s):  
Jessie Alwerdt ◽  
Yuan Tian ◽  
Andrew D Patterson ◽  
Martin Sliwinski

Abstract Prior work has suggested that metabolic disorders increase the risk for cognitive decline. Further, studies have identified amino acids (AAs) as potential biomarkers for dementia and diabetes. This study examines AAs and metabolic clinical markers (MCM) as predictors of cognition (Processing Speed (SOP), Working Memory (WM), Fluid (Gf) and Crystallized Intelligence (Gc)). The sample included 241 middle-aged adults from Bronx, NY. Predictors included age, gender, education, ethnicity, smoking, having diabetes, glucose, insulin, triglycerides, diastolic, and systolic blood pressure (BP), and cholesterol. AAs and associated derivatives were obtained from serum using NMR-based metabolomics. Analyses were conducted for each cognitive domain using repeated cross-validation random forests and lasso regressions. Overall, all models had acceptable cross-validation mean squared error except for WM. Several MCMs were specific to each cognitive domain, such as lower triglycerides and glucose associated with better SOP and higher systolic BP associated with better Gc, while none were identified for Gf. The Gf model had the fewest AAs, with lower serine associated with better Gf. Two AAs, higher histidine and alanine, were associated with better SOP. Further, higher alanine, valine, isoleucine, serine, methionine, betaine, and moderate tyrosine were associated with better Gc. These results indicate that AAs were specific to each cognitive domain and ranked similar to or higher than several MCMs in importance. These results suggest that further investigation of AAs alongside associated MCMs is needed to assess the metabolic contribution to cognitive performance. Such research will help identify specific metabolic targets relating to cognition.
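The repeated cross-validation scheme mentioned above can be sketched as a split generator: each repeat reshuffles the sample and yields k disjoint train/held-out splits (the random forests and lasso fits run inside each repeat are omitted here). Sample size, fold count, and repeat count below are illustrative.

```python
import random

def repeated_kfold(n, k, repeats, seed=0):
    """Yield (train_indices, held_out_indices) for each fold of each
    repeat; every repeat reshuffles before partitioning into k folds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for held_out in folds:
            train = [i for i in idx if i not in held_out]
            yield train, held_out

splits = list(repeated_kfold(n=10, k=5, repeats=3))
print(len(splits))  # -> 15 train/held-out splits
```

Averaging the held-out mean squared error over all splits gives the cross-validation MSE used to judge each cognitive-domain model.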


2020 ◽  
Vol 34 (01) ◽  
pp. 1153-1160 ◽  
Author(s):  
Xinshi Zang ◽  
Huaxiu Yao ◽  
Guanjie Zheng ◽  
Nan Xu ◽  
Kai Xu ◽  
...  

Using reinforcement learning for traffic signal control has attracted increasing interest recently. Various value-based reinforcement learning methods have been proposed to deal with this classical transportation problem and have achieved better performance compared with traditional transportation methods. However, current reinforcement learning models rely on tremendous training data and computational resources, which may have bad consequences (e.g., traffic jams or accidents) in the real world. In traffic signal control, some algorithms have been proposed to empower quick learning from scratch, but little attention is paid to learning by transferring and reusing learned experience. In this paper, we propose a novel framework, named MetaLight, to speed up the learning process in new scenarios by leveraging the knowledge learned from existing scenarios. MetaLight is a value-based meta-reinforcement learning workflow based on the representative gradient-based meta-learning algorithm (MAML), which includes periodically alternating individual-level adaptation and global-level adaptation. Moreover, MetaLight improves the state-of-the-art reinforcement learning model FRAP in traffic signal control by optimizing its model structure and updating paradigm. The experiments on four real-world datasets show that our proposed MetaLight not only adapts more quickly and stably in new traffic scenarios, but also achieves better performance.
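The alternating individual-level/global-level structure of a MAML-style loop can be illustrated on a scalar toy problem: each "scenario" adapts a copy of the shared parameter with a few gradient steps, then the shared initialisation is updated from the adapted losses (a first-order approximation is used here). The tasks, learning rates, and loss functions are toy stand-ins, not MetaLight's networks or traffic objectives.

```python
def grad(loss_fn, theta, eps=1e-6):
    """Numerical gradient of a scalar loss (keeps the sketch tiny)."""
    return (loss_fn(theta + eps) - loss_fn(theta - eps)) / (2 * eps)

def maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.05, inner_steps=1):
    meta_grad = 0.0
    for loss_fn in tasks:                 # each task = one traffic scenario
        phi = theta
        for _ in range(inner_steps):      # individual-level adaptation
            phi -= inner_lr * grad(loss_fn, phi)
        meta_grad += grad(loss_fn, phi)   # first-order MAML approximation
    return theta - outer_lr * meta_grad / len(tasks)  # global-level update

# Two toy scenarios with optima at 1.0 and 3.0.
tasks = [lambda x: (x - 1.0) ** 2, lambda x: (x - 3.0) ** 2]
theta = 0.0
for _ in range(50):
    theta = maml_step(theta, tasks)
# theta drifts toward an initialisation (near 2.0) from which either
# scenario can be reached in few adaptation steps.
```

The full algorithm differs (second-order terms, Q-networks, batched transitions), but the two-level update rhythm is the same.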


Author(s):  
Xiao Chen ◽  
Ning Wang

For process characterization or optimization, a computer-based prediction model is needed. This paper describes an approach for modeling a delayed coking process using a generalized regression neural network (GRNN) and a double-chain based DNA genetic algorithm (dc-DNAGA). In a GRNN, the smoothing parameters have a significant effect on the performance of the network. This paper presents an improved GA, dc-DNAGA, to optimize the smoothing parameters in the GRNN. The dc-DNAGA is inspired by biological DNA: the smoothing parameters are coded in double-chain chromosomes, and modified genetic operators are employed to improve the global search ability of the GA. To test the performance of the constructed model, it is used to predict the output of test data not included in the training data. Compared with other reported methods, eight cross-validation results show the advantage of the proposed technique: it predicts new data more accurately.
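A GRNN prediction is simply a Gaussian-kernel weighted average of the training outputs, which is why the smoothing parameter (sigma) is the decisive knob the genetic algorithm tunes. The sketch below shows that prediction rule on toy data; it is an illustration of the GRNN idea, not the paper's dc-DNAGA implementation.

```python
import math

def grnn_predict(x, train_x, train_y, sigma):
    """GRNN output: Gaussian-weighted average of training targets,
    with `sigma` controlling how local the averaging is."""
    weights = [math.exp(-((x - xi) ** 2) / (2 * sigma ** 2)) for xi in train_x]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 1.0, 4.0, 9.0]   # toy target: y = x**2
print(grnn_predict(1.5, train_x, train_y, sigma=0.5))
```

A GA searching over sigma (or one sigma per input variable) would score each candidate by its held-out prediction error, exactly the role dc-DNAGA plays in the paper.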


2021 ◽  
Vol 12 (2) ◽  
pp. 91
Author(s):  
Zilvanhisna Emka Fitri ◽  
Lalitya Nindita Sahenda ◽  
Pramuditha Shinta Dewi Puspitasari ◽  
Prawidya Destarianto ◽  
Dyah Laksito Rukmi ◽  
...  

Acute Respiratory Infection (ARI) is an infectious disease. One of the performance indicators of infectious disease control and handling programs is disease discovery. However, a recurring problem is the limited number of medical analysts, the large number of patients, and the experience required of medical analysts in identifying bacteria, so examinations take relatively long. Based on these problems, an automatic and accurate classification system for the bacteria that cause Acute Respiratory Infection (ARI) was created. The research process comprises image preprocessing (color conversion and contrast stretching), segmentation, feature extraction, and KNN classification. The parameters used are bacterial count, area, perimeter, and shape factor. The best training-to-test data split is 90%:10% of 480 samples. The KNN classification method is very good for classifying bacteria: the highest accuracy is 91.67%, precision is 92.4%, and recall is 91.7% across three variations of the K value, namely K = 3, K = 5, and K = 7.
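The KNN vote at the end of that pipeline can be sketched directly: rank training samples by distance in feature space and take the majority label of the k nearest. The feature vectors and bacteria labels below are made up for illustration, not the study's extracted features.

```python
import math
from collections import Counter

def knn_classify(sample, train, k):
    """train: list of (features, label); majority vote of k nearest."""
    ranked = sorted(train, key=lambda t: math.dist(sample, t[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy (area, perimeter) features with illustrative labels.
train = [((1.0, 1.0), "Klebsiella"), ((1.1, 0.9), "Klebsiella"),
         ((3.0, 3.2), "Streptococcus"), ((2.9, 3.0), "Streptococcus"),
         ((1.2, 1.1), "Klebsiella")]
print(knn_classify((1.0, 1.2), train, k=3))  # -> Klebsiella
```

Trying k = 3, 5, 7 as in the study just means re-running the vote with different neighbourhood sizes and comparing accuracy on the held-out 10%.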


Repositor ◽  
2020 ◽  
Vol 2 (8) ◽  
Author(s):  
Nabillah Annisa Rahmayanti ◽  
Yufis Azhar ◽  
Gita Indah Marthasari

Abstract Bullying often occurs in children, especially teenagers, and unsettles parents. The rise of bullying cases in this country has even caused casualties. This can be prevented by recognizing the symptoms of a child who is being bullied. A child who cannot express his or her complaints makes it difficult for parents and teachers at school to understand what is happening, which may be because the child is being bullied by friends. Therefore, this study aims to produce a set of selected features using the C5.0 algorithm. Using the selected features eases the work of filling out questionnaires and shortens the time needed to determine whether a child is being bullied, based on the symptoms addressed by each questionnaire item. To support the data in this study, the researchers used a questionnaire to collect answers to questions about the symptoms of children who are victims of bullying. The respondents' answers were processed into a dataset that was divided into training data and test data for further analysis using the C5.0 algorithm. The evaluation method used in this study is 10-fold cross-validation, with accuracy assessed using a confusion matrix. This study also compares the C5.0 algorithm with two other classification algorithms, Naive Bayes and KNN, to see how accurate C5.0 is in feature selection. The test results show that the C5.0 algorithm is capable of feature selection and also has better accuracy than the Naive Bayes and KNN algorithms, with an accuracy of 92.77% before feature selection and 93.33% after feature selection.
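C5.0 ranks candidate splits by information gain, which is also the natural score for selecting questionnaire features. The sketch below computes that score for two toy symptom columns; the yes/no answers are invented, not the study's questionnaire data.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """Reduction in label entropy from splitting on `feature`."""
    n = len(labels)
    split = {}
    for f, y in zip(feature, labels):
        split.setdefault(f, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

bullied  = [1, 1, 1, 0, 0, 0]
symptom1 = [1, 1, 1, 0, 0, 0]   # perfectly predictive answer pattern
symptom2 = [1, 0, 1, 0, 1, 0]   # nearly uninformative pattern
print(info_gain(symptom1, bullied), info_gain(symptom2, bullied))
```

Features with negligible gain are the ones a C5.0-based selection would drop, shortening the questionnaire without hurting accuracy.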


2020 ◽  
Author(s):  
Rich Colbaugh ◽  
Kristin Glass

AbstractThere is great interest in personalized medicine, in which treatment is tailored to the individual characteristics of patients. Achieving the objectives of precision healthcare will require clinically-grounded, evidence-based approaches, which in turn demands rigorous, scalable predictive analytics. Standard strategies for deriving prediction models for medicine involve acquiring ‘training’ data for large numbers of patients, labeling each patient according to the outcome of interest, and then using the labeled examples to learn to predict the outcome for new patients. Unfortunately, labeling individuals is time-consuming and expertise-intensive in medical applications and thus represents a major impediment to practical personalized medicine. We overcome this obstacle with a novel machine learning algorithm that enables individual-level prediction models to be induced from aggregate-level labeled data, which is readily-available in many health domains. The utility of the proposed learning methodology is demonstrated by: i.) leveraging US county-level mental health statistics to create a screening tool which detects individuals suffering from depression based upon their Twitter activity; ii.) designing a decision-support system that exploits aggregate clinical trials data on multiple sclerosis (MS) treatment to predict which therapy would work best for the presenting patient; iii.) employing group-level clinical trials data to induce a model able to find those MS patients likely to be helped by an experimental therapy.
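The core idea of inducing individual-level predictions from aggregate labels can be illustrated with a deliberately tiny stand-in: each group contributes only a prevalence (fraction positive), and a score threshold is fit so that per-group predicted rates match the reported rates. This is a toy illustration of the learning-from-aggregates setting, not the authors' algorithm.

```python
def fit_threshold(groups):
    """groups: list of (individual_scores, positive_rate) pairs.
    Pick the score threshold whose per-group predicted positive
    rates best match the aggregate rates (squared error)."""
    best_t, best_err = None, float("inf")
    candidates = sorted(x for xs, _ in groups for x in xs)
    for t in candidates:
        err = 0.0
        for xs, rate in groups:
            pred_rate = sum(x >= t for x in xs) / len(xs)
            err += (pred_rate - rate) ** 2
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Two toy "counties": only their prevalences are observed, yet the
# fitted threshold classifies individuals.
groups = [([0.1, 0.2, 0.8, 0.9], 0.5), ([0.1, 0.2, 0.3, 0.9], 0.25)]
print(fit_threshold(groups))  # -> 0.8
```

The paper's method is far richer (learned feature representations, multiple aggregate sources), but the supervision signal has this same shape: group-level rates constrain an individual-level decision rule.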


2018 ◽  
Vol 1 (2) ◽  
pp. 70-75
Author(s):  
Abdul Rozaq

Building materials are an important factor in building a house; by estimating the required materials, consumers or developers can estimate the funds needed to build a house. To solve this problem we use a case-based reasoning (CBR) approach, a method capable of solving new problems based on the solutions of existing cases. The system built in this study is a CBR system for determining the building-material needs of a house. The consultation process is done by entering a new case, which is compared with the old cases; the similarity value is then calculated using the nearest-neighbour method. The first test, inserting test data and comparing it with each type of house, obtained an accuracy of 83.6%. The second test used K-fold cross-validation with K = 25 on 200 records, dividing the data into 192 training records and 8 test records per fold. With this method the CBR system achieved an accuracy of 85.71%.
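The nearest-neighbour retrieval step of such a CBR system can be sketched as a weighted attribute-match score over stored cases; the attribute names, weights, and material lists below are illustrative placeholders, not the study's case base.

```python
def similarity(new, old, weights):
    """Weighted fraction of matching attributes between two cases."""
    score = sum(w * (1.0 if new[a] == old[a] else 0.0)
                for a, w in weights.items())
    return score / sum(weights.values())

cases = [
    {"type": "36", "floors": 1, "roof": "tile",  "materials": "list-A"},
    {"type": "45", "floors": 2, "roof": "metal", "materials": "list-B"},
]
weights = {"type": 0.5, "floors": 0.3, "roof": 0.2}

new_case = {"type": "36", "floors": 1, "roof": "metal"}
best = max(cases, key=lambda c: similarity(new_case, c, weights))
print(best["materials"])  # -> list-A
```

The retrieved case's material list is then reused (possibly adapted) as the solution for the new house, which is the CBR cycle the abstract describes.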


2020 ◽  
Author(s):  
Masaya Kisohara ◽  
Yuto Masuda ◽  
Emi Yuda ◽  
Norihiro Ueda ◽  
Junichiro Hayano

Abstract Background Machine learning of R-R interval Lorenz plot (LP) images is a promising method for the detection of atrial fibrillation (AF) in long-term ECG monitoring, but the optimal length of the R-R interval segment window for the LP images is unknown. We examined the performance of LP AF detection by varying the segment length using a convolutional neural network (CNN). LP images with a 32 x 32-pixel resolution of non-overlapping R-R interval segments with lengths of 10, 20, 50, 100, 200, and 500 beats were created from 24-h ECG data in 52 patients with chronic AF and 58 non-AF controls as training data and in 53 patients with paroxysmal AF and 52 non-AF controls as test data. For each segment length, classification models were made by 5-fold cross-validation subsets of the training data and their classification performance was examined with the test data. Results In machine learning with the training data, the averages of cross-validation scores were 0.995 and 0.999 for 10 and 20-beat LP images, respectively, and >0.999 for 50 to 500-beat images. The classification of test data showed good performance for all segment lengths, with an accuracy from 0.970 to 0.988. The positive likelihood ratio for detecting AF segments, however, showed a convex parabolic relationship to the logarithm of segment length, with a peak ratio of 111 at 100 beats, while the negative likelihood ratio showed a monotonic increase with increasing segment length. Conclusions This study suggests that the optimal R-R interval segment window length that maximizes the positive likelihood ratio for detecting paroxysmal AF with a 32 x 32-pixel LP image is about 100 beats.
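A Lorenz plot of an R-R segment is a 2-D histogram of successive-interval pairs (RR[i], RR[i+1]), which is the image the CNN classifies. The sketch below builds such a 32 x 32 grid from a toy interval list; the interval values and axis range are illustrative, not the study's ECG data.

```python
def lorenz_plot(rr, size=32, lo=0.2, hi=2.0):
    """2-D histogram of successive R-R pairs on a size x size grid;
    pairs outside [lo, hi) seconds are clipped to the edge bins."""
    img = [[0] * size for _ in range(size)]
    scale = size / (hi - lo)
    for a, b in zip(rr, rr[1:]):
        x = min(size - 1, max(0, int((a - lo) * scale)))
        y = min(size - 1, max(0, int((b - lo) * scale)))
        img[y][x] += 1
    return img

rr = [0.80, 0.82, 0.79, 1.20, 0.81, 0.80]   # seconds; one long interval
img = lorenz_plot(rr)
print(sum(map(sum, img)))  # -> 5 plotted pairs
```

Regular sinus rhythm concentrates counts on the diagonal, while AF scatters them, which is the visual structure the CNN learns to separate; the segment length studied in the paper is simply how many intervals feed each image.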

