Neighbour mapping as a method for ordering genetic markers

1997 ◽  
Vol 69 (1) ◽  
pp. 35-43 ◽  
Author(s):  
T. H. N. ELLIS

A modification of the neighbour joining method of Saitou & Nei (1987) is shown to be applicable to the ordering of genetic markers. This neighbour mapping method is compared with some other procedures for ordering genetic markers using both real and test data sets. The limitations and likely errors associated with the use of neighbour mapping are discussed. The speed and simplicity of this method commend its application, as does its concurrence with other mapping methods.
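The abstract does not spell out the modified neighbour joining procedure, so the following is only a rough Python sketch of ordering markers from a pairwise distance matrix (for example, recombination fractions) by greedy nearest-neighbour chaining. It is a hypothetical stand-in for illustration, not the paper's neighbour mapping method or the full Saitou & Nei algorithm.

```python
import numpy as np

def order_markers_greedy(dist, start=None):
    """Greedy nearest-neighbour ordering of markers from a pairwise
    distance matrix. Illustrative stand-in only; the paper's neighbour
    mapping method modifies Saitou & Nei's neighbour joining rather
    than using this simple chaining."""
    n = dist.shape[0]
    if start is None:
        # start from one end of the largest pairwise distance
        start = int(np.unravel_index(np.argmax(dist), dist.shape)[0])
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda j: dist[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order

# toy example: 5 markers whose distances are consistent with the order 0-1-2-3-4
positions = np.array([0.0, 0.1, 0.25, 0.4, 0.6])
d = np.abs(np.subtract.outer(positions, positions))
print(order_markers_greedy(d))  # [0, 1, 2, 3, 4]
```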

Author(s):  
Kristian Krabbenhoft ◽  
J. Wang

A new stress-strain relation capable of reproducing the entire stress-strain range of typical soil tests is presented. The relation involves a total of five parameters, four of which can be inferred directly from typical test data; the fifth is a fitting parameter with a relatively narrow range. The capabilities of the new relation are demonstrated by application to various clay and sand data sets.
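The abstract does not give the new five-parameter relation itself, so the sketch below only illustrates the general workflow of fitting a parametric stress-strain curve to test data, using the classical two-parameter Kondner hyperbola as a placeholder form. The parameter names and synthetic data are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Classical Kondner hyperbola, q = eps / (a + b*eps), used here only as a
# stand-in: 1/a is the initial stiffness and 1/b the asymptotic strength.
# The paper's new five-parameter relation is not given in the abstract.
def kondner(eps, a, b):
    return eps / (a + b * eps)

# synthetic deviator-stress / axial-strain "test data"
eps = np.linspace(1e-4, 0.15, 60)
q = kondner(eps, a=1.0 / 30e3, b=1.0 / 150.0) + np.random.normal(scale=0.5, size=eps.size)

(a_fit, b_fit), _ = curve_fit(kondner, eps, q, p0=[1e-4, 1e-2])
print("initial stiffness ~", 1.0 / a_fit, "asymptotic strength ~", 1.0 / b_fit)
```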


2021 ◽  
Author(s):  
David Cotton ◽  

Introduction
HYDROCOASTAL is a two-year project funded by ESA, with the objective of maximising the exploitation of SAR and SARin altimeter measurements in the coastal zone and inland waters, by evaluating and implementing new approaches to process SAR and SARin data from CryoSat-2 and SAR altimeter data from Sentinel-3A and Sentinel-3B. Optical data from the Sentinel-2 MSI and Sentinel-3 OLCI instruments will also be used in generating river discharge products.
New SAR and SARin processing algorithms for the coastal zone and inland waters will be developed, implemented, and evaluated through an initial Test Data Set for selected regions. From the results of this evaluation, a processing scheme will be implemented to generate global coastal zone and river discharge data sets. A series of case studies will assess these products in terms of their scientific impacts. All the produced data sets will be available on request to external researchers, and full descriptions of the processing algorithms will be provided.

Objectives
The scientific objectives of HYDROCOASTAL are to enhance our understanding of the interactions between inland waters and the coastal zone, between the coastal zone and the open ocean, and the small-scale processes that govern these interactions. The project also aims to improve our capability to characterise the variation, at different time scales, of inland water storage, its exchanges with the ocean, and the impact on regional sea-level change.
The technical objectives are to develop and evaluate new SAR and SARin altimetry processing techniques in support of the scientific objectives, including stack processing, filtering, and retracking. An improved wet troposphere correction will also be developed and evaluated.

Project Outline
The project comprises four tasks:
- Scientific Review and Requirements Consolidation: review the current state of the art in SAR and SARin altimeter data processing as applied to the coastal zone and to inland waters.
- Implementation and Validation: new processing algorithms will be implemented to generate a Test Data Set, which will be validated against models, in-situ data, and other satellite data sets. Selected algorithms will then be used to generate global coastal zone and river discharge data sets.
- Impacts Assessment: the impact of these global products will be assessed in a series of case studies.
- Outreach and Roadmap: outreach material will be prepared and distributed to engage the wider scientific community and to provide recommendations for the development of future missions and future research.

Presentation
The presentation will give an overview of the project, present the different SAR altimeter processing algorithms being evaluated in the first phase of the project, and show early results from the evaluation of the initial test data set.


2021 ◽  
Vol 10 (1) ◽  
pp. 105
Author(s):  
I Gusti Ayu Purnami Indryaswari ◽  
Ida Bagus Made Mahendra

Many Indonesian people, especially in Bali, raise pigs as livestock. Pigs are susceptible to various types of disease, and many pig deaths caused by disease have led to losses for breeders. The authors therefore developed an Android-based application that can predict the type of disease in pigs by applying the C4.5 algorithm. C4.5 is a classification algorithm that derives a set of rules which can then be used for prediction. In this study, 50 training records covering 8 types of pig disease and 31 disease symptoms were entered into the system, which processes the data so that the Android application can predict the type of disease in pigs. Testing with 15 test records produced an accuracy of 86.7%. The application features, built with the Kotlin programming language and an SQLite database, worked as expected during testing.
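As a rough illustration of the classification step, the sketch below trains an entropy-based decision tree on binary symptom vectors. Note that scikit-learn implements a CART variant rather than C4.5 itself, and the symptom/disease data here are random placeholders, not the study's data set.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative only: an entropy-criterion tree captures the same
# information-gain-based splitting idea as C4.5, but is not C4.5 itself.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(50, 31))   # 50 training cases, 31 binary symptoms
y_train = rng.integers(0, 8, size=50)         # 8 disease classes (placeholder labels)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_train, y_train)

X_test = rng.integers(0, 2, size=(15, 31))    # 15 test cases, as in the study
print(clf.predict(X_test))
print(export_text(clf, feature_names=[f"symptom_{i}" for i in range(31)])[:500])
```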


Author(s):  
Chuang Sun ◽  
Zhousuo Zhang ◽  
Zhengjia He ◽  
Zhongjie Shen ◽  
Binqiang Chen ◽  
...  

Bearing performance degradation assessment is important for maintaining mechanical reliability and safety. For this purpose, a novel method based on kernel locality preserving projection is proposed in this article. Kernel locality preserving projection extends the traditional locality preserving projection into a non-linear form by using a kernel function, and it is better suited to exploring the non-linear information hidden in data sets. Accordingly, kernel locality preserving projection is used to generate a non-linear subspace from the normal bearing data. The test data are then projected onto this subspace to obtain an index for assessing bearing degradation degrees. The degradation index, expressed in the form of an inner product, indicates the similarity between the normal data and the test data. Validation using monitoring data from two experiments shows the effectiveness of the proposed method.
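The following is a minimal sketch of the kernel locality preserving projection idea, assuming an RBF kernel, a k-nearest-neighbour adjacency graph, and an inner-product similarity to the mean normal embedding as the degradation index. The feature data, kernel width, and index definition are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def rbf(A, B, sigma):
    return np.exp(-cdist(A, B, "sqeuclidean") / (2.0 * sigma ** 2))

def kernel_lpp(X_normal, n_dim=2, k=5, sigma=3.0):
    """Minimal kernel LPP learned from normal data; returns the
    kernel-space projection coefficients (alpha)."""
    n = X_normal.shape[0]
    K = rbf(X_normal, X_normal, sigma)
    # k-nearest-neighbour adjacency with heat-kernel weights
    d2 = cdist(X_normal, X_normal, "sqeuclidean")
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W
    # generalized eigenproblem K L K a = lambda K D K a; keep smallest eigenvalues
    A = K @ L @ K
    B = K @ D @ K + 1e-6 * np.eye(n)   # small regularisation for stability
    vals, vecs = eigh(A, B)
    return vecs[:, :n_dim]

X_normal = np.random.randn(100, 8)          # features from healthy bearings
X_test = np.random.randn(20, 8) + 1.5       # shifted features, i.e. "degraded"

alpha = kernel_lpp(X_normal)
Y_normal = rbf(X_normal, X_normal, 3.0) @ alpha    # embeddings of normal data
Y_test = rbf(X_test, X_normal, 3.0) @ alpha        # embeddings of test data

# inner-product similarity to the mean normal embedding: lower similarity is
# read as a larger deviation from the normal state (illustrative index only)
index = Y_test @ Y_normal.mean(axis=0)
print(index)
```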


2021 ◽  
Vol 22 (Supplement_2) ◽  
Author(s):  
F Ghanbari ◽  
T Joyce ◽  
S Kozerke ◽  
AI Guaricci ◽  
PG Masci ◽  
...  

Abstract
Funding Acknowledgements: Type of funding sources: Other. Main funding source(s): J. Schwitter receives research support from "Bayer Schweiz AG". C.N.C. received a grant from Siemens. Gianluca Pontone received institutional fees from General Electric, Bracco, Heartflow, Medtronic, and Bayer. U.J.S. received grants from Astellas, Bayer, and General Electric. This work was supported by the Italian Ministry of Health, Rome, Italy (RC 2017 R659/17-CCM698). This work was supported by Gyrotools, Zurich, Switzerland.

Background: Late gadolinium enhancement (LGE) scar quantification is generally recognized as an accurate and reproducible technique, but it is observer-dependent and time consuming. Machine learning (ML) offers a potential solution to this problem.

Purpose: To develop and validate an ML algorithm for scar quantification that fully avoids observer variability, and to apply this algorithm to the prospective international multicentre Derivate cohort.

Method: The Derivate Registry collected heart failure patients with LV ejection fraction <50% in 20 European and US centres. In the post-myocardial infarction patients (n = 689), the quality of the LGE short-axis breath-hold images was determined (good, acceptable, sufficient, borderline, poor, excluded) and ground truth (GT) was produced (endo-epicardial contours, 2 remote reference regions, artefact elimination) to determine the mass of non-infarcted myocardium and of dense (≥5 SD above mean-remote) and non-dense scar (>2 SD to <5 SD above mean-remote). Data were divided into the learning set (total n = 573; training: n = 289; testing: n = 284) and the validation set (n = 116). A Ternaus network (loss function = average of Dice and binary cross-entropy) produced 4 outputs (initial prediction, test-time augmentation (TTA), threshold-based prediction (TB), and TTA + TB) representing normal myocardium, non-dense scar, and dense scar (Figure 1). Outputs were evaluated by Dice metrics, Bland-Altman analysis, and correlations.

Results: In the validation and test data sets, neither of which was used for training, the dense scar GT was 20.8 ± 9.6% and 21.9 ± 13.3% of LV mass, respectively. The TTA network yielded the best results, with small biases vs GT (-2.2 ± 6.1%, p < 0.02; -1.7 ± 6.0%, p < 0.003, respectively) and 95% CI vs GT in the range of inter-human comparisons, i.e. TTA yielded SDs of the differences vs GT in the validation and test data of 6.1 and 6.0 percentage points (%p), respectively (Fig 2), which was comparable to the 7.7 %p for the inter-observer comparison (n = 40). For non-dense scar, TTA performance was similar, with small biases (-1.9 ± 8.6%, p < 0.0005 and -1.4 ± 8.2%, p < 0.0001 in the validation and test sets, respectively; GT 39.2 ± 13.8% and 42.1 ± 14.2%) and acceptable 95% CI, with SDs of the differences of 8.6 and 8.2 %p for TTA vs GT, respectively, and 9.3 %p for inter-observer.

Conclusions: In the large Derivate cohort from 20 centres, the performance of the presented ML algorithm in quantifying dense and non-dense scar fully automatically is comparable to that of experienced humans, with small bias and acceptable 95% CI. Such a tool could facilitate scar quantification in clinical routine, as it eliminates human observer variability and can handle large data sets.
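As a small illustration of the SD-threshold ground-truth definition described above (dense scar ≥ mean-remote + 5 SD, non-dense scar between +2 SD and +5 SD), here is a hedged NumPy sketch; the image, masks, and intensity values are synthetic placeholders, and it assumes scar fractions are taken as pixel fractions of the myocardial mask.

```python
import numpy as np

def scar_fractions(lge, myo_mask, remote_mask):
    """Threshold-based scar quantification as in the ground-truth definition:
    dense scar >= mean_remote + 5*SD, non-dense scar between +2*SD and +5*SD.
    lge: array of LGE signal intensities; masks are boolean arrays of equal shape."""
    remote = lge[remote_mask & myo_mask]
    mu, sd = remote.mean(), remote.std()
    myo = lge[myo_mask]
    dense = myo >= mu + 5.0 * sd
    nondense = (myo > mu + 2.0 * sd) & ~dense
    # fractions of LV myocardium, assuming uniform voxel size and density
    return dense.mean() * 100.0, nondense.mean() * 100.0

# toy example with a synthetic short-axis slice (hypothetical values)
rng = np.random.default_rng(1)
lge = rng.normal(100, 10, size=(128, 128))
myo_mask = np.zeros_like(lge, dtype=bool); myo_mask[40:90, 40:90] = True
remote_mask = np.zeros_like(lge, dtype=bool); remote_mask[40:60, 40:60] = True
lge[70:90, 70:90] += 120          # bright "infarct" region
print(scar_fractions(lge, myo_mask, remote_mask))
```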


2021 ◽  
Author(s):  
Louise Bloch ◽  
Christoph M. Friedrich

Abstract
Background: The prediction of whether subjects with Mild Cognitive Impairment (MCI) will prospectively develop Alzheimer's Disease (AD) is important for the recruitment and monitoring of subjects in therapy studies. Machine Learning (ML) is suitable for improving early AD prediction. The etiology of AD is heterogeneous, which leads to noisy data sets. Additional noise is introduced by multicentric study designs and varying acquisition protocols. This article examines whether an automatic and fair data valuation method based on Shapley values can identify subjects with noisy data.

Methods: An ML workflow was developed and trained for a subset of the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. The validation was executed for an independent ADNI test data set and for the Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL) cohort. The workflow included volumetric Magnetic Resonance Imaging (MRI) feature extraction, subject sample selection using data Shapley, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) for model training, and Kernel SHapley Additive exPlanations (SHAP) values for model interpretation. This model interpretation enables clinically relevant explanation of individual predictions.

Results: The XGBoost models that excluded 116 of the 467 subjects from the training data set based on their Logistic Regression (LR) data Shapley values outperformed the models trained on the entire training data set, which reached a mean classification accuracy of 58.54%, by 14.13% (8.27 percentage points) on the independent ADNI test data set. The XGBoost models trained on the entire training data set reached a mean accuracy of 60.35% for the AIBL data set. An improvement of 24.86% (15.00 percentage points) could be reached for the XGBoost models if the 72 subjects with the smallest RF data Shapley values were excluded from the training data set.

Conclusion: The data Shapley method was able to improve the classification accuracies for the test data sets. Noisy data were associated with the number of ApoE ε4 alleles and volumetric MRI measurements. Kernel SHAP showed that the black-box models learned biologically plausible associations.
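To make the data valuation idea concrete, here is a minimal Monte Carlo data Shapley sketch: a subject's value is its average marginal contribution to validation accuracy over random permutations of the training set. This is a simplified illustration of the general technique, assuming a logistic regression scorer and synthetic data, not the authors' exact implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

def mc_data_shapley(X_tr, y_tr, X_val, y_val, n_perm=10, seed=0):
    """Monte Carlo data Shapley: average marginal contribution of each
    training subject to validation accuracy over random permutations."""
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    values = np.zeros(n)
    for _ in range(n_perm):
        perm = rng.permutation(n)
        prev_score = 0.0
        for k in range(1, n + 1):
            idx = perm[:k]
            if len(np.unique(y_tr[idx])) < 2:      # need both classes to fit
                score = prev_score
            else:
                clf = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
                score = clf.score(X_val, y_val)
            values[perm[k - 1]] += score - prev_score
            prev_score = score
    return values / n_perm

X, y = make_classification(n_samples=120, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)
vals = mc_data_shapley(X_tr, y_tr, X_val, y_val, n_perm=10)

# drop the lowest-valued (presumably noisy) subjects before final training
keep = vals > np.quantile(vals, 0.25)
print(keep.sum(), "of", len(vals), "subjects kept")
```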


2021 ◽  
Author(s):  
Xingang Jia ◽  
Qiuhong Han ◽  
Zuhong Lu

Abstract
Background: Phages are the most abundant biological entities, but commonly used clustering techniques struggle to separate them from other virus families and to classify the different phage families correctly.

Results: This work uses GI-clusters to separate phages from other virus families and to classify the different phage families. GI-clusters are constructed from GI-features; GI-features are constructed from F-features together with training data, the MG-Euclidean algorithm, and the Icc-cluster algorithm. F-features are the frequencies of multiple nucleotides generated from virus genomes; the MG-Euclidean algorithm places nearest neighbours in the same mini-groups, and the Icc-cluster algorithm assigns distant samples to different mini-clusters. Viruses whose GI-features have their maximum element in the same location are placed in the same GI-cluster; the families of viruses in the test data are identified by their GI-clusters, and the family of each GI-cluster is defined by the viruses of the training data.

Conclusions: From the analysis of 4 data sets constructed from viruses of different families, we demonstrate that GI-clusters are able to separate phages from other virus families, correctly classify the different phage families, and also correctly predict the families of unknown phages.
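As a small illustration of the F-feature step only (multiple-nucleotide, i.e. k-mer, frequency vectors), here is a hedged Python sketch; the choice of k and the toy sequence are assumptions, and the MG-Euclidean, Icc-cluster, and GI-feature construction steps are not reproduced.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(genome, k=3):
    """F-feature sketch: frequencies of length-k nucleotide words in a genome.
    The paper builds GI-features on top of such frequency vectors; the
    MG-Euclidean and Icc-cluster steps are not reproduced here."""
    genome = genome.upper()
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = max(sum(counts.values()), 1)
    keys = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts.get(w, 0) / total for w in keys]

# toy sequence (hypothetical, not real phage data)
print(kmer_frequencies("ATGCGTACGTTAGC", k=2)[:8])
```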


Author(s):  
Andrew Chisholm ◽  
Ben Hachey

Entity disambiguation with Wikipedia relies on structured information from redirect pages, article text, inter-article links, and categories. We explore whether web links can replace a curated encyclopaedia, obtaining entity prior, name, context, and coherence models from a corpus of web pages with links to Wikipedia. Experiments compare web link models to Wikipedia models on the well-known CoNLL and TAC data sets. Results show that using 34 million web links approaches Wikipedia performance. Combining web link and Wikipedia models produces the best-known disambiguation accuracy of 88.7 on standard newswire test data.
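To illustrate just the entity prior component, the sketch below estimates P(entity | anchor name) from counts of (anchor text, link target) pairs in a web-link corpus. The toy anchors and targets are hypothetical, and the name, context, and coherence models described in the paper are not covered.

```python
from collections import Counter, defaultdict

def build_prior(anchor_target_pairs):
    """Entity prior sketch: P(entity | anchor name) estimated from counts of
    (anchor text, link target) pairs in a web-link corpus."""
    counts = defaultdict(Counter)
    for anchor, target in anchor_target_pairs:
        counts[anchor.lower()][target] += 1
    return {a: {t: c / sum(tc.values()) for t, c in tc.items()}
            for a, tc in counts.items()}

# toy link corpus (hypothetical anchors and Wikipedia targets)
links = [("Jaguar", "Jaguar_Cars"), ("jaguar", "Jaguar_(animal)"),
         ("Jaguar", "Jaguar_Cars"), ("jaguar", "Jaguar_Cars")]
prior = build_prior(links)
print(prior["jaguar"])   # {'Jaguar_Cars': 0.75, 'Jaguar_(animal)': 0.25}
```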

