The impact of pre-clustering on classification of heterogeneous protein data

Author(s):  
Haneen Altartouri ◽  
Hashem Tamimi ◽  
Yaqoub Ashhab
2019 ◽  
pp. 27-35
Author(s):  
Alexandr Neznamov

Digital technologies are no longer the future but the present of civil proceedings, which makes research in this direction relevant. At the same time, some fundamental problems remain unattended by the scientific community. One of them is the classification of digital technologies in civil proceedings. On the basis of instrumental and genetic approaches to the understanding of digital technologies, it is concluded that their most significant feature is the ability to mediate the interaction of participants in legal proceedings with information; their differentiating feature is the function performed by a particular technology in the interaction with information. On this basis, it is proposed to distinguish the following groups of digital technologies in civil proceedings: a) technologies for recording, storing and displaying (reproducing) information, b) technologies for transferring information, c) technologies for processing information. A brief description of each group is given. The presented classification could serve as a basis for a more systematic discussion of the impact of digital technologies on the essence of civil proceedings. In particular, it is pointed out that issues of recording, storing, reproducing and transferring information are traditionally more "technological" for the civil process, while issues of information processing are more conceptual.


2018 ◽  
Vol 35 (4) ◽  
pp. 133-136
Author(s):  
R. N. Ibragimov

The article examines the impact of internal and external risks on the stability of the financial system of the Altai Territory. A classification of the internal and external risks of decline that affect the sustainable development of the financial system is presented. A risk management strategy is proposed that allows risks to be monitored; these measures will help reduce the loss of financial stability and ensure the long-term development of the region's economy.


Author(s):  
Derek Burton ◽  
Margaret Burton

Fish diversity is considered in terms of the variety of their morphological, taxonomic, habitat and population attributes. Fish, with over 30,000 current species, represent the largest group of vertebrates. The complexity of classifying a group of this size and antiquity, together with the recognition of additional species, demands continuous revision. The impact of the fundamental changes made to fish classification in 2016 is discussed. Life in water involves adaptations to widely different habitats, which can result in physiological, morphological and life-style variations; these are reviewed.


Author(s):  
Victor L. Shabanov ◽  
Marianna Ya Vasilchenko ◽  
Elena A. Derunova ◽  
Andrey P. Potapov

The aim of the work is to find relevant indicators for assessing the relationship between investments in fixed assets in agriculture, gross output of the industry, and agricultural exports, using tools for modeling the impact of innovation and investment development on production and export potential in the context of the formation of an export-oriented agricultural economy. The modeling methodology and the proposed estimating and forecasting tools for diagnosing and monitoring the state of sectoral and regional innovative agricultural systems are used to analyze this relationship. The analysis is based on a classification of Russian regions by factors that aggregate these features, with the aim of diagnosing incongruence problems and improving institutional management in regional innovative export-oriented agrosystems. The factor analysis established that the role of indicators of investment in agriculture and of the intensity and efficiency of agricultural production is underestimated. The cluster analysis identified five groups of regions with significant differences in the level of investment in agriculture, the volume of production of the main types of agricultural products, and food exports. The research results are of practical value for improving institutional management when planning reforms and transformations of regional innovative agrosystems.


2021 ◽  
Vol 11 (1) ◽  
pp. 9
Author(s):  
Fernando Leonel Aguirre ◽  
Nicolás M. Gomez ◽  
Sebastián Matías Pazos ◽  
Félix Palumbo ◽  
Jordi Suñé ◽  
...  

In this paper, we extend the application of the Quasi-Static Memdiode model to the realistic SPICE simulation of memristor-based single-layer (SLPs) and multilayer perceptrons (MLPs) intended for large dataset pattern recognition. By considering ex-situ training and the classification of the hand-written characters of the MNIST database, we evaluate the degradation of the inference accuracy due to the interconnection resistances for MLPs involving up to three hidden neural layers. Two approaches to reduce the impact of the line resistance are considered and implemented in our simulations: the inclusion of an iterative calibration algorithm and the partitioning of the synaptic layers into smaller blocks. The obtained results indicate that MLPs are more sensitive to the line resistance effect than SLPs and that partitioning is the most effective way to minimize the impact of high line resistance values.


2021 ◽  
Vol 11 (2) ◽  
pp. 535
Author(s):  
Mahbubunnabi Tamal

Quantification and classification of heterogeneous radiotracer uptake in Positron Emission Tomography (PET) using textural features (termed radiomics) and artificial intelligence (AI) have the potential to be used as biomarkers of diagnosis and prognosis. However, textural features have been predicted to be strongly correlated with volume, segmentation and quantization, while the impact of image contrast and noise has not been assessed systematically. Further continuous investigations are required to update the existing standardization initiatives. This study aimed to investigate the relationships between textural features and these factors using an 18F-filled torso NEMA phantom, prepared to yield different contrasts and reconstructed with different durations to represent varying levels of noise. The phantom was also scanned with heterogeneous spherical inserts fabricated with 3D printing technology. All spheres were delineated using: (1) the exact boundaries based on their known diameters; (2) a 40% fixed threshold; and (3) an adaptive threshold. Six textural features were derived from the gray level co-occurrence matrix (GLCM) using different quantization levels. The results indicate that homogeneity and dissimilarity are the most suitable for measuring PET tumor heterogeneity with quantization 64, provided that the segmentation method is robust to noise and contrast variations. To use these textural features as prognostic biomarkers, changes in textural features between baseline and treatment scans should always be reported along with the changes in volumes.
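As an illustration of the two GLCM features named above, the following is a minimal sketch and not the study's implementation. It assumes the commonly used definitions (homogeneity as the sum of p(i,j) / (1 + (i - j)^2), dissimilarity as the sum of p(i,j) |i - j|), a single horizontal pixel offset, and a symmetric normalized matrix; the abstract does not state which variants were used. The function names and the two toy images are hypothetical.

```python
from collections import Counter


def quantize(image, levels, lo, hi):
    """Map intensities in [lo, hi] onto integer gray levels 0..levels-1."""
    scale = levels / (hi - lo)
    return [[min(levels - 1, int((v - lo) * scale)) for v in row] for row in image]


def glcm(q, dx=1, dy=0):
    """Symmetric, normalized co-occurrence probabilities for one pixel offset."""
    counts = Counter()
    rows, cols = len(q), len(q[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count the pair in both directions so the matrix is symmetric.
                counts[(q[r][c], q[r2][c2])] += 1
                counts[(q[r2][c2], q[r][c])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def homogeneity(p):
    """Assumed definition: sum of p(i, j) / (1 + (i - j)^2)."""
    return sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items())


def dissimilarity(p):
    """Assumed definition: sum of p(i, j) * |i - j|."""
    return sum(v * abs(i - j) for (i, j), v in p.items())


# Hypothetical toy images: a uniform patch and a high-contrast checkerboard.
flat = [[5.0] * 4 for _ in range(4)]
checker = [[0.0 if (r + c) % 2 == 0 else 10.0 for c in range(4)] for r in range(4)]

for name, img in (("flat", flat), ("checker", checker)):
    p = glcm(quantize(img, levels=64, lo=0.0, hi=10.0))
    print(name, homogeneity(p), dissimilarity(p))
```

The uniform patch gives the maximum homogeneity and zero dissimilarity, while the checkerboard gives the opposite extreme, which is the contrast these features are meant to capture.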


2021 ◽  
Vol 21 (S2) ◽  
Author(s):  
Kun Zeng ◽  
Yibin Xu ◽  
Ge Lin ◽  
Likeng Liang ◽  
Tianyong Hao

Abstract. Background: Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning methods improves recruitment efficiency and reduces the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data. Methods: An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models including Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal Loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model so that features are easier to distinguish. Soft Voting is applied to obtain the final classification of the ensemble model. The dataset is from standard evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories. Results: Our ensemble method had an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 with a standard t-test, indicating that our model achieved a significant improvement. Conclusions: A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that the classification performance was improved significantly by our ensemble model. In addition, metric learning was able to improve word embedding representation, and the focal loss reduced the impact of data imbalance on model performance.
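Two of the components mentioned in this abstract, focal loss and soft voting, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the focal loss is written in its usual form, -(1 - p_t)^gamma * log(p_t), without a class-weighting term; gamma = 2.0 is an assumed default that the abstract does not give; and the probability vectors are made-up toy values standing in for the outputs of the base models.

```python
import math


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    probs  -- predicted class probabilities for the example
    target -- index of the true class
    gamma  -- focusing parameter (assumed value; gamma = 0 gives cross-entropy)
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def soft_vote(model_probs):
    """Average the class probabilities of several models and pick the argmax."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    mean = [sum(m[k] for m in model_probs) / n_models for k in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__), mean


# Hypothetical outputs of three base models for one two-class example.
toy_outputs = [[0.55, 0.45], [0.55, 0.45], [0.10, 0.90]]
label, averaged = soft_vote(toy_outputs)
print(label, averaged)

# A confidently correct prediction is down-weighted relative to cross-entropy.
print(focal_loss([0.9, 0.1], target=0), -math.log(0.9))
```

In the toy example two of the three models favor class 0, but the averaged probabilities favor class 1, which shows how soft voting differs from a simple majority vote.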


Author(s):  
K Sooknunan ◽  
M Lochner ◽  
Bruce A Bassett ◽  
H V Peiris ◽  
R Fender ◽  
...  

Abstract. With the advent of powerful telescopes such as the Square Kilometer Array and the Vera C. Rubin Observatory, we are entering an era of multiwavelength transient astronomy that will lead to a dramatic increase in data volume. Machine learning techniques are well suited to address this data challenge and rapidly classify newly detected transients. We present a multiwavelength classification algorithm consisting of three steps: (1) interpolation and augmentation of the data using Gaussian processes; (2) feature extraction using wavelets; (3) classification with random forests. Augmentation provides improved performance at test time by balancing the classes and adding diversity into the training set. In the first application of machine learning to the classification of real radio transient data, we apply our technique to the Green Bank Interferometer and other radio light curves. We find we are able to accurately classify most of the eleven classes of radio variables and transients after just eight hours of observations, achieving an overall test accuracy of 78%. We fully investigate the impact of the small sample size of 82 publicly available light curves and use data augmentation techniques to mitigate the effect. We also show that, on a significantly larger simulated representative training set, the algorithm achieves an overall accuracy of 97%, illustrating that the method is likely to provide excellent performance on future surveys. Finally, we demonstrate the effectiveness of simultaneous multiwavelength observations by showing how incorporating just one optical data point into the analysis improves the accuracy of the worst performing class by 19%.
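Step (2) of the pipeline described above, wavelet feature extraction, can be illustrated with a small sketch. The abstract does not say which wavelet family or decomposition depth was used, so the Haar wavelet, the function names and the toy light curve below are assumptions chosen only because they keep the example short. The input stands in for a light curve that has already been interpolated onto a regular grid whose length is divisible by 2 to the power of the number of levels.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_features(signal, levels):
    """Multi-level Haar decomposition flattened into one feature vector.

    Layout: coarsest approximation first, then details from coarse to fine.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details = detail + details
    return approx + details


# Hypothetical regularly sampled light curve (arbitrary flux units).
light_curve = [1.0, 1.2, 1.1, 3.0, 2.8, 1.3, 1.0, 0.9]
features = haar_features(light_curve, levels=3)
print(len(features))
```

The resulting vector has the same length as the input and could be passed to a classifier such as a random forest; in practice a dimensionality reduction step between the two is common, although the abstract does not describe one.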


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Matteo Pellegrini

Abstract. This paper provides a fully word-based, abstractive analysis of predictability in Latin verb paradigms. After reviewing previous traditional and theoretically grounded accounts of Latin verb inflection, a procedure is outlined where the uncertainty in guessing the content of paradigm cells given knowledge of one or more inflected wordforms is measured by means of the information-theoretic notions of unary and n-ary implicative entropy, respectively, in a quantitative approach that uses the type frequency of alternation patterns between wordforms as an estimate of their probability of application. Entropy computations are performed by using the Qumin toolkit on data taken from the inflected lexicon LatInfLexi. Unary entropy values are used to draw a mapping of the verbal paradigm in zones of full interpredictability, composed of cells that can be inferred from one another with no uncertainty. N-ary entropy values are used to extract categorical and near principal part sets, which allow the rest of the paradigm to be filled with little or no uncertainty. Lastly, the impact of information on the derivational relatedness of lexemes on uncertainty in inflectional predictions is addressed, showing that adding a classification of verbs into derivational families allows for a relevant reduction of entropy, not only for derived verbs, but also for simple ones.
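The idea behind unary implicative entropy can be sketched as a conditional entropy estimated from type frequencies. This is a simplified illustration and not the Qumin implementation: it assumes each lexeme has already been assigned a class for its predictor wordform and an alternation pattern linking the predictor cell to the target cell, and it computes H(pattern | class) in bits. The class and pattern labels are hypothetical placeholders, not data from LatInfLexi.

```python
import math
from collections import Counter, defaultdict


def implicative_entropy(pairs):
    """Conditional entropy H(pattern | predictor class), in bits.

    pairs -- iterable of (predictor_class, alternation_pattern), one per
             lexeme, so probabilities are estimated from type frequency.
    """
    by_class = defaultdict(Counter)
    for cls, pattern in pairs:
        by_class[cls][pattern] += 1
    total = sum(sum(counts.values()) for counts in by_class.values())
    h = 0.0
    for counts in by_class.values():
        n = sum(counts.values())
        for k in counts.values():
            # Weight of the class times the entropy term of the pattern.
            h -= (n / total) * (k / n) * math.log2(k / n)
    return h


# Hypothetical toy lexicon: class "c1" always takes pattern "p1",
# while class "c2" is split evenly between "p2" and "p3".
toy = [("c1", "p1")] * 4 + [("c2", "p2")] * 2 + [("c2", "p3")] * 2
print(implicative_entropy(toy))
```

A value of zero means the target cell is fully predictable from the predictor cell, which corresponds to the zones of full interpredictability mentioned in the abstract; adding an informative extra classification to the conditioning side can only keep the value the same or lower it.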

