minimal subset
Recently Published Documents

TOTAL DOCUMENTS: 48 (FIVE YEARS: 15)
H-INDEX: 9 (FIVE YEARS: 2)

Animals, 2022, Vol 12 (2), pp. 201
Author(s): Maoxuan Miao, Jinran Wu, Fengjing Cai, You-Gan Wang

Selecting the minimal best subset out of a huge number of factors influencing the response is a fundamental and very challenging NP-hard problem: the presence of many redundant genes easily leads to over-fitting, missing an important gene can have an even more detrimental impact on predictions, and exhaustive search is computationally prohibitive. We propose a modified memetic algorithm (MA) based on an improved splicing method to overcome the weak exploitation capability of the traditional genetic algorithm and to reduce the dimension of the predictor variables. The new algorithm accelerates the search for the minimal best subset of genes by incorporating the improved splicing method into a new local search operator. The improvement is also due to two further novel aspects: (a) updating gene subsets iteratively by splicing until there is no further reduction in the loss function, which increases the probability of selecting the true subset of genes; and (b) introducing add and delete operators based on backward sacrifice into the splicing method to limit the size of the gene subsets. Moreover, the new local search operator replaces the mutation operator to enhance exploitation capability, and the initial individuals are improved by it to enhance search efficiency. A dataset of the body weight of Hu sheep was used to evaluate the superiority of the modified MA over the genetic algorithm. According to our experimental results, the proposed optimizer obtains a better minimal subset of genes within a few iterations than all considered algorithms, including the most advanced adaptive best-subset selection algorithm.
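The splicing idea lends itself to a compact illustration. Below is a minimal sketch of a swap-based splicing local search, assuming a boolean selection mask and a user-supplied loss function (for example, a cross-validated prediction error); it is a simplification for illustration, not the authors' implementation.

```python
import numpy as np

def splicing_local_search(mask, loss):
    """Sketch of a splicing-style local search: repeatedly swap one
    selected gene for one unselected gene whenever the swap lowers the
    loss, and stop when no single swap improves it."""
    best = loss(mask)
    improved = True
    while improved:
        improved = False
        for i in np.flatnonzero(mask):        # candidate deletions
            for j in np.flatnonzero(~mask):   # candidate additions
                cand = mask.copy()
                cand[i], cand[j] = False, True
                val = loss(cand)
                if val < best:                # keep the swap, restart scan
                    mask, best, improved = cand, val, True
                    break
            if improved:
                break
    return mask

# In the memetic loop this operator would stand in for mutation, e.g.:
# child = splicing_local_search(child, loss)
```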


Cancers, 2021, Vol 13 (22), pp. 5624
Author(s): Matthis Desoteux, Corentin Louis, Kevin Bévant, Denise Glaise, Cédric Coulouarn

Hepatocellular carcinoma (HCC) is a deadly cancer worldwide as a result of frequently late diagnosis, which limits the therapeutic options. Tumor progression in HCC is closely correlated with the dedifferentiation of hepatocytes, the main parenchymal cells in the liver. Here, we hypothesized that the expression level of genes reflecting the differentiation status of tumor hepatocytes could be clinically relevant in defining subsets of patients with different clinical outcomes. To test this hypothesis, an integrative transcriptomics approach was used to stratify a cohort of 139 HCC patients based on a gene expression signature established in vitro in the HepaRG cell line using well-controlled culture conditions recapitulating tumor hepatocyte differentiation. The HepaRG model was first validated by identifying a robust gene expression signature associated with hepatocyte differentiation and liver metabolism. In addition, the signature was able to distinguish specific developmental stages in mice. More importantly, the signature identified a subset of human HCC associated with a poor prognosis and cancer stem cell features. By using an independent HCC dataset (TCGA consortium), a minimal subset of seven differentiation-related genes was shown to predict reduced overall survival, not only in patients with HCC but also in other types of cancers (e.g., kidney, pancreas, skin). In conclusion, the study identified a minimal subset of seven genes reflecting the differentiation status of tumor hepatocytes and clinically relevant for predicting the prognosis of HCC patients.
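As an illustration of how such a minimal signature can be used prognostically, here is a hedged sketch of a median-split survival comparison. The gene names and data are toy placeholders (the paper's seven genes are not listed here), and the log-rank test via lifelines stands in for whatever survival analysis the authors performed.

```python
import numpy as np
import pandas as pd
from lifelines.statistics import logrank_test

# Toy stand-ins: in practice `expr` would be log-normalized expression
# (e.g., from TCGA) and `genes` the seven differentiation-related genes.
rng = np.random.default_rng(0)
genes = [f"GENE{i}" for i in range(1, 8)]          # hypothetical names
expr = pd.DataFrame(rng.normal(size=(139, 7)), columns=genes)
surv = pd.DataFrame({"time": rng.exponential(36, 139),
                     "event": rng.integers(0, 2, 139)})

# Signature score: mean z-score of the signature genes per sample
z = (expr[genes] - expr[genes].mean()) / expr[genes].std()
score = z.mean(axis=1)
high = score >= score.median()                     # high vs low signature

res = logrank_test(surv.loc[high, "time"], surv.loc[~high, "time"],
                   event_observed_A=surv.loc[high, "event"],
                   event_observed_B=surv.loc[~high, "event"])
print(f"log-rank p-value: {res.p_value:.3f}")
```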


Symmetry, 2021, Vol 13 (10), pp. 1906
Author(s): Tahani Nawaf Alawneh, Mehmet Ali Tut

Data pre-processing is a major difficulty in the knowledge discovery process, especially feature selection on a large amount of data. In the literature, various approaches have been suggested to overcome this difficulty. Unlike most approaches, Rough Set Theory (RST) can discover data dependency and reduce the attributes without the need for further information. In RST, the discernibility matrix is the mathematical foundation for computing such reducts. Although it has proven efficient for feature selection, it is unfortunately computationally expensive on high-dimensional data. Algorithm complexity is related to the search for the minimal subset of attributes, which requires computing an exponential number of possible subsets. To overcome this limitation, many RST enhancements have been proposed. Contrary to recent methods, this paper implements RST concepts in an iterated manner using the R language. First, the dataset was partitioned into a smaller number of subsets, and each subset was processed independently to generate its own minimal attribute set. Within the iterations, only minimal elements in the discernibility matrix were considered. Finally, the iterated outputs were compared, and those common among all reducts formed the minimal one (the core attributes). A comparison with another recently proposed algorithm was performed on three benchmark datasets. The proposed approach computed the same minimal attribute sets with less execution time.
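To make the RST notions concrete, here is a minimal sketch of the discernibility matrix and the derivation of core attributes, assuming a decision table given as attribute dictionaries plus a decision label per object; it illustrates the concepts rather than the paper's iterated R implementation.

```python
from itertools import combinations

def discernibility_matrix(objects, attrs, decision):
    """For each pair of objects with different decision values, record
    the set of condition attributes on which the two objects differ."""
    entries = []
    for i, j in combinations(range(len(objects)), 2):
        if decision[i] != decision[j]:
            diff = frozenset(a for a in attrs if objects[i][a] != objects[j][a])
            if diff:
                entries.append(diff)
    return entries

def core_attributes(entries):
    """Singleton entries mark indispensable attributes: removing such an
    attribute would leave some object pair indiscernible."""
    return {a for e in entries if len(e) == 1 for a in e}

# Tiny decision table: 'a' alone discerns objects 0 and 2, so it is core
objects = [{"a": 1, "b": 0, "c": 1}, {"a": 1, "b": 1, "c": 0},
           {"a": 0, "b": 0, "c": 1}]
decision = [0, 1, 1]
entries = discernibility_matrix(objects, ["a", "b", "c"], decision)
print(core_attributes(entries))   # prints {'a'}
```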


Animals, 2021, Vol 11 (9), pp. 2621
Author(s): Federico Armando, Claudio Pigoli, Matteo Gambini, Andrea Ghidelli, Gabriele Ghisleni, ...

Skin spindle cell tumors (SSTs) frequently occur in fishes, with peripheral nerve sheath tumors (PNSTs) being the most commonly reported neoplasms in goldfish. However, distinguishing PNSTs from other SSTs is not always possible when relying exclusively on routine cytological and histopathological findings. Therefore, the aim of this study is to characterize six skin nodules, resembling atypical neurofibromas in humans, found in six cohabiting goldfish (Carassius auratus), and to determine a minimal subset of special stains required to correctly identify PNSTs in this species. Routine cytology and histopathology were indicative of an SST with nuclear atypia in all cases, with randomly distributed areas of hypercellularity and loss of neurofibroma architecture. Muscular and fibroblastic tumors were excluded using Azan trichrome staining. Alcian blue and Gomori's reticulin stains revealed the presence of intratumoral areas of glycosaminoglycans or mucins and basement membrane fragments, respectively. PAS and PAS–diastase stains confirmed the latter finding and revealed intra- and extracellular glycogen granules. Immunohistochemistry displayed multifocal, randomly distributed aggregates of neoplastic cells positive for S100 protein and CNPase, intermingled with phosphorylated and non-phosphorylated neurofilament-positive axons. Collectively, these findings are consistent with a PNST resembling atypical neurofibroma in humans, an entity not previously reported in goldfish, and suggest that Azan trichrome staining, reticulin staining, and immunohistochemistry for S100 protein and CNPase represent a useful set of special stains to identify and characterize PNSTs in this species.


Author(s): Tong Zou, Tianyu Pan, Michael Taylor, Hal Stern

Abstract: Recognition of overlapping objects is required in many applications in the field of computer vision. Examples include cell segmentation, bubble detection and bloodstain pattern analysis. This paper presents a method to identify overlapping objects by approximating them with ellipses. The method is intended to be applied to complex-shaped regions which are believed to be composed of one or more overlapping objects. The method has two primary steps. First, a pool of candidate ellipses is generated by applying the Euclidean distance transform on a compressed image, and the pool is filtered by an overlaying method. Second, the concave points on the contour of the region of interest are extracted by polygon approximation to divide the contour into segments. Then, the optimal ellipses are selected from among the candidates by choosing a minimal subset that best fits the identified segments. We propose the use of the adjusted Rand index, commonly applied in clustering, to compare the fitting result with ground truth. Through a set of computational and optimization efficiencies, we are able to apply our approach to complex images comprising a number of overlapping regions. Experimental results on a synthetic data set, two types of cell images and bloodstain patterns show superior accuracy and flexibility of our method in ellipse recognition, relative to other methods.
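The concave-point step can be illustrated compactly. The sketch below uses OpenCV's polygon approximation to flag contour vertices that turn against the contour's dominant orientation; the tolerance value is an assumption, and this illustrates the general idea rather than the authors' exact procedure.

```python
import cv2
import numpy as np

def concave_points(contour, eps_frac=0.01):
    """Polygon-approximate the contour and flag vertices whose turn
    direction opposes the polygon's overall orientation; these concave
    points split the contour into segments. eps_frac is an assumed
    tolerance, not a value taken from the paper."""
    eps = eps_frac * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, eps, True)         # (N,1,2) int32
    sign = 1 if cv2.contourArea(approx, oriented=True) > 0 else -1
    poly = approx.reshape(-1, 2)
    out = []
    n = len(poly)
    for k in range(n):
        p0, p1, p2 = poly[k - 1], poly[k], poly[(k + 1) % n]
        cross = (p1[0] - p0[0]) * (p2[1] - p1[1]) \
              - (p1[1] - p0[1]) * (p2[0] - p1[0])
        if sign * cross < 0:          # turns against the dominant direction
            out.append(p1)
    return poly, np.array(out)

# Two overlapping discs form one region with two concave notch points
img = np.zeros((200, 300), np.uint8)
cv2.circle(img, (110, 100), 60, 255, -1)
cv2.circle(img, (190, 100), 60, 255, -1)
contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
poly, cps = concave_points(contours[0])
print(len(cps))   # typically 2, one per notch where the discs meet
# Each contour segment between consecutive concave points can then be
# fitted with cv2.fitEllipse (requires at least five points per segment).
```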


2021
Author(s): Olivia Swanson, Brianna Rhodes, Avivah Wang, Shi-Mao Xia, Robert Parks, ...

Summary: Elicitation of broadly neutralizing antibodies (bnAbs) by an HIV vaccine will involve priming the immune system to activate antibody precursors, followed by boosting immunizations to select for antibodies with functional features required for neutralization breadth. The higher the number of mutations necessary for function, the more convoluted the antibody developmental pathways become. HIV bnAbs acquire a large number of somatic mutations, but not all mutations are functionally important. Here we identified a minimal subset of mutations sufficient for the function of the V3-glycan bnAb DH270.6. Using antibody library screening, candidate envelope immunogens that interacted with DH270.6-like antibodies containing this set of key mutations were identified and selected in vitro. Our results demonstrate that less complex B cell evolutionary pathways than those naturally observed exist for the induction of HIV bnAbs by vaccination, and establish rational approaches to identify sequential envelope candidate immunogens for boosting.


2020
Author(s): Tamar Amitai, Yoav Kan-Tor, Naama Srebnik, Amnon Buxboim

Abstract
Objective: To develop a machine learning classifier for predicting the risk of cleavage-stage embryos to undergo first trimester miscarriage based on time-lapse images of preimplantation development.
Design: Retrospective study of a 4-year multi-center cohort of women undergoing intra-cytoplasmic sperm injection (ICSI). The study included embryos with positive indication of clinical implantation based on gestational sac visualization, with either first trimester miscarriage or live birth outcome. Miscarriage was determined based on negative fetal heartbeat indication during the first trimester.
Setting: Data were recorded and obtained in a hospital setting and research was performed in a university setting.
Patient(s): Data from 391 women who underwent fresh single or double embryo transfers were included.
Intervention(s): None.
Main Outcome Measure(s): A minimal subset of six non-redundant morphodynamic features that maintains high prediction capacity was screened. Using this feature subset, XGBoost and Random Forest models were trained following a 100-fold Monte-Carlo cross-validation scheme. Feature importance was scored using the SHapley Additive exPlanations (SHAP) methodology. Miscarriage versus live-birth outcome prediction was evaluated using a non-contaminated balanced test set and quantified in terms of the area under the receiver operating characteristic (ROC) curve (AUC), the precision-recall curve, the positive predictive value (PPV), and confusion matrices.
Result(s): Features that account for the distribution of the nucleolus precursor bodies within the small pronucleus, and for pronuclei dynamics, were highly predictive of miscarriage outcome. The AUC for miscarriage prediction of validation and test set embryos using both models was 0.68-to-0.69. Clinical utility was tested by setting two classification thresholds: one for high sensitivity (0.73, with 0.60 specificity) and one for high specificity (0.93, with 0.33 sensitivity).
Conclusion(s): We report the development of a decision-support tool for identifying embryos at high risk of miscarriage. Prioritizing embryos for transfer based on their predicted risk of miscarriage, in combination with their predicted implantation potential, should improve live-birth rates and shorten time-to-pregnancy.
Capsule: The risk of first trimester miscarriage of cleavage-stage embryos is predicted with an AUC of 68% by screening a minimal subset of six non-redundant morphodynamic features and training a machine-learning classifier.
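As an illustration of the modeling pipeline described under Main Outcome Measure(s), here is a hedged sketch with toy stand-in data: XGBoost under a Monte-Carlo cross-validation scheme, followed by SHAP importance scoring. The hyperparameters and data shapes are assumptions, not values from the paper.

```python
import numpy as np
import shap
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Toy stand-ins: X holds six morphodynamic features per embryo,
# y the miscarriage (1) vs live-birth (0) outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=400) > 0).astype(int)

aucs = []
for seed in range(100):                       # 100-fold Monte-Carlo CV
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    model = XGBClassifier(n_estimators=200, max_depth=3,
                          eval_metric="logloss")
    model.fit(X_tr, y_tr)
    aucs.append(roc_auc_score(y_va, model.predict_proba(X_va)[:, 1]))
print(f"mean validation AUC: {np.mean(aucs):.2f}")

# SHAP importance for the last fitted model
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_va)
print(np.abs(shap_values).mean(axis=0))       # per-feature mean |SHAP|
```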


2020, Vol 10 (1)
Author(s): Luis Alfredo Moctezuma, Marta Molinas

Abstract: We present a new approach for a biometric system based on resting-state electroencephalographic (EEG) signals that can identify a subject and reject intruders with a minimal subset of EEG channels. To select features, we first use the discrete wavelet transform (DWT) or empirical mode decomposition (EMD) to decompose the EEG signals into a set of sub-bands, for which we compute the instantaneous and Teager energy and the Higuchi and Petrosian fractal dimensions of each sub-band. The obtained features are used as input for the local outlier factor (LOF) algorithm to create a model for each subject, with the aim of learning from it and rejecting instances not related to the subject in the model. In search of a minimal subset of EEG channels, we used a channel-selection method based on the non-dominated sorting genetic algorithm (NSGA)-III, designed with the objectives of minimizing the required number of EEG channels and increasing the true acceptance rate (TAR) and true rejection rate (TRR). This method was tested on EEG signals from 109 subjects of the public motor movement/imagery dataset (EEGMMIDB) using the resting-state with eyes open and the resting-state with eyes closed. We were able to obtain a TAR of 1.000 ± 0.000 and a TRR of 0.998 ± 0.001 using 64 EEG channels. More importantly, with only three channels, we were able to obtain a TAR of up to 0.993 ± 0.01 and a TRR of up to 0.941 ± 0.002 for the Pareto-front, using NSGA-III and DWT-based features in the resting-state with eyes open. In the resting-state with eyes closed, the TAR was 0.997 ± 0.02 and the TRR 0.950 ± 0.05, also using DWT-based features from three channels. These results show that our approach makes it possible to create a model for each subject using EEG signals from a reduced number of channels and to reject most instances of the other 108 subjects, who are intruders in the model of the subject under evaluation. Furthermore, the candidates obtained throughout the optimization process of NSGA-III showed that it is possible to obtain TARs and TRRs above 0.900 using LOF and DWT- or EMD-based features with only one to three EEG channels, opening the way to testing this approach on bigger datasets to develop a more realistic and usable EEG-based biometric system.
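A minimal sketch of the per-subject enrollment step may help: DWT sub-band energies (a subset of the paper's features; the fractal dimensions are omitted) feed a LocalOutlierFactor model in novelty mode, which accepts the enrolled subject's epochs and rejects intruders. The toy signals, wavelet and decomposition level are all assumptions.

```python
import numpy as np
import pywt
from sklearn.neighbors import LocalOutlierFactor

def teager_energy(x):
    """Mean Teager energy: x[n]^2 - x[n-1] * x[n+1]."""
    return np.mean(x[1:-1] ** 2 - x[:-2] * x[2:])

def dwt_features(signal, wavelet="db4", level=4):
    """Instantaneous and Teager energy per DWT sub-band; wavelet and
    level are assumed values, not taken from the paper."""
    feats = []
    for c in pywt.wavedec(signal, wavelet, level=level):
        feats += [np.sum(c ** 2), teager_energy(c)]
    return np.array(feats)

# Toy stand-ins for single-channel epochs of one subject and one intruder
rng = np.random.default_rng(0)
subject = [np.sin(0.1 * np.arange(320)) + 0.1 * rng.normal(size=320)
           for _ in range(50)]
intruder = [rng.normal(size=320) for _ in range(20)]

X_subj = np.vstack([dwt_features(e) for e in subject])
X_intr = np.vstack([dwt_features(e) for e in intruder])

# One LOF model per subject; novelty=True enables predict() on new data
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_subj[:40])
print((lof.predict(X_subj[40:]) == 1).mean())   # acceptance of own epochs
print((lof.predict(X_intr) == -1).mean())       # rejection of intruder
```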


2020, Vol 19

The Test Suite Minimization problem is a nondeterministic polynomial time (NP) complete problem in software engineering that has a special importance in software testing. In this problem, a subset with a minimal size that contains a number of test cases covering all the test requirements should be found. A brute-force approach to solving this problem is to assume a size for the minimal subset and then search to find if there is a subset of test cases with the assumed size that solves the problem. If not, the assumed minimal size is gradually incremented, and the search is repeated. In this paper, a quantum-inspired genetic algorithm (QIGA) will be proposed to solve this problem. In it, quantum superposition, quantum rotation and quantum measurement will be used in an evolutionary algorithm. The paper will show that the adopted quantum techniques can speed up the convergence of the classical genetic algorithm. The proposed method has the advantage that it reduces the assumed minimal number of test cases using quantum measurements, which makes it able to discover the minimal number of test cases without any prior assumptions.
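A toy sketch of the quantum-inspired machinery may clarify: qubits are encoded as rotation angles, measurement collapses them to a candidate test subset, and a simplified rotation pulls the population toward the smallest covering subset found so far. The rotation policy and parameters are assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(1)

def measure(theta):
    """Collapse each qubit: P(bit = 1) = sin^2(theta)."""
    return (rng.random(theta.shape) < np.sin(theta) ** 2).astype(int)

def qiga(coverage, pop=20, gens=200, delta=0.05 * np.pi):
    """coverage[i, j] = 1 if test case i covers requirement j."""
    n_tests = coverage.shape[0]
    theta = np.full((pop, n_tests), np.pi / 4)   # uniform superposition
    best, best_size = np.ones(n_tests, int), n_tests
    for _ in range(gens):
        for p in range(pop):
            bits = measure(theta[p])
            if np.all(bits @ coverage > 0) and bits.sum() < best_size:
                best, best_size = bits, int(bits.sum())
            # rotate qubits that disagree with the best solution toward it
            step = delta * np.where(best == 1, 1.0, -1.0) * (bits != best)
            theta[p] = np.clip(theta[p] + step, 0.0, np.pi / 2)
    return best, best_size

# Toy instance: 30 test cases, 12 requirements, each test covers a few
coverage = (rng.random((30, 12)) < 0.25).astype(int)
best, size = qiga(coverage)
print(size, np.all(best @ coverage > 0))   # small covering subset found
```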


2019, Vol 66
Author(s): Orgad Keller, Avinatan Hassidim, Noam Hazon

The classic Bribery problem is to find a minimal subset of voters who need to change their vote to make some preferred candidate win. Its important generalizations consider voters who are weighted and also have different prices. We provide an approximate solution for these problems for a broad family of scoring rules (which includes Borda and t-approval), in the following sense: for constant weights and prices, if there exists a strategy which costs k, we efficiently find a strategy which costs at most k + Õ(√k). An extension for non-constant weights and prices is also given. Our algorithm is based on a randomized reduction from these Bribery generalizations to weighted coalitional manipulation (WCM). To solve this WCM instance, we apply the Birkhoff-von Neumann (BvN) decomposition to a fractional manipulation matrix. This allows us to limit the size of the possible ballot search space, reducing it from exponential to polynomial, while still obtaining good approximation guarantees. Finding a solution in the truncated search space yields a new algorithm for WCM, which is of independent interest.
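The BvN step can be sketched independently of the voting context. The code below peels a doubly stochastic matrix into a convex combination of permutation matrices, using an assignment solver to find a permutation inside the positive support at each round; it illustrates the decomposition itself, not the authors' full reduction.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_von_neumann(D, tol=1e-9):
    """Birkhoff-von Neumann sketch: each round finds a permutation
    supported on the positive entries (Birkhoff's theorem guarantees
    one exists) and subtracts the smallest weight along it."""
    D = D.astype(float).copy()
    terms = []
    while D.max() > tol:
        support_miss = (D <= tol).astype(float)   # penalize zero cells
        rows, cols = linear_sum_assignment(support_miss)
        if support_miss[rows, cols].sum() > 0:    # no permutation left
            break
        w = D[rows, cols].min()
        P = np.zeros_like(D)
        P[rows, cols] = 1.0
        terms.append((w, P))
        D -= w * P
    return terms

# Example: a 3x3 doubly stochastic matrix
D = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
terms = birkhoff_von_neumann(D)
recon = sum(w * P for w, P in terms)
print(len(terms), np.allclose(recon, D))   # weights sum to 1, recon == D
```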

