Interval modelling in optimization of k‐NN classifiers for large number of attributes in data sets on an example of DNA microarrays

Author(s):  
Urszula Bentkowska ◽  
Jan G. Bazan ◽  
Lech Zarȩba ◽  
Jerzy Socha ◽  
Stanislawa Bazan‐Socha ◽  
...  
2004 ◽  
Vol 17 (2) ◽  
pp. 140-149 ◽  
Author(s):  
Julian L. Griffin ◽  
Stephanie A. Bonney ◽  
Chris Mann ◽  
Abdul M. Hebbachi ◽  
Geoff F. Gibbons ◽  
...  

In functional genomics, DNA microarrays for gene expression profiling are increasingly being used to provide insights into biological function or pathology. To better understand the significance of the multiple transcriptional changes across a time period, the temporal changes in phenotype must be described. Orotic acid-induced fatty liver disease was investigated at the transcriptional and metabolic levels using microarrays and metabolic profiling in two strains of rats. High-resolution 1H-NMR spectroscopic analysis of liver tissue indicated that Kyoto rats compared with Wistar rats are predisposed to the insult. Metabolite analysis and gene expression profiling following orotic acid treatment identified perturbed metabolic pathways, including those involved in fatty acid, triglyceride, and phospholipid synthesis, β-oxidation, altered nucleotide, methyl donor, and carbohydrate metabolism, and stress responses. Multivariate analysis and statistical bootstrapping were used to investigate co-responses with transcripts involved in metabolism and stress responses. This reverse functional genomic strategy highlighted the relationship between changes in the transcription of stearoyl-CoA desaturase 1 and those of other lipid-related transcripts with changes in NMR-derived lipid profiles. The results suggest that the integration of 1H-NMR and gene expression data sets represents a robust method for identifying a focused line of research in a complex system.


1998 ◽  
Vol 9 (12) ◽  
pp. 3273-3297 ◽  
Author(s):  
Paul T. Spellman ◽  
Gavin Sherlock ◽  
Michael Q. Zhang ◽  
Vishwanath R. Iyer ◽  
Kirk Anders ◽  
...  

We sought to create a comprehensive catalog of yeast genes whose transcript levels vary periodically within the cell cycle. To this end, we used DNA microarrays and samples from yeast cultures synchronized by three independent methods: α factor arrest, elutriation, and arrest of a cdc15 temperature-sensitive mutant. Using periodicity and correlation algorithms, we identified 800 genes that meet an objective minimum criterion for cell cycle regulation. In separate experiments, designed to examine the effects of inducing either the G1 cyclin Cln3p or the B-type cyclin Clb2p, we found that the mRNA levels of more than half of these 800 genes respond to one or both of these cyclins. Furthermore, we analyzed our set of cell cycle–regulated genes for known and new promoter elements and show that several known elements (or variations thereof) contain information predictive of cell cycle regulation. A full description and complete data sets are available at http://cellcycle-www.stanford.edu


Author(s):  
Wenyuan Li

With the rapid growth of the World Wide Web and of digital storage capacity, tremendous amounts of data are generated daily in fields ranging from business and engineering to the Internet and science. The Internet, real-time financial data, hyperspectral imagery, and DNA microarrays are just a few of the common sources that feed torrential streams of data into scientific and business databases worldwide. Traditional clustering techniques, suited to statistical data sets of small size and low dimensionality, are challenged by such unprecedentedly high-volume, high-dimensional, complex data. To meet these challenges, many new clustering algorithms have been proposed in the area of data mining (Han & Kamber, 2001).


Blood ◽  
2003 ◽  
Vol 101 (6) ◽  
pp. 2307-2313 ◽  
Author(s):  
Carl-Magnus Högerkorp ◽  
Sven Bilke ◽  
Thomas Breslin ◽  
Sigurdur Ingvarsson ◽  
Carl A. K. Borrebaeck

A number of studies have implicated a role for the cell surface glycoprotein CD44 in several biologic events, such as lymphopoiesis, homing, lymphocyte activation, and apoptosis. We have earlier reported that signaling via CD44 on naive B cells in addition to B-cell receptor (BCR) and CD40 engagement generated a germinal center–like phenotype. To further characterize the global role of CD44 in B differentiation, we examined the expression profile of human B cells cultured in vitro in the presence or absence of CD44 ligation, together with anti-immunoglobulin (anti-Ig) and anti-CD40 antibodies. The data sets derived from DNA microarrays were analyzed using a novel statistical analysis scheme created to retrieve the most likely expression pattern of CD44 ligation. Our results show that genes such as interleukin-6 (IL-6), IL-1α, and β2-adrenergic receptor (β2-AR) were specifically up-regulated by CD44 ligation, suggesting a novel role for CD44 in immunoregulation and inflammation.


Author(s):  
Т.В. Речкалов ◽  
М.Л. Цымблер

PAM (Partitioning Around Medoids) is a partitioning clustering algorithm in which each cluster is represented by an object from the input data set (called a medoid). Medoid-based clustering is used in a wide range of applications: the segmentation of medical and satellite images, the analysis of DNA microarrays and texts, etc. Currently, there are parallel implementations of PAM for GPU and FPGA systems, but none for Intel Many Integrated Core (MIC) accelerators. In this paper, we propose a novel parallel clustering algorithm, PhiPAM, for Intel MIC systems. Computations are parallelized with OpenMP. The algorithm uses a specialized in-memory data layout and loop tiling, which allow computations to be vectorized efficiently on Intel MIC systems. Experiments performed on real data sets show good scalability of the algorithm.
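The medoid idea described in this abstract can be illustrated with a minimal sequential sketch. This is not PhiPAM and includes none of its OpenMP parallelization, data layout, or tiling. It is a naive swap-based PAM on a precomputed distance matrix, and the function names `total_cost` and `pam`, along with the choice of the first k objects as initial medoids, are assumptions made only for illustration.

```python
from itertools import product


def total_cost(dist, medoids):
    """Sum, over all objects, of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Naive PAM on a precomputed distance matrix.

    Returns (sorted medoid indices, label list), where each label is the
    index of the medoid closest to that object.
    """
    n = len(dist)
    # Assumption: start from the first k objects rather than a BUILD phase.
    medoids = list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        # Try replacing each medoid with each non-medoid object.
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(dist, trial)
            if cost < best:  # keep any swap that strictly lowers the cost
                medoids, best, improved = trial, cost, True
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return sorted(medoids), labels


if __name__ == "__main__":
    points = [0, 1, 2, 10, 11, 12]
    dist = [[abs(a - b) for b in points] for a in points]
    print(pam(dist, 2))
```

Because every accepted swap strictly lowers the total cost and there are finitely many medoid sets, the loop terminates. Each pass evaluates on the order of k·n candidate swaps, each costing on the order of k·n distance lookups, which suggests why parallel implementations such as the one described above are of interest.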


2004 ◽  
Vol 186 (1) ◽  
pp. 164-178 ◽  
Author(s):  
Hongbin Liu ◽  
Nicholas H. Bergman ◽  
Brendan Thomason ◽  
Shamira Shallom ◽  
Alyson Hazen ◽  
...  

The endospores of Bacillus anthracis are the infectious particles of anthrax. Spores are dormant bacterial morphotypes able to withstand harsh environments for decades, which contributes to their ability to be formulated and dispersed as a biological weapon. We monitored gene expression in B. anthracis during growth and sporulation using full genome DNA microarrays and matched the results against a comprehensive analysis of the mature anthrax spore proteome. A large portion (∼36%) of the B. anthracis genome is regulated in a growth phase-dependent manner, and this regulation is marked by five distinct waves of gene expression as cells proceed from exponential growth through sporulation. The identities of more than 750 proteins present in the spore were determined by multidimensional chromatography and tandem mass spectrometry. Comparison of data sets revealed that while the genes responsible for assembly and maturation of the spore are tightly regulated in discrete stages, many of the components ultimately found in the spore are expressed throughout and even before sporulation, suggesting that gene expression during sporulation may be mainly related to the physical construction of the spore, rather than synthesis of eventual spore content. The spore also contains an assortment of specialized, but not obviously related, metabolic and protective proteins. These findings contribute to our understanding of spore formation and function and will be useful in the detection, prevention, and early treatment of anthrax. This study also highlights the complementary nature of genomic and proteomic analyses and the benefits of combining these approaches in a single study.


2005 ◽  
pp. 75-101
Author(s):  
She-Pin Hung ◽  
Suman Sundaresh ◽  
Pierre F. Baldi ◽  
G. Wesley Hatfield

Blood ◽  
2004 ◽  
Vol 104 (4) ◽  
pp. 923-932 ◽  
Author(s):  
Benjamin L. Ebert ◽  
Todd R. Golub

In the past several years, experiments using DNA microarrays have contributed to an increasingly refined molecular taxonomy of hematologic malignancies. In addition to the characterization of molecular profiles for known diagnostic classifications, studies have defined patterns of gene expression corresponding to specific molecular abnormalities, oncologic phenotypes, and clinical outcomes. Furthermore, novel subclasses with distinct molecular profiles and clinical behaviors have been identified. In some cases, specific cellular pathways have been highlighted that can be therapeutically targeted. The findings of microarray studies are beginning to enter clinical practice as novel diagnostic tests, and clinical trials are ongoing in which therapeutic agents are being used to target pathways that were identified by gene expression profiling. While the technology of DNA microarrays is becoming well established, genome-wide surveys of gene expression generate large data sets that can easily lead to spurious conclusions. Many challenges remain in the statistical interpretation of gene expression data and the biologic validation of findings. As data accumulate and analyses become more sophisticated, genomic technologies offer the potential to generate increasingly sophisticated insights into the complex molecular circuitry of hematologic malignancies. This review summarizes the current state of discovery and addresses key areas for future research.


Author(s):  
John A. Hunt

Spectrum-imaging is a useful technique for comparing different processing methods on very large data sets which are identical for each method. This paper is concerned with comparing methods of electron energy-loss spectroscopy (EELS) quantitative analysis on the Al-Li system. The spectrum-image analyzed here was obtained from an Al-10at%Li foil aged to produce δ' precipitates that can span the foil thickness. Two 1024-channel EELS spectra offset in energy by 1 eV were recorded and stored at each pixel in the 80x80 spectrum-image (25 Mbytes). An energy range of 39-89 eV (20 channels/eV) is represented. During processing, the spectra are either subtracted to create an artifact-corrected difference spectrum, or the energy offset is numerically removed and the spectra are added to create a normal spectrum. The spectrum-images are processed into 2D floating-point images using methods and software described in [1].
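The two per-pixel processing options mentioned in this abstract can be sketched as follows. This is a hedged illustration only, not the software referenced in [1]. The 20 channels/eV dispersion and the 1 eV offset come from the abstract, while the function names and the direction of the shift (a feature at channel c in the first spectrum appearing at channel c + 20 in the second) are assumptions.

```python
CHANNELS_PER_EV = 20  # dispersion stated in the abstract
OFFSET_EV = 1         # energy offset between the two spectra, from the abstract


def difference_spectrum(a, b):
    """Channel-by-channel subtraction of the two offset spectra."""
    return [x - y for x, y in zip(a, b)]


def normal_spectrum(a, b, shift=CHANNELS_PER_EV * OFFSET_EV):
    """Remove the energy offset numerically, then add the spectra.

    Assumption: a feature at channel c in `a` sits at channel c + shift
    in `b`. Channels without a partner after alignment are dropped.
    """
    return [a[c] + b[c + shift] for c in range(len(a) - shift)]


if __name__ == "__main__":
    a = [0] * 100
    b = [0] * 100
    a[30] = 5  # the same feature, recorded at two offset positions
    b[50] = 5
    print(difference_spectrum(a, b)[30], normal_spectrum(a, b)[30])
```

In the difference spectrum a shared feature shows up as a positive and a negative lobe one offset apart, whereas the aligned sum reinforces it at a single channel.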


Author(s):  
Mark Ellisman ◽  
Maryann Martone ◽  
Gabriel Soto ◽  
Eleizer Masliah ◽  
David Hessler ◽  
...  

Structurally-oriented biologists examine cells, tissues, organelles and macromolecules in order to gain insight into cellular and molecular physiology by relating structure to function. The understanding of these structures can be greatly enhanced by the use of techniques for the visualization and quantitative analysis of three-dimensional structure. Three projects from current research activities will be presented in order to illustrate both the present capabilities of computer aided techniques as well as their limitations and future possibilities. The first project concerns the three-dimensional reconstruction of the neuritic plaques found in the brains of patients with Alzheimer's disease. We have developed a software package “Synu” for investigation of 3D data sets which has been used in conjunction with laser confocal light microscopy to study the structure of the neuritic plaque. Tissue sections of autopsy samples from patients with Alzheimer's disease were double-labeled for tau, a cytoskeletal marker for abnormal neurites, and synaptophysin, a marker of presynaptic terminals.

