Data Mining the CoBOP Data Base to Separate Bottom Albedo from Bottom Roughness and Illumination Effects

2002 ◽  
Author(s):  
Kendall L. Carder ◽  
David K. Costello

Author(s):  
Tim Pychynski ◽  
Klaus Dullenkopf ◽  
Hans-Jörg Bauer ◽  
Ralf Mikut

This paper presents a data-based method to predict the discharge coefficients of labyrinth seals. First, leakage flow rate data for straight-through and stepped labyrinth seals from various sources were collected and fused into one consistent data base. In total, over 15,000 data points have been collected so far, covering a 25-dimensional design space. Second, this leakage data set was analysed using open-source Data Mining software, which provides several algorithms such as Multiple Linear Regression (MLR) and Artificial Neural Networks (ANN). The suitability of MLR and ANN for modelling labyrinth discharge coefficients and analysing system sensitivity was tested and evaluated. The developed leakage models showed promising predictive quality within the design space covered by the data. Further improvements in model quality may be achieved by continuing the data analysis with advanced Data Mining methods and by enlarging the existing data base. The major advantages of the presented method over numerical or analytical models are the possibility of automating the modelling process, low computational effort, and high model quality.
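The MLR-vs-ANN comparison the abstract describes can be illustrated with a minimal sketch. The feature names and synthetic data below are assumptions for illustration only (the real data base holds over 15,000 measured points in a 25-dimensional design space), and scikit-learn stands in for the unnamed open-source Data Mining software.

    # Minimal sketch of the MLR-vs-ANN comparison described above.
    # Feature names and the synthetic data are illustrative assumptions.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(0)
    n = 2000
    # Hypothetical geometric/operating parameters (a small subset of the 25).
    X = rng.uniform(size=(n, 4))  # e.g. clearance, pitch, fin height, pressure ratio
    cd = 0.6 + 0.3 * X[:, 3] - 0.2 * X[:, 0] * X[:, 3] + 0.02 * rng.normal(size=n)

    X_tr, X_te, y_tr, y_te = train_test_split(X, cd, test_size=0.25, random_state=0)

    mlr = LinearRegression().fit(X_tr, y_tr)
    ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                       random_state=0).fit(X_tr, y_tr)

    print("MLR R^2:", r2_score(y_te, mlr.predict(X_te)))
    print("ANN R^2:", r2_score(y_te, ann.predict(X_te)))

On data with interactions like the synthetic set above, the ANN typically outscores the linear model, mirroring the sensitivity analysis the paper reports.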


2020 ◽  
Author(s):  
Stefan Bracke ◽  
Alicia Puls

In December 2019, the world was confronted with the outbreak of the respiratory disease COVID-19 (Corona). The first infection (confirmed case) was detected in the city of Wuhan, Hubei, China. At first it was an epidemic in China, but in the first quarter of 2020 it evolved into a pandemic, which continues to this day. The COVID-19 pandemic, with its incredible speed of spread, shows the vulnerability of a globalized and networked world. The first months of the pandemic were characterized by a heavy burden on health systems. Worldwide, populations were affected by severe restrictions, such as school closures, public transport shutdowns, or comprehensive lockdowns. The severity of the burden depended on many factors, e.g. government, culture, or health system. However, the burden hit individual countries with slight time lags; cf. Bracke et al. (2020). This paper focuses on data analytics regarding infection data of the COVID-19 pandemic. It is a continuation of the research study COVID-19 pandemic data analytics: Data heterogeneity, spreading behavior, and lockdown impact, published by Bracke et al. (2020). The goal of this assessment is the mining of infection data with respect to model uncertainty, pandemic spreading behavior including lockdown impact, and the early second wave in Germany, Italy, Japan, New Zealand, and France. Furthermore, a comparison with other infectious diseases (measles and influenza) is made. The data base used, from Johns Hopkins University (JHU), covers daily data from 01/22/2020 until 09/22/2020; the dynamic development after 09/22/2020 is not considered. The measles/influenza analytics are based on the Robert Koch Institute (RKI) data base as of 09/22/2020. Statistical models and methods from reliability engineering, such as the Weibull distribution model or trend tests, are used to analyze the occurrence of infections.
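The Weibull distribution model named in the abstract can be sketched as follows; the synthetic "day of reported infection" sample below is an assumption standing in for the JHU data, and scipy's weibull_min is used for the fit.

    # Minimal sketch of fitting a two-parameter Weibull model, as named in
    # the abstract; the synthetic event-time data are an assumption, not
    # the JHU data base itself.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    # Hypothetical event times (days since outbreak start) for reported cases.
    days = stats.weibull_min.rvs(c=1.8, scale=60.0, size=5000, random_state=rng)

    # Fit shape (b) and characteristic life (T) with the location fixed at zero.
    shape, loc, scale = stats.weibull_min.fit(days, floc=0)
    print(f"Weibull shape b = {shape:.2f}, scale T = {scale:.1f} days")

    # Shape > 1 indicates an increasing occurrence rate, i.e. accelerating spread.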


Author(s):  
Erik Braun ◽  
Klaus Dullenkopf ◽  
Hans-Jörg Bauer

Numerous experimental and numerical studies were performed in the past by various authors to reduce the leakage of labyrinth seals and thus increase the performance of turbomachines. Based on the experience of more than 20 years of research activities in this area at the ITS, the authors aim to improve the prediction quality for labyrinth seal performance by combining experimental, numerical, and data mining methods. Special emphasis in this work is placed on more complex and also worn labyrinth geometries, and thus on a more universal optimization tool for labyrinth seals incorporating more realistic engine running conditions as well as wear mechanisms. A better understanding of labyrinth seal behavior based on the new correlations and models will thus lead to optimized geometries and improved designs. The paper contains the results of experiments to determine the discharge coefficients of different straight-through labyrinth seals with three and five fins and two different fin geometries over a large range of pressure ratios, as well as results from a stepped labyrinth seal with six fins in convergent and divergent flow direction. The collected data extend an existing data base of labyrinth seal performance already presented by Pychynski et al. [1]. This data base is used to create models to calculate labyrinth seal performance depending on up to 25 input parameters. The resulting models will be used as the basis for a universal optimization tool for labyrinth seals. The paper presents the new and versatile test rig for various kinds of labyrinth and gap seals and gives an analysis of measurement accuracy. The results of a first set of experiments performed with new (i.e. unworn) geometries are compared to experimental data of similar labyrinth geometries from previous investigations, showing excellent agreement. The results are then interpreted using Data Mining methods to identify correlations between different input parameters and the labyrinth seal discharge coefficient. The paper shows that a data-based approach can yield relations of similar quality to empirical studies while being much less time-consuming and more versatile. Several models with different sets of input parameters are presented and compared with regard to their applicability in automated geometry optimization using a newly developed optimization tool.
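For reference, the discharge coefficient studied here is conventionally defined as the measured mass flow divided by the ideal isentropic mass flow through the same gap area; the sketch below uses this textbook definition with illustrative numbers, not values from the test rig.

    # Textbook definition of the labyrinth-seal discharge coefficient:
    # C_D = m_dot_measured / m_dot_ideal, with the ideal isentropic mass flow
    # through the same gap area. Numbers below are illustrative assumptions.
    from math import sqrt

    def ideal_mass_flow(p_in, p_out, t_in, area, kappa=1.4, r_gas=287.0):
        """Ideal (isentropic) mass flow [kg/s]; valid for subcritical
        pressure ratios (no choking)."""
        pr = p_out / p_in  # pressure ratio <= 1
        psi = sqrt(2.0 * kappa / (r_gas * (kappa - 1.0))
                   * (pr ** (2.0 / kappa) - pr ** ((kappa + 1.0) / kappa)))
        return area * p_in / sqrt(t_in) * psi

    m_dot_measured = 0.035                        # kg/s, hypothetical reading
    m_dot_ideal = ideal_mass_flow(2.0e5, 1.2e5, 300.0, 1.0e-4)
    print("C_D =", m_dot_measured / m_dot_ideal)  # ~0.76 for these numbers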


2015 ◽  
Vol 37 ◽  
pp. 108 ◽  
Author(s):  
Hooman Fetanat ◽  
Leila Mortazavifar ◽  
Narsis Zarshenas

Information Technology has a positive impact on other disciplines. With today's technology, precision agriculture and Information Technology are merging. The use of Information Technology in agriculture leads to improvements in productivity. For this purpose, raw data is transformed into useful information through data mining. This research determined whether data mining techniques can also be used to improve pattern recognition and analysis in large experimental datasets on the growth factors of ornamental plants. Furthermore, the research aimed to establish whether data mining techniques can assist classification and regression methods by determining whether meaningful patterns exist among the growth factors of ornamental plants characterized at various research sites across Kish Island. Different data mining techniques were used to analyze a large data base of ornamental plant property attributes. The data base was collected from different plants in various areas of Kish Island in order to determine, classify, and predict the growth factors that affect blooming. In this research, a regression analysis of the data showed the effect of chlorophyll content on the number of flowers. Analyzing such agricultural data bases with different data mining methods may offer several advantages in agriculture.
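The reported regression finding (chlorophyll content predicting flower count) can be illustrated with a minimal sketch; the data points below are invented for illustration, since the Kish Island measurements are not reproduced here.

    # Minimal sketch of the regression analysis described above
    # (chlorophyll content vs. number of flowers); data are invented.
    import numpy as np
    from scipy import stats

    chlorophyll = np.array([18.2, 22.5, 25.1, 30.4, 33.0, 38.7])  # e.g. SPAD units
    flowers = np.array([5, 8, 9, 13, 14, 19])                      # flowers per plant

    fit = stats.linregress(chlorophyll, flowers)
    print(f"slope = {fit.slope:.2f} flowers per unit chlorophyll")
    print(f"R^2   = {fit.rvalue**2:.2f}")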


2016 ◽  
Vol 4 (2) ◽  
pp. 152
Author(s):  
Kuswari Hernawati ◽  
Nur Insani ◽  
Bambang Sumarno ◽  
Nurhadi Waryanto

In university management, in addition to infrastructure, facilities, and people, information systems are a resource that can be exploited to enhance competitive advantage and to provide accurate data to policy makers, for example information about SNMPTN test scores, students' regions of origin, student GPAs, and study duration. Yogyakarta State University accepts approximately 6,000 new students annually through the National Selection of State University Students (SNMPTN), the Joint Selection of State University Students (SBMPTN), and the Independent Selection exam (SM). With the growing number of prospective students admitted through SBMPTN, the basic data in the prospective-student database grows every year as well. Using the students' basic SBMPTN data and their grade point averages (GPA), this study applies the Apriori association rule data mining technique to look for patterns of association between the basic SBMPTN data and UNY students' GPA. The basic SBMPTN data to be mined include school of origin, home school district, parents' income, parents' education level, average national exam (UAN) scores, and academic potential test (TPA) scores. The result is that no attribute in the SNMPTN data base significantly affects the GPA obtained: among the 50 best association rules derived, no GPA itemset appears together with the other itemsets. Keywords: Data Mining, Association Rule, Apriori Algorithm, SNMPTN
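A minimal sketch of Apriori association rule mining of this kind follows; the toy student transactions (attribute=value items) are invented, and mlxtend is one common open-source implementation, not necessarily the authors' tool.

    # Minimal sketch of Apriori association rule mining as used in the study.
    # Toy transactions are invented for illustration.
    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    transactions = [
        ["school=public",  "income=low",  "gpa=high"],
        ["school=public",  "income=mid",  "gpa=high"],
        ["school=private", "income=high", "gpa=mid"],
        ["school=public",  "income=low",  "gpa=mid"],
        ["school=private", "income=mid",  "gpa=high"],
    ]

    # One-hot encode the transactions, then mine frequent itemsets and rules.
    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                          columns=te.columns_)

    frequent = apriori(onehot, min_support=0.4, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
    print(rules[["antecedents", "consequents", "support", "confidence"]])

If no rule pairs a GPA itemset with the other attributes, as in the study's 50 best rules, the data offer no evidence that those attributes predict GPA.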


Author(s):  
Gary Smith

IBM's Watson got an avalanche of publicity when it won Jeopardy, but Watson is potentially far more valuable as a massive digital database for doctors, lawyers, and other professionals who can benefit from fast, accurate access to information. A doctor who suspects that a patient may have a certain disease can ask Watson to list the recognized symptoms. A doctor who notices several abnormalities in a patient, but isn't confident about which diseases are associated with these symptoms, can ask Watson to list the possible diseases. A doctor who is convinced that a patient has a certain illness can ask Watson to list the recommended treatments. In each case, Watson can make multiple suggestions, with associated probabilities and hyperlinks to the medical records and journal articles that it relied on for its recommendations. Watson and other computerized medical data bases are valuable resources that take advantage of the power of computers to acquire, store, and retrieve information. There are caveats, though. One is simply that a medical data base is not nearly as reliable as a Jeopardy data base. Artificial intelligence algorithms are very good at finding patterns in data, but they are very bad at assessing the reliability of the data and the plausibility of a statistical analysis. It could end tragically if a doctor entered a patient's symptoms into a black-box data-mining program and was told what treatments to use, without any explanation for the diagnosis or prescription. Think for a moment about your reaction if your doctor said, "I don't know why you are ill, but my computer says to take these pills," or "I don't know why you are ill, but my computer recommends surgery." Any medical software that uses neural networks or data reduction programs, such as principal components and factor analysis, will be hard-pressed to provide an explanation for the diagnosis and prescribed treatment. Patients won't know. Doctors won't know. Even the software engineers who created the black-box system won't know. Nobody knows. Watson and similar programs are great as reference tools, but they are not a substitute for doctors because: (a) the medical literature is often wrong; and (b) these errors are compounded by the use of data-mining software.


Author(s):  
Ioannis N. Kouris

Data mining has emerged over the last decade as probably the most important application in databases. To reproduce one of the most popular and accurate definitions of data mining: "it is the process of nontrivial extraction of implicit, previously unknown and potentially useful information (such as rules, constraints and regularities) from massive databases" (Piatetsky-Shapiro & Frawley 1991). In practice, data mining can be thought of as the "crystal ball" of businessmen, scientists, politicians, and generally all kinds of people and professions wishing to gain more insight into their field of interest and their data. Of course, this "crystal ball" rests on a sound and broad scientific basis, using techniques borrowed from fields such as statistics, artificial intelligence, machine learning, mathematics, and database research in general, among others. Applications of data mining range from analyzing simple point-of-sale transactions and text documents to astronomical data and homeland security (Data Mining and Homeland Security: An Overview). Different applications may require different data mining techniques. The main kinds of techniques used to discover knowledge from a database are categorized into association rules mining, classification, and clustering, with association rules being the most extensively and actively studied area. The problem of finding association rules can be formulated as follows: given a large data base of item transactions, find all frequent itemsets, where a frequent itemset is one that occurs in at least a user-specified percentage of the data base. In other words, find rules of the form X → Y, where X and Y are sets of items. A rule expresses the possibility that whenever we find a transaction that contains all items in X, then this transaction is likely to also contain all items in Y. Consequently, X is called the body of the rule and Y the head. The validity and reliability of association rules is usually expressed by means of support and confidence. An example of such a rule is {smoking, no_workout} → {heart_disease} (sup=50%, conf=90%), which means that 90% of the people who smoke and do not work out present heart problems, whereas 50% of all the people considered present all of these together. Nevertheless, the prominent model for contemplating data in almost all circumstances has been a rather simplistic and crude one, making several concessions. More specifically, objects inside the data, such as items within transactions, have been given a Boolean representation (i.e., they appear or not), with their ordering considered of no interest because transactions are treated altogether as sets. Of course, similar concessions are made in many other fields in order to arrive at a feasible solution (e.g., in mining data streams). Certainly there is a trade-off between the actual depth and precision of knowledge that we wish to uncover from a database and the amount and complexity of data that we are capable of processing to reach that target. In this work we concentrate on the possibility of taking into account and utilizing the order of items within the data. There are many areas in real-world applications and systems that involve data with temporal, spatial, spatiotemporal, or otherwise ordered properties, whose inherent sequential nature imposes the need for proper storage and processing. Such data include those collected from telecommunication systems, computer networks, wireless sensor networks, retail, and logistics.
There is a variety of interpretations that can be used to preserve data ordering in a sufficient way, according to the intended system functionality.
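The support and confidence measures defined above can be made concrete with a small sketch; the toy transactions are invented for illustration (note they yield conf=100% rather than the 90% of the quoted example).

    # Minimal sketch of the support/confidence definitions quoted above,
    # computed over toy transactions (invented for illustration).
    transactions = [
        {"smoking", "no_workout", "heart_disease"},
        {"smoking", "no_workout", "heart_disease"},
        {"smoking", "workout"},
        {"no_smoking", "workout"},
    ]

    def support(itemset, db):
        """Fraction of transactions containing every item in `itemset`."""
        return sum(itemset <= t for t in db) / len(db)

    def confidence(body, head, db):
        """P(head | body): support of body+head divided by support of body."""
        return support(body | head, db) / support(body, db)

    body, head = {"smoking", "no_workout"}, {"heart_disease"}
    print(f"sup  = {support(body | head, transactions):.0%}")   # 50%
    print(f"conf = {confidence(body, head, transactions):.0%}")  # 100%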

