An Ontology-based Visual Analytics for Apple Variety Testing

Author(s):
Ekaterina Chuprikova, Abraham Mejia Aguilar, Roberto Monsorno

Increasing agricultural production challenges, such as climate change, environmental concerns, energy demands, and growing consumer expectations, have triggered the need for innovation through data-driven approaches such as visual analytics. Although the visual analytics concept was introduced more than a decade ago, recent developments in data mining capacity have made it possible to fully exploit the potential of this approach and gain insight into highly complex datasets (multi-source, multi-scale, and collected at different stages). The current study focuses on developing a prototypical visual analytics system for an apple variety testing program in South Tyrol, Italy. The work aims (1) to establish a visual analytics interface that integrates and harmonizes information about apple variety testing and its interaction with climate by designing a semantic model; and (2) to create a single visual analytics user interface that can turn the data into knowledge for domain experts.

This study extends the visual analytics approach with a structured form of data organization (ontologies), data mining, and visualization techniques to retrieve knowledge from an extensive collection of apple variety testing and environmental data. The prototype rests on three main components: ontology, data analysis, and data visualization. Ontologies represent expert knowledge and create standard concepts for data integration, making it possible to share knowledge through a unified terminology and to support inference. Building upon relevant semantic models (e.g., the agri-food experiment ontology, the plant trait ontology, and GeoSPARQL), we propose to extend them with apple variety testing and climate data. Data integration and harmonization through an ontology-based model provides a framework for integrating relevant concepts and the relationships between them, connecting data sources from different repositories, and defining a precise specification for knowledge retrieval. Moreover, because variety testing is performed at different locations, a geospatial component can enrich the analysis with spatial properties. The visual narratives designed within this study will give a better-integrated view of the relations among data entities and reveal meaningful patterns and clusters based on semantic concepts.

The proposed approach is therefore designed to improve decision-making about variety management through an interactive visual analytics system that can answer "what" and "why" questions about fruit-growing activities. The prototype thus has the potential to go beyond traditional ways of organizing data by creating an advanced information system able to manage heterogeneous data sources and to provide a framework for more collaborative scientific data analysis. This study unites several interdisciplinary strands, in particular Big Data analytics in the agricultural sector and visual methods; the findings will thus contribute to the EU priority program on digital transformation of the European agricultural sector.

This project has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 894215.
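
As a hedged illustration of the ontology component described above, the sketch below builds a toy RDF graph linking a trial plot, a trait observation, and a climate reading, then joins them with a SPARQL query via rdflib. All namespaces, classes, and properties are invented for the example; the authors' actual semantic model builds on the agri-food experiment ontology, the plant trait ontology, and GeoSPARQL (whose spatial functions would additionally require a GeoSPARQL-aware triple store).

```python
# Illustrative sketch only: a minimal RDF graph for variety testing data.
# Every term in the vt: namespace below is hypothetical.
from rdflib import Graph, Literal, Namespace, RDF

VT = Namespace("http://example.org/variety-testing#")

g = Graph()
g.bind("vt", VT)

# A trial plot growing the 'Gala' variety at a named site
g.add((VT.plot1, RDF.type, VT.TrialPlot))
g.add((VT.plot1, VT.growsVariety, Literal("Gala")))
g.add((VT.plot1, VT.locatedAt, VT.siteLaimburg))

# A trait observation and a climate reading tied to the same plot/site
g.add((VT.obs1, RDF.type, VT.TraitObservation))
g.add((VT.obs1, VT.observedOn, VT.plot1))
g.add((VT.obs1, VT.fruitFirmnessKg, Literal(7.8)))
g.add((VT.clim1, RDF.type, VT.ClimateReading))
g.add((VT.clim1, VT.recordedAt, VT.siteLaimburg))
g.add((VT.clim1, VT.meanTempC, Literal(14.2)))

# Join trait observations with climate readings via the shared site
q = """
PREFIX vt: <http://example.org/variety-testing#>
SELECT ?variety ?firmness ?temp WHERE {
    ?plot vt:growsVariety ?variety ; vt:locatedAt ?site .
    ?obs vt:observedOn ?plot ; vt:fruitFirmnessKg ?firmness .
    ?clim vt:recordedAt ?site ; vt:meanTempC ?temp .
}
"""
for row in g.query(q):
    print(row.variety, row.firmness, row.temp)
```

The point of the sketch is the join: once trials and climate readings share semantic concepts (here, the site), a single query can answer cross-domain questions without bespoke data wrangling.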

2020
Author(s):
Alessandra Maciel Paz Milani, Fernando V. Paulovich, Isabel Harb Manssour

Analyzing and managing raw data remain a challenging part of the data analysis process, particularly during data preprocessing. Although studies have proposed design implications and recommendations for visualization solutions in data analysis, they do not focus on the challenges of the preprocessing phase. Likewise, current Visual Analytics processes do not treat preprocessing as an equally important stage. With this study, we aim to contribute to the discussion of how methods of visualization and data mining can be used and combined to assist data analysts during preprocessing activities. To achieve that, we introduce the Preprocessing Profiling Model for Visual Analytics, which contemplates a set of features to inspire the implementation of new solutions. These features were designed from a list of insights we obtained during an interview study with thirteen data analysts. Our contributions can be summarized as offering resources to promote a shift toward visual preprocessing.
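
As one hedged illustration of what "visual preprocessing" support might surface, the sketch below computes a per-column profile (dtype, missingness, cardinality) with pandas. The dataset and the chosen metrics are illustrative assumptions, not features prescribed by the Preprocessing Profiling Model itself.

```python
# Minimal sketch: the kind of per-column summary a visual preprocessing
# tool could render as a dashboard. Data and metrics are invented.
import pandas as pd

def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column: dtype, missingness, cardinality, one example."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_pct": df.isna().mean().round(3) * 100,
        "n_unique": df.nunique(dropna=True),
        "example": df.apply(lambda s: s.dropna().iloc[0] if s.notna().any() else None),
    })

raw = pd.DataFrame({
    "orchard": ["A", "B", None, "A"],
    "yield_kg": [120.5, None, 98.0, 101.2],
    "harvest_date": ["2020-09-01", "2020-09-03", "bad-date", None],
})
print(profile_columns(raw))
```

A profile like this makes quality problems (missing cells, suspicious strings such as "bad-date") visible before analysis begins, which is precisely the stage the authors argue current Visual Analytics processes neglect.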


2011, pp. 1323-1331
Author(s):
Jeffrey W. Seifert

A significant amount of attention appears to be focused on how to better collect, analyze, and disseminate information. In doing so, technology is commonly and increasingly looked upon as both a tool and, in some cases, a substitute for human resources. One such technology that is playing a prominent role in homeland security initiatives is data mining. As with the concept of homeland security itself, while data mining is widely mentioned in a growing number of bills, laws, reports, and other policy documents, an agreed-upon definition or conceptualization of data mining appears to be generally lacking within the policy community (Relyea, 2002). While data mining initiatives are usually purported to provide insightful, carefully constructed analysis, at various times data mining itself is alternatively described as a technology, a process, and/or a productivity tool. In other words, data mining (or factual data analysis, or predictive analytics, as it is also sometimes called) means different things to different people.

Regardless of which definition one prefers, a common theme is the ability to collect and combine, virtually if not physically, multiple data sources for the purpose of analyzing the actions of individuals. In other words, there is an implicit belief in the power of information, suggesting a continuing trend in the growth of "dataveillance," the monitoring and collection of the data trails left by a person's activities (Clarke, 1988). More importantly, it is clear that there are high expectations that data mining, or factual data analysis, will be an effective tool.

Data mining is not a new technology, but its use is growing significantly in both the private and public sectors. Industries such as banking, insurance, medicine, and retailing commonly use data mining to reduce costs, enhance research, and increase sales. In the public sector, data mining applications were initially used to detect fraud and waste but have grown to serve purposes such as measuring and improving program performance. While not entirely without controversy, these types of applications have gained greater acceptance. However, some national defense/homeland security data mining applications represent a significant expansion in the quantity and scope of data to be analyzed. Moreover, due to their security-related nature, the details of these initiatives (e.g., data sources, analytical techniques, access and retention practices) are usually less transparent.


Author(s):
Zhiyuan Chen, Aryya Gangopadhyay, George Karabatis, Michael McGuire, Claire Welty

Environmental research and knowledge discovery both require extensive use of data stored in various sources and created in different ways for diverse purposes. We describe a new metadata approach to elicit semantic information from environmental data and implement semantics-based techniques to assist users in integrating, navigating, and mining multiple environmental data sources. Our system contains specifications of various environmental data sources and the relationships formed among them. User requests are augmented with semantically related data sources and automatically presented as a visual semantic network. In addition, we present a methodology for data navigation and pattern discovery using multi-resolution browsing and data mining, in which data semantics are captured and exploited as patterns and trends at multiple levels of resolution. We demonstrate the efficacy of our methodology through experimental results.
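
The following sketch illustrates, under invented names, the idea of augmenting a user request with semantically related data sources: sources are nodes in a semantic network, and the augmentation is a small graph-neighborhood query. It is a toy stand-in for the authors' system, not its actual implementation.

```python
# Hedged sketch: data sources as nodes, semantic relationships as edges.
# Node names and relation labels are invented for illustration.
import networkx as nx

G = nx.Graph()
G.add_edge("stream_gauges", "watershed_boundaries", relation="located_within")
G.add_edge("stream_gauges", "water_quality_samples", relation="measured_at")
G.add_edge("water_quality_samples", "land_use_maps", relation="influenced_by")
G.add_edge("land_use_maps", "census_tracts", relation="overlaps")

def related_sources(g: nx.Graph, requested: str, hops: int = 2):
    """Return sources within `hops` semantic links of the requested one."""
    neighborhood = nx.ego_graph(g, requested, radius=hops)
    return sorted(n for n in neighborhood if n != requested)

# A request for stream gauge data is augmented with related sources,
# which could then be rendered as a visual semantic network.
print(related_sources(G, "stream_gauges"))
```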


Author(s):  
Andreas Koeller

Integration of data sources refers to the task of developing a common schema, as well as data transformation solutions, for a number of data sources with related content. The large number and size of modern data sources make manual approaches to integration increasingly impractical. Data mining can help to partially or fully automate the data integration process.
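
A minimal sketch of the idea, with illustrative data: score candidate column pairs across two sources by combining name similarity with value overlap, the kind of evidence a mining-based matcher automates. Real schema matchers draw on much richer evidence (types, value distributions, constraints), and the equal weighting below is an arbitrary choice.

```python
# Toy schema-matching sketch; sources, weights, and thresholds are invented.
from difflib import SequenceMatcher

source_a = {"cust_name": ["Ada", "Bo"], "zip": ["10115", "20095"]}
source_b = {"customer_name": ["Bo", "Cy"], "postal_code": ["20095", "80331"]}

def match_score(name_a, vals_a, name_b, vals_b):
    # Lexical similarity of column names plus Jaccard overlap of values
    name_sim = SequenceMatcher(None, name_a, name_b).ratio()
    overlap = len(set(vals_a) & set(vals_b)) / max(len(set(vals_a) | set(vals_b)), 1)
    return 0.5 * name_sim + 0.5 * overlap

for a, va in source_a.items():
    best = max(source_b.items(), key=lambda kv: match_score(a, va, kv[0], kv[1]))
    print(f"{a} -> {best[0]} (score={match_score(a, va, best[0], best[1]):.2f})")
```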


Author(s):
L. Gabrielli, M. Rossi, F. Giannotti, D. Fadda, S. Rinzivillo

Abstract. New data sources make it possible to answer analytically the questions raised by mobility managers. The process of transforming raw data into knowledge is complex, and it is necessary to provide visualization metaphors that decision makers can understand. Here, we propose an analytical platform that extracts information on the mobility of individuals from mobile phone data by applying data mining methodologies. The main results highlighted here are both technical and methodological. First, communicating information through visual analytics techniques makes it accessible to those without specific technical or domain knowledge. Second, the API system guarantees the ability to export aggregates at the required granularity, enabling other actors to produce new services based on the extracted models. In the future, we expect to extend the platform with additional layers, for example a layer measuring the sustainability indices of a territory, such as the capacity of public transport to attract private mobility or an index of how many private vehicle trips could be converted to electric mobility.
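
As a hedged sketch of what such an aggregate-export API might look like, the endpoint below serves origin-destination flows at a requested granularity. The route, parameter names, and data are assumptions for illustration, not the platform's actual interface.

```python
# Illustrative sketch of an aggregates-export endpoint; not the real API.
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Toy origin-destination trips; the real platform derives these
# from mobile phone data via data mining.
trips = pd.DataFrame({
    "origin": ["Pisa", "Pisa", "Livorno", "Pisa"],
    "destination": ["Livorno", "Livorno", "Pisa", "Florence"],
    "hour": [8, 9, 8, 18],
})

@app.route("/aggregates")
def aggregates():
    # granularity=hourly groups flows by hour; anything else returns totals
    granularity = request.args.get("granularity", "daily")
    keys = ["origin", "destination"] + (["hour"] if granularity == "hourly" else [])
    flows = trips.groupby(keys).size().reset_index(name="trips")
    return jsonify(flows.to_dict(orient="records"))

if __name__ == "__main__":
    app.run(debug=True)  # e.g. GET /aggregates?granularity=hourly
```

Exporting aggregates rather than raw traces is also what lets third parties build services on the extracted models without handling individual-level mobility data.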


2019
Author(s):
Luke Barker, CJA MacLeod

Social media, particularly Twitter, is increasingly used to improve resilience during extreme weather events and emergency management situations, including floods, by communicating potential risks and their impacts and by informing agencies and responders. In this paper, we developed a prototype national-scale Twitter data mining pipeline for improved stakeholder situational awareness during flooding events across Great Britain, retrieving relevant social geodata grounded in environmental data sources (flood warnings and river levels). With potential users, we identified and addressed three research questions to develop this application, whose components constitute a modular architecture for real-time dashboards. First, we poll national flood warning and river level Web data sources to obtain at-risk locations. Second, we retrieve geotagged tweets in real time, proximate to at-risk areas. Third, we filter flood-relevant tweets with natural language processing and machine learning libraries, using word embeddings of the tweets. We demonstrated the national-scale social geodata pipeline using over 420,000 georeferenced tweets obtained between 20 and 29 June 2016.
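
A hedged sketch of the second and third stages: keep geotagged tweets within a radius of an at-risk location, then score flood relevance by cosine similarity between tweet and seed-term embeddings. The embed() function is a random-vector placeholder standing in for whatever pretrained word-embedding model the pipeline loads; the coordinates, radius, and seed terms are invented.

```python
# Illustrative sketch only: proximity filter + embedding-based relevance.
import math
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 6371 * 2 * math.asin(math.sqrt(a))

def embed(text: str) -> np.ndarray:
    """Placeholder for a real model that averages pretrained word vectors."""
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng.standard_normal(50)  # stand-in for a 50-d embedding

def flood_relevance(tweet: str, seed: str = "flood river overflowing") -> float:
    a, b = embed(tweet), embed(seed)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

at_risk = (57.144, -2.110)  # hypothetical flood-warning centroid
tweet = {"text": "River Don bursting its banks near the bridge",
         "lat": 57.18, "lon": -2.09}

if haversine_km(tweet["lat"], tweet["lon"], *at_risk) < 10:
    print("relevance:", flood_relevance(tweet["text"]))
```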


Author(s):
Hansi Zhang, Yi Guo, Jiang Bian

Abstract

Background: To reduce cancer mortality and improve cancer outcomes, it is critical to understand the various cancer risk factors (RFs) across different domains (e.g., genetic, environmental, and behavioral) and levels (e.g., individual, interpersonal, and community). However, prior research on RFs of cancer outcomes has primarily focused on individual-level RFs, owing to the lack of integrated datasets that contain multi-level, multi-domain RFs. Further, the lack of consensus and proper guidance on how to systematically identify RFs adds to the difficulty of RF selection from heterogeneous data sources in a multi-level integrative data analysis (mIDA) study. More importantly, because mIDA studies require integrating heterogeneous data sources, the data integration processes in the limited number of existing mIDA studies are inconsistently performed and poorly documented, threatening transparency and reproducibility.

Methods: Informed by the National Institute on Minority Health and Health Disparities (NIMHD) research framework, we (1) reviewed existing reporting guidelines from the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) network and (2) developed a theory-driven reporting guideline to guide the RF variable selection, data source selection, and data integration process. We then developed an ontology to standardize the documentation of the RF selection and data integration process in mIDA studies.

Results: We summarized the review results and created a reporting guideline, ATTEST, for reporting the variable selection and the data source selection and integration process. We provide an ATTEST checklist to help researchers annotate and clearly document each step of their mIDA studies to ensure transparency and reproducibility. We used ATTEST to report two mIDA case studies and further transformed the annotation results into semantic triples, so that the relationships among variables, data sources, and integration processes are explicitly standardized and modeled using the classes and properties from OD-ATTEST.

Conclusion: Our ontology-based reporting guideline addresses key challenges in current mIDA studies for cancer outcomes research by providing (1) theory-driven guidance for multi-level, multi-domain RF variable and data source selection, and (2) standardized, ontology-powered documentation of the data selection and integration processes, enabling the sharing of mIDA study reports among researchers.
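
The sketch below shows, with stand-in terms, how an ATTEST-style annotation could be transformed into semantic triples using rdflib. The namespace, classes, and properties here are hypothetical placeholders; the actual vocabulary is defined by OD-ATTEST.

```python
# Illustrative sketch: documenting RF variable selection and integration
# steps as RDF triples. All att: terms below are invented stand-ins.
from rdflib import Graph, Literal, Namespace, RDF

ATT = Namespace("http://example.org/od-attest#")

g = Graph()
g.bind("att", ATT)

# An RF variable, the data source it was drawn from, and an integration step
g.add((ATT.smoking_status, RDF.type, ATT.RiskFactorVariable))
g.add((ATT.smoking_status, ATT.hasDomain, Literal("behavioral")))
g.add((ATT.smoking_status, ATT.hasLevel, Literal("individual")))
g.add((ATT.smoking_status, ATT.drawnFrom, ATT.ehr_registry))
g.add((ATT.ehr_registry, RDF.type, ATT.DataSource))
g.add((ATT.linkage_step1, RDF.type, ATT.IntegrationProcess))
g.add((ATT.linkage_step1, ATT.integrates, ATT.ehr_registry))
g.add((ATT.linkage_step1, ATT.usesMethod, Literal("geocoded record linkage")))

print(g.serialize(format="turtle"))
```

Because the documentation itself is machine-readable, another researcher can query which variables came from which sources and through which integration steps, which is what makes the reports shareable and reproducible.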


2016, Vol 28 (1), pp. 20-25
Author(s):
Signe Bāliņa, Rita Žuka, Juris Krasts

Abstract. The paper analyses business data analysis technologies, provides their classification, and considers relevant terminology. It reviews the feasibility of business data analysis technologies for handling big data sources and presents the results of an examination of online big data analytics technologies, data mining and predictive modelling technologies, and their trends.

