Analysis of NSIDC Dataset Downloads and Metadata

2017 ◽  
Author(s):  
Yulia Kolesnikova ◽  
Adam Lathrop ◽  
Bree Norlander ◽  
An Yan

Few research studies have quantitatively analyzed metadata elements associated with scientific data reuse. By using metadata and dataset download rates from the National Snow and Ice Data Center, we address whether there are key indicators in data repository metadata that show a statistically significant correlation with the download count of a dataset and whether we can predict data reuse using machine learning techniques. We used the download rate by unique IP addresses for individual datasets as our dependent variable and as a proxy for data reuse. Our analysis shows that the following metadata elements in NSIDC datasets are positively correlated with download rates: year of citation, number of data formats, number of contributors, number of platforms, number of spatial coverage areas, number of locations, and number of keywords. Our results are applicable to researchers and professionals working with data and add to the small body of work addressing metadata best practices for increasing discovery of data.
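As a rough illustration of the correlation analysis described above, the sketch below computes Spearman correlations between per-dataset metadata counts and unique-IP download counts. The file name and column names are hypothetical stand-ins for an NSIDC metadata export, not the authors' actual data layout.

```python
# Illustrative sketch (not the authors' code): test whether metadata element
# counts correlate with per-dataset download counts.
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical export: one row per dataset, one column per metadata count,
# plus the unique-IP download count.
datasets = pd.read_csv("nsidc_metadata.csv")

features = ["n_formats", "n_contributors", "n_platforms",
            "n_spatial_coverages", "n_locations", "n_keywords"]

for col in features:
    rho, p = spearmanr(datasets[col], datasets["downloads"])
    print(f"{col}: rho={rho:.2f}, p={p:.4f}")
```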

2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Adam Gauci ◽  
Aldo Drago ◽  
John Abela

High frequency (HF) radar installations are becoming essential components of operational real-time marine monitoring systems. The underlying technology is being further enhanced to fully exploit the potential of mapping sea surface currents and wave fields over wide areas with high spatial and temporal resolution, even in adverse meteo-marine conditions. Data applications are opening to many different sectors, reaching beyond research and monitoring to target downstream services in support of key national and regional stakeholders. In the CALYPSO project, the HF radar system composed of CODAR SeaSonde stations installed in the Malta Channel specifically serves to assist the response to marine oil spills and to support search and rescue at sea. One key drawback concerns the sporadic inconsistency in the spatial coverage of radar data, which is dictated by the sea state as well as by interference from unknown sources that may be competing with transmissions in the same frequency band. This work investigates the use of machine learning techniques to fill in missing data in a high-resolution grid. Past radar data and wind vectors obtained from satellites are used to predict missing information and provide a more consistent dataset.
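A hedged sketch of how such gap filling might be set up, with synthetic arrays standing in for past radar values and satellite wind components; the regressor choice and feature layout are assumptions, not the CALYPSO implementation.

```python
# Minimal sketch: predict a missing surface-current value at a grid cell from
# past radar values and satellite wind components at that cell.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training samples, one row per (cell, time) where the radar
# value is known: [u_radar(t-1), v_radar(t-1), u_wind(t), v_wind(t)]
X_train = np.random.rand(1000, 4)
y_train = np.random.rand(1000)          # observed current component

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Cells with missing radar data are filled with the model prediction.
X_missing = np.random.rand(50, 4)
filled_values = model.predict(X_missing)
```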


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Jay Jacobs ◽  
Sasha Romanosky ◽  
Idris Adjerid ◽  
Wade Baker

Despite significant innovations in IT security products and research over the past 20 years, the information security field is still immature and struggling. Practitioners lack the ability to properly assess cyber risk, and decision-makers continue to be paralyzed by vulnerability scanners that overload their staff with mountains of scan results. To cope, firms prioritize vulnerability remediation using crude heuristics and limited data, yet they are still too often breached by known vulnerabilities for which patches have existed for months or years. The key challenge firms face is therefore to identify a remediation strategy that best balances two competing forces. On one hand, a firm could attempt to patch all vulnerabilities on its network. While this would provide the greatest ‘coverage’ of vulnerabilities patched, it would inefficiently consume resources by fixing low-risk vulnerabilities. On the other hand, patching only a few high-risk vulnerabilities would be highly ‘efficient’, but may leave the firm exposed to many other high-risk vulnerabilities. Using a large collection of datasets together with machine learning techniques, we construct a series of vulnerability remediation strategies and compare how each performs with regard to the trade-off between coverage and efficiency. We expand and improve upon the small body of literature that uses predictions of ‘published exploits’ by instead using ‘exploits in the wild’ as our outcome variable. We implement the machine learning models by classifying vulnerabilities as high- or low-risk, where we consider high-risk vulnerabilities to be those that have been exploited in actual firm networks.
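The coverage/efficiency trade-off can be made concrete with a small sketch: if "patch what the model flags" is the strategy, coverage corresponds to recall over exploited-in-the-wild vulnerabilities and efficiency to precision. The classifier, feature set, and synthetic labels below are assumptions for illustration, not the authors' models or data.

```python
# Sketch of measuring coverage (recall) and efficiency (precision) for a
# model-driven remediation strategy; features and labels are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X = np.random.rand(5000, 10)              # e.g. CVSS subscores, vendor, tags
y = np.random.binomial(1, 0.05, 5000)     # 1 = exploited in the wild

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)

# "Patch everything the model flags" at a 0.5 probability threshold.
flagged = clf.predict_proba(X_te)[:, 1] >= 0.5
print("efficiency:", precision_score(y_te, flagged, zero_division=0))
print("coverage:  ", recall_score(y_te, flagged, zero_division=0))
```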


2018 ◽  
Vol 42 (1) ◽  
pp. 124-142 ◽  
Author(s):  
Youngseek Kim ◽  
Seungahn Nah

Purpose: The purpose of this paper is to examine how data reuse experience, attitudinal beliefs, social norms, and resource factors influence internet researchers to share data with other researchers outside their teams.

Design/methodology/approach: An online survey was conducted to examine the extent to which data reuse experience, attitudinal beliefs, social norms, and resource factors predicted internet researchers' data sharing intentions and behaviors. The theorized model was tested using a structural equation modeling technique to analyze a total of 201 survey responses from the Association of Internet Researchers mailing list.

Findings: Results show that data reuse experience significantly influenced participants' perception of benefit from data sharing and their norm of data sharing. Belief structures regarding data sharing, including perceived career benefit and risk, and perceived effort, had significant associations with attitude toward data sharing, leading internet researchers to have greater data sharing intentions and behavior. The results also reveal that researchers' norms for data sharing had a direct effect on data sharing intention. Furthermore, while the perceived availability of data repositories did not have a positive impact on data sharing intention, it had a significant, direct, positive impact on researchers' data sharing behaviors.

Research limitations/implications: This study validated its novel theorized model based on the theory of planned behavior (TPB). The study showed a holistic picture of how different data sharing factors, including data reuse experience, attitudinal beliefs, social norms, and data repositories, influence internet researchers' data sharing intentions and behaviors.

Practical implications: Data reuse experience, attitude toward and norm of data sharing, and the availability of data repositories had either direct or indirect influence on internet researchers' data sharing behaviors. Thus, professional associations, funding agencies, and academic institutions alike should promote academic cultures that value data sharing in order to create a virtuous cycle of reciprocity and encourage researchers to hold positive attitudes toward, and norms of, data sharing; these cultures should be reinforced by strong support for data repositories.

Originality/value: In line with prior scholarship on scientific data sharing, this study of internet researchers offers a map of scientific data sharing intentions and behaviors by examining the impacts of data reuse experience, attitudinal beliefs, social norms, and data repositories together.
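For readers unfamiliar with the analysis step, the following is a minimal structural equation modeling sketch in Python using the semopy package. The survey columns and path specification are illustrative assumptions, not the authors' measurement model.

```python
# Illustrative SEM / path-analysis sketch with semopy; the variables and
# paths below are hypothetical, not the published model.
import pandas as pd
import semopy

data = pd.read_csv("survey_responses.csv")  # hypothetical survey export

model_desc = """
attitude ~ perceived_benefit + perceived_risk + perceived_effort
intention ~ attitude + norm + repository_availability
behavior ~ intention + repository_availability
"""

model = semopy.Model(model_desc)
model.fit(data)
print(model.inspect())   # path coefficients, standard errors, p-values
```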


Author(s):  
Mousa Elrotub ◽  
Ahmed Bali ◽  
Abdelouahed Gherbi

The problem of balancing user requests in cloud computing is becoming more serious due to the variation of workloads. Load balancing and allocation processes still require better optimization methodologies and models to improve performance and increase the quality of service. This article describes a solution for balancing user workload efficiently by proposing a model that allows each virtual machine (VM) to maximize the number of requests it serves based on its capacity. The model measures each VM's capacity as a percentage and maps groups of user requests to appropriate active virtual machines. Mining expected patterns from a big data repository, such as log data, and applying machine learning techniques can make the prediction more efficient. The work is implemented and evaluated using several performance metrics, and the results are compared with other research. The evaluation shows the efficiency of the proposed approach in distributing user workload and improving results.
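The capacity-to-requests mapping can be pictured with a toy dispatcher. The sketch below is a deliberate simplification with made-up VM names and capacities, not the proposed model: each group of requests goes to the active VM with the lowest current utilization relative to its capacity.

```python
# Toy sketch of capacity-aware request mapping (values are hypothetical).
capacity = {"vm1": 120, "vm2": 80, "vm3": 100}   # max requests each VM can serve
load = {name: 0 for name in capacity}            # requests currently assigned

def assign(group_size: int) -> str:
    # Choose the VM whose utilization (share of its capacity) is lowest.
    best = min(capacity, key=lambda v: load[v] / capacity[v])
    load[best] += group_size
    return best

for group in [10, 25, 5, 40]:
    print(group, "->", assign(group))
```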


2021 ◽  
Author(s):  
Martin Courtois ◽  
Alexandre Filiot ◽  
Gregoire Ficheur

The use of international laboratory terminologies inside hospital information systems is required to conduct data reuse analyses across inter-hospital databases. While most terminology matching techniques for semantic interoperability are language-based, another strategy is distribution matching, which matches terms based on the statistical similarity of their value distributions. In this work, our objective is to design and assess a structured framework for performing distribution matching on concepts described by continuous variables. We propose a framework that combines distribution matching and machine learning techniques. Using a training sample consisting of correct and incorrect correspondences between different terminologies, a match probability score is built. For each term, the best candidates are returned, sorted in decreasing order of the probability given by the model. Searching 101 terms from Lille University Hospital among the same list of concepts in MIMIC-III, the model returned the correct match in the top 5 candidates for 96 of them (95%). Using this open-source framework with a top-k suggestion system could make expert validation of terminology alignments easier.
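A hedged sketch of the general idea follows, with assumed pair features (a Kolmogorov–Smirnov statistic plus simple summary differences) and a random forest standing in for whichever classifier the published framework actually uses; the concept names and synthetic distributions are illustrative only.

```python
# Sketch: describe each (local concept, candidate concept) pair by
# distribution-similarity features, score it with a classifier trained on
# known correct/incorrect pairs, then rank candidates by match probability.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier

def pair_features(local_values, candidate_values):
    ks = ks_2samp(local_values, candidate_values).statistic
    return [ks,
            abs(np.mean(local_values) - np.mean(candidate_values)),
            abs(np.median(local_values) - np.median(candidate_values))]

rng = np.random.default_rng(0)
# Hypothetical training pairs labelled 1 (same concept) or 0 (different).
X_train = ([pair_features(rng.normal(0, 1, 500), rng.normal(0, 1, 500)) for _ in range(50)]
           + [pair_features(rng.normal(0, 1, 500), rng.normal(3, 1, 500)) for _ in range(50)])
y_train = [1] * 50 + [0] * 50

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Rank candidate concepts for one local term by match probability (top-k list).
local = rng.normal(0, 1, 500)
candidates = {"concept_a": rng.normal(0, 1, 500), "concept_b": rng.normal(3, 1, 500)}
scores = {name: clf.predict_proba([pair_features(local, vals)])[0, 1]
          for name, vals in candidates.items()}
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```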


Author(s):  
Anchitaalagammai J. V. ◽  
Kavitha Samayadurai ◽  
Murali S. ◽  
Padmadevi S. ◽  
Shantha Lakshmi Revathy J.

The internet of things (IoT) describes an emerging trend in which a large number of embedded devices (things) are connected to the internet to participate in automating activities that create compounded value for end consumers as well as for enterprises. One of the greatest concerns in IoT is security, and how software engineers address it will play an increasingly important role. As devices interact with each other, businesses need to be able to securely handle the data deluge. With a focused approach, it is possible to minimize the vulnerabilities and risks to which devices and networks are exposed. Adopting a security-focused software development lifecycle (SDL) is one of the major steps in identifying and minimizing zero-day vulnerabilities and hence in securing IoT applications and devices. This chapter focuses on best practices for building security into the software development process with the help of two approaches, cryptographic techniques and machine learning techniques, so that secure coding and security testing are ingrained in the software development lifecycle.


Author(s):  
Ryan Abernathey ◽  
Tom Augspurger ◽  
Anderson Banihirwe ◽  
Charles C Blackmon-Luca ◽  
Timothy J Crone ◽  
...  

Scientific data has traditionally been distributed via downloads from a data server to a local computer. This way of working suffers from limitations as scientific datasets grow toward the petabyte scale. A "cloud-native data repository," as defined in this paper, offers several advantages over traditional data repositories: performance, reliability, cost-effectiveness, collaboration, reproducibility, creativity, downstream impacts, and access and inclusion. These objectives motivate a set of best practices for cloud-native data repositories: analysis-ready, cloud-optimized (ARCO) formats and loose coupling with data-proximate computing. The Pangeo Project has developed a prototype implementation of these principles using open-source scientific Python tools. By providing an ARCO data catalog together with on-demand, scalable distributed computing, Pangeo enables users to process big data at rates exceeding 10 GB/s. Several challenges must be resolved to realize cloud computing's full potential for scientific research, such as organizing funding, training users, and enforcing data privacy requirements.
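A minimal example of the ARCO access pattern promoted by Pangeo, using xarray with a dask-backed Zarr store (the stack the project builds on); the bucket URL, variable name, and time frequency are hypothetical.

```python
# Sketch: open an analysis-ready, cloud-optimized (Zarr) store directly from
# object storage and compute on it lazily, without downloading whole files.
import xarray as xr

ds = xr.open_zarr("gs://example-bucket/ocean-temperature.zarr",  # hypothetical store
                  consolidated=True)

# A reduction over the whole archive streams chunks from the cloud in parallel
# via dask; only the final monthly means are brought into local memory.
monthly_mean = ds["sst"].resample(time="1MS").mean().compute()
```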


2006 ◽  
Author(s):  
Christopher Schreiner ◽  
Kari Torkkola ◽  
Mike Gardner ◽  
Keshu Zhang

2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate analysis and prediction of time-dependent data. We focus our attention on four different stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). By treating close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors, we show that the prediction accuracy is improved.
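A minimal sketch of the SVR variant follows, with synthetic data standing in for the Yahoo Finance features; the column layout, hyperparameters, and train/test split are assumptions, not the paper's configuration.

```python
# Sketch: predict the next day's close from today's OHLCV + adjusted close
# using Support Vector Regression (synthetic data for illustration).
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical daily rows: [open, high, low, close, adj_close, volume]
X = np.random.rand(500, 6)
y = np.roll(X[:, 3], -1)[:-1]      # next-day close as the target
X = X[:-1]

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X[:400], y[:400])        # simple chronological train/test split
print("test R^2:", model.score(X[400:], y[400:]))
```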

