A Framework for Service Semantic Description Based on Knowledge Graph

Constructing a large-scale service knowledge graph is necessary. We propose a method, namely semantic information extension, for service knowledge graphs. We focus on the information of services described by Web Services Description Language (WSDL): we design the ontology layer of a web service knowledge graph and construct the service graph. Using a WSDL document data set, the generated service knowledge graph contains 3738 service entities. In particular, our method can bring its full effect to bear in service discovery. To evaluate our approach, we conducted two sets of experiments: one to explore the relationship between services and one to classify services by their service descriptions. We constructed two experimental data sets, then designed and trained two different deep neural networks for the two tasks to extract the semantics of the natural language used in the service discovery task. In the prediction task of exploring the relationship between services, the prediction accuracy reached 95.1%, and in the service classification experiment, the Top-5 accuracy reached 60.8%. Our experience shows that the service knowledge graph has advantages over traditional file storage when managing additional semantic information, and that the new service representation method is helpful for service discovery and composition tasks.
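The Top-5 accuracy quoted in the abstract above is a standard metric: a prediction counts as correct when the true class is among the five highest-scoring classes. The sketch below shows one way to compute it with NumPy. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """Fraction of rows whose true label is among the k highest scores.

    scores: array of shape (n_samples, n_classes), higher means more likely.
    labels: array of shape (n_samples,), integer class indices.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Indices of the k largest scores in each row (order within the k is irrelevant).
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy data (hypothetical): 3 samples, 3 classes.
toy_scores = [[0.1, 0.5, 0.4],
              [0.7, 0.2, 0.1],
              [0.2, 0.3, 0.5]]
toy_labels = [2, 1, 0]
print(top_k_accuracy(toy_scores, toy_labels, k=2))
```

With `k=1` the metric reduces to ordinary accuracy, which is why Top-5 figures are always at least as high as Top-1 figures for the same model.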

Reproduction of the Results of Onyike et al. (2003)

Meta-Psychology ◽

10.15626/mp.2019.2071 ◽

2021 ◽

Vol 5 ◽

Author(s):

Nicholas J L Brown ◽

Jan B Van Rongen ◽

Jakob Van de Velde ◽

Matt Williams

Keyword(s):

Large Scale ◽

Severe Obesity ◽

Descriptive Statistics ◽

Nhanes Iii ◽

Data Sets ◽

Data Set ◽

The Public ◽

Health And Nutrition ◽

Confounding Variables ◽

The Relationship

Onyike et al. (2003) analyzed data from a large-scale US-American data set, the Third National Health and Nutrition Examination Survey (NHANES-III), and reported an association between obesity and major depression, especially among people with severe obesity. Here, we report the results of a detailed replication of Onyike et al.’s analyses. While we were able to reproduce the majority of these authors’ descriptive statistics, this took a substantial amount of time and effort, and we found several minor errors in the univariate descriptive statistics reported in their Tables 1 and 2. We were able to reproduce most of Onyike et al.’s bivariate findings regarding the relationship between obesity and depression (Tables 3 and 4), albeit with some small discrepancies (e.g., with respect to the magnitudes of standard errors). On the other hand, we were unable to reproduce Table 5, containing Onyike et al.’s findings with respect to the relationship between obesity and depression when controlling for plausible confounding variables—arguably the paper’s most important results—because some of the included predictor variables appear to be either unavailable, or not coded in the way reported by Onyike et al., in the public NHANES-III data sets. We discuss the implications of our findings for the transparency of reporting and the reproducibility of published results.

A Reproduction of the Results of Onyike et al. (2003).

10.31234/osf.io/hbnm9 ◽

2019 ◽

Author(s):

Nicholas John Laird Brown ◽

Jan van Rongen ◽

Jakob Van de Velde ◽

Matt Williams

Keyword(s):

Large Scale ◽

Severe Obesity ◽

Descriptive Statistics ◽

Nhanes Iii ◽

Data Sets ◽

Data Set ◽

The Public ◽

Health And Nutrition ◽

Confounding Variables ◽

The Relationship

PSVI-8 Meta-regression Analysis to Determine the Relationship Between Growing Pig Body Weight and Variation

Journal of Animal Science ◽

10.1093/jas/skab054.357 ◽

2021 ◽

Vol 99 (Supplement_1) ◽

pp. 218-219

Author(s):

Andres Fernando T Russi ◽

Mike D Tokach ◽

Jason C Woodworth ◽

Joel M DeRouchey ◽

Robert D Goodband ◽

...

Keyword(s):

Body Weight ◽

Regression Analysis ◽

Sample Size ◽

Polynomial Regression ◽

Data Sets ◽

Regression Equations ◽

Prediction Equations ◽

Data Set ◽

Rate Of Increase ◽

The Relationship

Abstract. The swine industry has been constantly evolving to select animals with improved performance traits and to minimize variation in body weight (BW) in order to meet packer specifications. Therefore, understanding variation presents an opportunity for producers to find strategies that could help reduce, manage, or deal with variation of pigs in a barn. A systematic review and meta-analysis was conducted by collecting data from multiple studies and available data sets in order to develop prediction equations for coefficient of variation (CV) and standard deviation (SD) as a function of BW. Information regarding BW variation from 16 papers was recorded to provide approximately 204 data points. Together, these data included 117,268 individually weighed pigs with a sample size that ranged from 104 to 4,108 pigs. A random-effects model with study used as a random effect was developed. Observations were weighted using sample size as an estimate for precision on the analysis, where larger data sets accounted for increased accuracy in the model. Regression equations were developed using the nlme package of R to determine the relationship between BW and its variation. Polynomial regression analysis was conducted separately for each variation measurement. When CV was reported in the data set, SD was calculated and vice versa. The resulting prediction equations were: CV (%) = 20.04 − 0.135 × (BW) + 0.00043 × (BW)², R² = 0.79; SD = 0.41 + 0.150 × (BW) − 0.00041 × (BW)², R² = 0.95. These equations suggest that there is evidence for a decreasing quadratic relationship between mean CV of a population and BW of pigs whereby the rate of decrease is smaller as mean pig BW increases from birth to market. Conversely, the rate of increase of SD of a population of pigs is smaller as mean pig BW increases from birth to market.
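The two prediction equations in the abstract above can be evaluated directly. The following is a minimal sketch, using the coefficients exactly as reported. The function names are illustrative, and the unit of BW is not stated in the abstract, so treating it as kilograms is an assumption.

```python
def predicted_cv(bw):
    """Predicted coefficient of variation (%) at mean body weight bw.

    Coefficients are those reported in the abstract; the BW unit
    (assumed here to be kg) is not stated there.
    """
    return 20.04 - 0.135 * bw + 0.00043 * bw ** 2

def predicted_sd(bw):
    """Predicted standard deviation of body weight at mean body weight bw."""
    return 0.41 + 0.150 * bw - 0.00041 * bw ** 2

# Evaluate both equations at a few illustrative mean body weights.
for bw in (10, 50, 100):
    print(bw, round(predicted_cv(bw), 2), round(predicted_sd(bw), 2))
```

Because the two equations were fitted separately, the SD implied by the CV equation (CV × BW / 100) need not match the SD equation exactly at a given BW.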

Galaxy spin direction distribution in HST and SDSS show similar large-scale asymmetry

Publications of the Astronomical Society of Australia ◽

10.1017/pasa.2020.46 ◽

2020 ◽

Vol 37 ◽

Author(s):

Lior Shamir

Keyword(s):

Large Scale ◽

Spiral Galaxies ◽

Hubble Space Telescope ◽

Gravitational Interaction ◽

Large Data ◽

Sloan Digital Sky Survey ◽

Data Sets ◽

Dipole Axis ◽

Data Set ◽

The Asymmetry

Abstract. Several recent observations using large data sets of galaxies showed non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey. The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^\circ,\delta=47^\circ)$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^\circ,\delta=61^\circ)$.
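To get a feel for how close the two reported dipole axes are, one can compute the angular separation between them on the sky. The sketch below uses the spherical law of cosines with the coordinates quoted in the abstract. It is a plain geometric calculation, not the paper's statistical comparison, and the function name is an assumption.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle angle in degrees between two (RA, Dec) directions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_sep = (math.sin(dec1) * math.sin(dec2)
               + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_sep = max(-1.0, min(1.0, cos_sep))
    return math.degrees(math.acos(cos_sep))

# Dipole axes quoted in the abstract: HST (78, 47) and SDSS z > 0.15 (71, 61).
separation = angular_separation_deg(78, 47, 71, 61)
print(round(separation, 1))
```

The separation comes out between 14 and 15 degrees; whether that lies within the error range depends on the uncertainties reported in the paper, which the abstract does not list.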

The Midlatitude Continental Convective Clouds Experiment (MC3E) sounding network: operations, processing and analysis

Atmospheric Measurement Techniques ◽

10.5194/amt-8-421-2015 ◽

2015 ◽

Vol 8 (1) ◽

pp. 421-434 ◽

Cited By ~ 18

Author(s):

M. P. Jensen ◽

T. Toto ◽

D. Troyan ◽

P. E. Ciesielski ◽

D. Holdridge ◽

...

Keyword(s):

Large Scale ◽

Scale Model ◽

Data Sets ◽

Central Plains ◽

Data Set ◽

Convective Systems ◽

Convective Clouds ◽

Quality Checks ◽

Network Operations ◽

The Impact

Abstract. The Midlatitude Continental Convective Clouds Experiment (MC3E) took place during the spring of 2011 centered in north-central Oklahoma, USA. The main goal of this field campaign was to capture the dynamical and microphysical characteristics of precipitating convective systems in the US Central Plains. A major component of the campaign was a six-site radiosonde array designed to capture the large-scale variability of the atmospheric state with the intent of deriving model forcing data sets. Over the course of the 46-day MC3E campaign, a total of 1362 radiosondes were launched from the enhanced sonde network. This manuscript provides details on the instrumentation used as part of the sounding array, the data processing activities including quality checks and humidity bias corrections, and an analysis of the impacts of bias correction and algorithm assumptions on the determination of convective levels and indices. It is found that corrections for known radiosonde humidity biases and assumptions regarding the characteristics of the surface convective parcel result in significant differences in the derived values of convective levels and indices in many soundings. In addition, the impact of including the humidity corrections and quality controls on the thermodynamic profiles that are used in the derivation of a large-scale model forcing data set is investigated. The results show a significant impact on the derived large-scale vertical velocity field, illustrating the importance of addressing these humidity biases.

Children with 5′-end NF1 gene mutations are more likely to have glioma

Neurology Genetics ◽

10.1212/nxg.0000000000000192 ◽

2017 ◽

Vol 3 (5) ◽

pp. e192 ◽

Cited By ~ 12

Author(s):

Corina Anastasaki ◽

Stephanie M. Morris ◽

Feng Gao ◽

David H. Gutmann

Keyword(s):

Gene Mutation ◽

Statistical Significance ◽

Gene Mutations ◽

Neurofibromatosis Type ◽

Published Data ◽

Data Sets ◽

Nonsense Mutations ◽

Data Set ◽

Nf1 Gene ◽

The Relationship

Objective: To ascertain the relationship between the germline NF1 gene mutation and glioma development in patients with neurofibromatosis type 1 (NF1). Methods: The relationship between the type and location of the germline NF1 mutation and the presence of a glioma was analyzed in 37 participants with NF1 from one institution (Washington University School of Medicine [WUSM]) with a clinical diagnosis of NF1. Odds ratios (ORs) were calculated using both unadjusted and weighted analyses of this data set in combination with 4 previously published data sets. Results: While no statistical significance was observed between the location and type of the NF1 mutation and glioma in the WUSM cohort, power calculations revealed that a sample size of 307 participants would be required to determine the predictive value of the position or type of the NF1 gene mutation. Combining our data set with 4 previously published data sets (n = 310), children with glioma were found to be more likely to harbor 5′-end gene mutations (OR = 2; p = 0.006). Moreover, while not clinically predictive due to insufficient sensitivity and specificity, this association with glioma was stronger for participants with 5′-end truncating (OR = 2.32; p = 0.005) or 5′-end nonsense (OR = 3.93; p = 0.005) mutations relative to those without glioma. Conclusions: Individuals with NF1 and glioma are more likely to harbor nonsense mutations in the 5′ end of the NF1 gene, suggesting that the NF1 mutation may be one predictive factor for glioma in this at-risk population.
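For readers unfamiliar with the statistic, an unadjusted odds ratio comes from a 2×2 table of exposure (here, a 5′-end mutation) against outcome (glioma). The sketch below shows the calculation with a Woolf (log-scale) 95% confidence interval. The counts are hypothetical, chosen only so that the OR equals 2; they are not the study's data, and the weighted analysis mentioned in the abstract is not reproduced.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf confidence interval for a 2x2 table.

    a: outcome present, exposed      b: outcome present, unexposed
    c: outcome absent, exposed       d: outcome absent, unexposed
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# HYPOTHETICAL counts for illustration only (not from the paper):
# glioma with / without a 5'-end mutation, then no glioma with / without.
or_value, ci_low, ci_high = odds_ratio_with_ci(20, 10, 30, 30)
print(or_value, ci_low, ci_high)
```

An interval that spans 1, as it does for these small made-up counts, would not indicate a significant association, which is one reason pooling several data sets matters.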

A fast methodology for large-scale focusing inversion of gravity and magnetic data using the structured model matrix and the 2-D fast Fourier transform

Geophysical Journal International ◽

10.1093/gji/ggaa372 ◽

2020 ◽

Vol 223 (2) ◽

pp. 1378-1397

Author(s):

Rosemary A Renaut ◽

Jarom D Hogue ◽

Saeed Vatankhah ◽

Shuang Liu

Keyword(s):

Fourier Transform ◽

Fast Fourier Transform ◽

Linear Systems ◽

Large Scale ◽

Surface Measurement ◽

Magnetic Data ◽

Uniform Grid ◽

Data Sets ◽

Inversion Algorithm ◽

Data Set

SUMMARY We discuss the focusing inversion of potential field data for the recovery of sparse subsurface structures from surface measurement data on a uniform grid. For the uniform grid, the model sensitivity matrices have a block Toeplitz Toeplitz block structure for each block of columns related to a fixed depth layer of the subsurface. Then, all forward operations with the sensitivity matrix, or its transpose, are performed using the 2-D fast Fourier transform. Simulations are provided to show that the implementation of the focusing inversion algorithm using the fast Fourier transform is efficient, and that the algorithm can be realized on standard desktop computers with sufficient memory for storage of volumes up to size n ≈ 10⁶. The linear systems of equations arising in the focusing inversion algorithm are solved using either Golub–Kahan bidiagonalization or randomized singular value decomposition algorithms. These two algorithms are contrasted for their efficiency when used to solve large-scale problems with respect to the sizes of the projected subspaces adopted for the solutions of the linear systems. The results confirm earlier studies that the randomized algorithms are to be preferred for the inversion of gravity data, and for data sets of size m it is sufficient to use projected spaces of size approximately m/8. For the inversion of magnetic data sets, we show that it is more efficient to use the Golub–Kahan bidiagonalization, and that it is again sufficient to use projected spaces of size approximately m/8. Simulations support the presented conclusions and are verified for the inversion of a magnetic data set obtained over the Wuskwatim Lake region in Manitoba, Canada.
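The key computational idea in the summary above is that multiplying by a block Toeplitz Toeplitz block (BTTB) matrix is a 2-D convolution, so it can be done with the 2-D FFT without ever forming the matrix. The sketch below illustrates this generic technique with NumPy for a single square BTTB block. It is not the authors' implementation, and the kernel layout, function names, and test sizes are assumptions made for illustration.

```python
import numpy as np

def bttb_matvec(kernel, x):
    """Multiply a BTTB matrix by x using the 2-D FFT.

    x has shape (n1, n2). kernel has shape (2*n1-1, 2*n2-1) and stores the
    generating entries t[di, dj] at kernel[n1-1+di, n2-1+dj], so the matrix
    entry linking grid cell (i2, j2) to (i1, j1) is t[i1-i2, j1-j2].
    """
    n1, n2 = x.shape
    shape = (2 * n1 - 1, 2 * n2 - 1)
    # Move the zero-lag entry to index [0, 0]; negative lags wrap to the end,
    # which embeds the BTTB matrix in a block circulant one.
    k = np.fft.ifftshift(kernel)
    x_padded = np.zeros(shape)
    x_padded[:n1, :n2] = x
    y = np.fft.ifft2(np.fft.fft2(k) * np.fft.fft2(x_padded)).real
    return y[:n1, :n2]

def bttb_dense(kernel, n1, n2):
    """Form the same matrix explicitly (only for checking small cases)."""
    T = np.empty((n1 * n2, n1 * n2))
    for i1 in range(n1):
        for j1 in range(n2):
            for i2 in range(n1):
                for j2 in range(n2):
                    T[i1 * n2 + j1, i2 * n2 + j2] = kernel[n1 - 1 + i1 - i2,
                                                           n2 - 1 + j1 - j2]
    return T

# Small random check that the FFT product matches the dense product.
rng = np.random.default_rng(0)
n1, n2 = 4, 5
kernel = rng.standard_normal((2 * n1 - 1, 2 * n2 - 1))
x = rng.standard_normal((n1, n2))
fast = bttb_matvec(kernel, x)
slow = (bttb_dense(kernel, n1, n2) @ x.ravel()).reshape(n1, n2)
print(np.allclose(fast, slow))
```

The dense matrix needs storage proportional to (n1·n2)², while the FFT route stores only the (2·n1−1)×(2·n2−1) kernel, which is what makes large grids feasible on a desktop machine.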

Six years of total ozone column measurements from SCIAMACHY nadir observations

Atmospheric Measurement Techniques ◽

10.5194/amt-2-87-2009 ◽

2009 ◽

Vol 2 (1) ◽

pp. 87-98 ◽

Cited By ~ 39

Author(s):

C. Lerot ◽

M. Van Roozendael ◽

J. van Geffen ◽

J. van Gent ◽

C. Fayt ◽

...

Keyword(s):

Cross Sections ◽

Total Ozone ◽

Large Scale ◽

European Space Agency ◽

Data Sets ◽

Data Set ◽

Ozone Data ◽

Space Agency ◽

German Aerospace ◽

The Impact

Abstract. Total O3 columns have been retrieved from six years of SCIAMACHY nadir UV radiance measurements using SDOAS, an adaptation of the GDOAS algorithm previously developed at BIRA-IASB for the GOME instrument. GDOAS and SDOAS have been implemented by the German Aerospace Center (DLR) in version 4 of the GOME Data Processor (GDP) and in version 3 of the SCIAMACHY Ground Processor (SGP), respectively. The processors are being run at the DLR processing centre on behalf of the European Space Agency (ESA). We first focus on the description of the SDOAS algorithm with particular attention to the impact of uncertainties on the reference O3 absorption cross-sections. Second, the resulting SCIAMACHY total ozone data set is globally evaluated through large-scale comparisons with results from GOME and OMI as well as with ground-based correlative measurements. The various total ozone data sets are found to agree within 2% on average. However, a negative trend of 0.2–0.4%/year has been identified in the SCIAMACHY O3 columns; this probably originates from instrumental degradation effects that have not yet been fully characterized.

Collecting public RGB-D datasets for human daily activity recognition

International Journal of Advanced Robotic Systems ◽

10.1177/1729881417709079 ◽

2017 ◽

Vol 14 (4) ◽

pp. 172988141770907 ◽

Cited By ~ 2

Author(s):

Hanbo Wu ◽

Xin Ma ◽

Zhimeng Zhang ◽

Haibo Wang ◽

Yibin Li

Keyword(s):

Activity Recognition ◽

Daily Activity ◽

Visual Cues ◽

Large Scale ◽

Hot Spot ◽

Feature Representation ◽

Data Sets ◽

Activity Data ◽

Data Set ◽

Depth Motion Maps

Human daily activity recognition has been a hot spot in the field of computer vision for many decades. Despite best efforts, activity recognition in naturally uncontrolled settings remains a challenging problem. Recently, by being able to perceive depth and visual cues simultaneously, RGB-D cameras have greatly boosted the performance of activity recognition. However, due to some practical difficulties, the publicly available RGB-D data sets are not sufficiently large for benchmarking when considering the diversity of their activities, subjects, and background. This severely affects the applicability of complicated learning-based recognition approaches. To address the issue, this article provides a large-scale RGB-D activity data set by merging five public RGB-D data sets that differ from each other in many aspects such as length of actions, nationality of subjects, or camera angles. This data set comprises 4528 samples depicting 7 action categories (up to 46 subcategories) performed by 74 subjects. To verify the difficulty of the data set, three feature representation methods are evaluated: depth motion maps, spatiotemporal depth cuboid similarity feature, and curvature space scale. Results show that the merged large-scale data set is more realistic and challenging and therefore more suitable for benchmarking.

Characterising RDF data sets

Journal of Information Science ◽

10.1177/0165551516677945 ◽

2017 ◽

Vol 44 (2) ◽

pp. 203-229 ◽

Cited By ~ 6

Author(s):

Javier D Fernández ◽

Miguel A Martínez-Prieto ◽

Pablo de la Fuente Redondo ◽

Claudio Gutiérrez

Keyword(s):

Data Structures ◽

Large Scale ◽

Open Data ◽

Structural Features ◽

Data Sets ◽

Data Set ◽

Wide Range ◽

Rdf Data ◽

Description Framework ◽

Resource Description

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
