Practical Constraint K-Segment Principal Curve Algorithms for Generating Railway GPS Digital Map

2013 ◽  
Vol 2013 ◽  
pp. 1-11
Author(s):  
Dewang Chen ◽  
Long Chen

To obtain a good trade-off between low-cost, low-accuracy Global Positioning System (GPS) receivers and the high-precision digital maps required by modern railways, we use the concept of constraint K-segment principal curves (CKPCS) together with expert knowledge of railways to propose three practical CKPCS generation algorithms with reduced computational complexity, which are therefore more suitable for engineering applications. The three algorithms are named ALLopt, MPMopt, and DCopt; ALLopt exploits global optimization, while MPMopt and DCopt apply local optimization with different initial solutions. We compare the three algorithms in terms of average projection error, stability, and fitness for simple and complex simulated trajectories with noisy data. ALLopt works well only for simple curves and small data sets, whereas the other two algorithms work better for complex curves and large data sets. Moreover, MPMopt runs faster than DCopt, but DCopt handles some curves with cross points better. The three algorithms are also applied to generate GPS digital maps for two railway GPS data sets measured on the Qinghai-Tibet Railway (QTR), with results similar to those for the synthetic data. Because a railway trajectory is relatively simple and straight, we conclude that MPMopt works best when both computation speed and the quality of the generated CKPCS are considered. MPMopt can be used to obtain a small set of key points that represent a large amount of GPS data, greatly reducing data storage requirements and increasing positioning speed for real-time digital map applications.
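As a rough illustration of the evaluation metric used above, the sketch below computes the average projection error of noisy 2-D points onto a K-segment polyline defined by a few key points. The synthetic data, the polyline, and all names are illustrative assumptions, not the authors' CKPCS implementation.

```python
# Minimal sketch (not the authors' CKPCS code): average projection error of
# 2-D GPS points onto a K-segment polyline, the metric used to compare
# ALLopt, MPMopt, and DCopt. Data and key points below are illustrative.
import numpy as np

def segment_sq_distance(p, a, b):
    """Squared distance from point p to the segment a-b."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.sum((a + t * ab - p) ** 2)

def average_projection_error(points, key_points):
    """Mean distance from each GPS point to its nearest polyline segment."""
    errors = []
    for p in points:
        d2 = min(segment_sq_distance(p, key_points[i], key_points[i + 1])
                 for i in range(len(key_points) - 1))
        errors.append(np.sqrt(d2))
    return float(np.mean(errors))

# Noisy synthetic trajectory and a 4-segment polyline (5 key points).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
track = np.column_stack([t, 0.2 * np.sin(3 * t)])
gps = track + rng.normal(scale=0.01, size=track.shape)
keys = track[np.linspace(0, len(track) - 1, 5, dtype=int)]
print("average projection error:", average_projection_error(gps, keys))
```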

2020 ◽  
pp. 1-11
Author(s):  
Erjia Yan ◽  
Zheng Chen ◽  
Kai Li

Citation sentiment plays an important role in citation analysis and scholarly communication research, but prior citation sentiment studies have used small data sets and relied largely on manual annotation. This paper uses a large data set of PubMed Central (PMC) full-text publications and analyzes citation sentiment in more than 32 million citances within PMC, revealing citation sentiment patterns at the journal and discipline levels. This paper finds a weak relationship between a journal’s citation impact (as measured by CiteScore) and the average sentiment score of citances to its publications. When journals are aggregated into quartiles based on citation impact, we find that journals in higher quartiles are cited more favorably than those in the lower quartiles. Further, social science journals are found to be cited with the highest sentiment, followed by engineering and natural science journals, and then biomedical journals. This result may be attributed to disciplinary discourse patterns in which social science researchers tend to use more subjective terms to describe others’ work than do natural science or biomedical researchers.
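The journal-level aggregation described above can be sketched roughly as follows, assuming a hypothetical table of citances with 'journal', 'citescore', and per-citance 'sentiment' columns; this is not the authors' PMC pipeline, only a toy illustration of correlating mean citance sentiment with citation impact and then comparing CiteScore quartiles.

```python
# Toy sketch: aggregate per-citance sentiment to the journal level, correlate
# with CiteScore, and compare CiteScore quartiles. Column names and values
# are assumptions for illustration only.
import pandas as pd

citances = pd.DataFrame({
    "journal":   ["J1", "J1", "J2", "J2", "J3", "J3", "J4", "J4"],
    "citescore": [9.1, 9.1, 6.4, 6.4, 3.2, 3.2, 1.1, 1.1],
    "sentiment": [0.30, 0.10, 0.20, 0.05, 0.00, 0.10, -0.05, 0.00],
})

# Journal-level aggregation: mean sentiment of citances to each journal.
journals = (citances.groupby("journal")
                     .agg(citescore=("citescore", "first"),
                          mean_sentiment=("sentiment", "mean"))
                     .reset_index())

# Correlation between citation impact and mean citance sentiment.
print(journals["citescore"].corr(journals["mean_sentiment"]))

# Comparison across CiteScore quartiles (Q1 = highest impact).
journals["quartile"] = pd.qcut(journals["citescore"], 4,
                               labels=["Q4", "Q3", "Q2", "Q1"])
print(journals.groupby("quartile", observed=True)["mean_sentiment"].mean())
```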


2020 ◽  
Author(s):  
Chenru Duan ◽  
Fang Liu ◽  
Aditya Nandy ◽  
Heather Kulik

High-throughput computational screening typically employs methods (i.e., density functional theory or DFT) that can fail to describe challenging molecules, such as those with strongly correlated electronic structure. In such cases, multireference (MR) correlated wavefunction theory (WFT) would be the appropriate choice but remains more challenging to carry out and automate than single-reference (SR) WFT or DFT. Numerous diagnostics have been proposed for identifying when MR character is likely to have an effect on the predictive power of SR calculations, but conflicting conclusions about diagnostic performance have been reached on small data sets. We compute 15 MR diagnostics, ranging from affordable DFT-based to more costly MR-WFT-based diagnostics, on a set of 3,165 equilibrium and distorted small organic molecules containing up to six heavy atoms. Conflicting MR character assignments and low pairwise linear correlations among diagnostics are also observed over this set. We evaluate the ability of existing diagnostics to predict the percent recovery of the correlation energy, %E_corr. None of the DFT-based diagnostics are nearly as predictive of %E_corr as the best WFT-based diagnostics. To overcome the limitation of this cost–accuracy trade-off, we develop machine learning (ML, i.e., kernel ridge regression) models to predict WFT-based diagnostics from a combination of DFT-based diagnostics and a new, size-independent 3D geometric representation. The ML-predicted diagnostics correlate as well with MR effects as their computed (i.e., with WFT) values, significantly improving over the DFT-based diagnostics on which the models were trained. These ML models thus provide a promising approach to improve upon DFT-based diagnostic accuracy while remaining suitably low cost for high-throughput screening.
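A minimal sketch of the machine learning step named above (kernel ridge regression mapping cheap DFT-based diagnostics to a costlier WFT-based diagnostic) might look as follows; the random feature matrix and surrogate target merely stand in for the paper's 3,165-molecule data set and descriptors.

```python
# Sketch only: kernel ridge regression from DFT-based diagnostics (plus
# geometric features) to a WFT-based diagnostic. Data are synthetic
# placeholders, not the paper's data set.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(3165, 12))     # placeholder DFT diagnostics + geometry features
y = 0.5 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=3165)  # surrogate WFT diagnostic

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)

model = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.1)
model.fit(scaler.transform(X_train), y_train)
print("test R^2:", model.score(scaler.transform(X_test), y_test))
```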


Author(s):  
Kim Wallin

The standard Master Curve (MC) deals only with materials assumed to be homogeneous, but MC analysis methods for inhomogeneous materials have also been developed. The bi-modal and multi-modal analysis methods, in particular, are becoming increasingly standard. Their drawback is that they are generally reliable only with sufficiently large data sets (number of valid tests, r ≥ 15–20). Here, the possibility of using the multi-modal analysis method with smaller data sets is assessed, and a new procedure to conservatively account for possible inhomogeneities is proposed.
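For orientation, the sketch below estimates the reference temperature T0 of the standard, homogeneous Master Curve by maximum likelihood, using the usual conventions (Weibull shape 4, Kmin = 20 MPa·m^0.5) and made-up toughness data; censoring and specimen-size corrections are omitted. The bi-modal and multi-modal analyses discussed above extend this likelihood to mixtures of toughness populations.

```python
# Sketch of single-population (homogeneous) Master Curve T0 estimation by
# maximum likelihood. Toughness data are invented for illustration; censoring
# and size corrections required in a real assessment are omitted.
import numpy as np
from scipy.optimize import minimize_scalar

K_MIN = 20.0  # MPa*m^0.5

def k0(temperature, t0):
    """Weibull scale parameter of K_Jc at a given test temperature."""
    return 31.0 + 77.0 * np.exp(0.019 * (temperature - t0))

def neg_log_likelihood(t0, temps, kjc):
    scale = k0(temps, t0) - K_MIN
    z = (kjc - K_MIN) / scale
    return -np.sum(np.log(4.0 * z ** 3 / scale) - z ** 4)

temps = np.array([-60.0, -60.0, -40.0, -40.0, -20.0, -20.0])   # test temperatures, C
kjc = np.array([70.0, 100.0, 95.0, 135.0, 130.0, 185.0])       # K_Jc, MPa*m^0.5

res = minimize_scalar(neg_log_likelihood, bounds=(-150.0, 50.0),
                      args=(temps, kjc), method="bounded")
print("estimated T0 [C]:", round(res.x, 1))
```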


Author(s):  
Gary Smith ◽  
Jay Cordes

Patterns are inevitable and we should not be surprised by them. Streaks, clusters, and correlations are the norm, not the exception. In a large number of coin flips, there are likely to be coincidental clusters of heads and tails. In nationwide data on cancer, crime, or test scores, there are likely to be flukey clusters. When the data are separated into smaller geographic units like cities, the most extreme results are likely to be found in the smallest cities. In athletic competitions between well-matched teams, the outcome of a small number of games is almost meaningless. Our challenge is to overcome our inherited inclination to think that all patterns are meaningful; for example, thinking that clustering in large data sets or differences among small data sets must be something real that needs to be explained. Often, it is just meaningless happenstance.
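A short simulation makes the small-sample point concrete: even when the true rate is identical everywhere, the most extreme observed rates tend to come from the smallest units. The population sizes and the rate below are arbitrary choices for illustration.

```python
# Simulation of the "small city" effect: with the same true rate everywhere,
# the most extreme observed rates still cluster in the smallest populations,
# simply because small samples are noisier.
import numpy as np

rng = np.random.default_rng(1)
populations = rng.integers(1_000, 1_000_000, size=2_000)
true_rate = 0.01                                   # identical everywhere
cases = rng.binomial(populations, true_rate)
observed_rate = cases / populations

extreme = np.argsort(observed_rate)[-20:]          # 20 highest observed rates
print("median population overall:     ", int(np.median(populations)))
print("median population, extreme 20: ", int(np.median(populations[extreme])))
```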


2008 ◽  
Vol 130 (2) ◽  
Author(s):  
Stuart Holdsworth

The European Creep Collaborative Committee (ECCC) approach to creep data assessment has now been established for almost ten years. The methodology covers the analysis of rupture strength and ductility, creep strain, and stress relaxation data for a range of material conditions. This paper reviews the concepts and procedures involved. The original approach was devised to determine data sheets for use by committees responsible for the preparation of National and International Design and Product Standards, and the methods developed for data quality evaluation and data analysis were therefore intentionally rigorous. The focus was clearly on the determination of long-time property values from the largest possible data sets involving a significant number of observations in the mechanism regime for which predictions were required. More recently, the emphasis has changed. There is now an increasing requirement for full property descriptions from very short to very long times, and hence a need for much more flexible model representations than were previously required. There continues to be a requirement for reliable long-time predictions from relatively small data sets comprising relatively short-duration tests, in particular to exploit new alloy developments at the earliest practical opportunity. In such circumstances, it is not feasible to apply the same degree of rigor adopted for large data set assessment. Current developments are reviewed.


1981 ◽  
Vol 35 (1) ◽  
pp. 35-42 ◽  
Author(s):  
J. D. Algeo ◽  
M. B. Denton

A numerical method for evaluating the inverted Abel integral employing cubic spline approximations is described along with a modification of the procedure of Cremers and Birkebak, and an extension of the Barr method. The accuracy of the computations is evaluated at several noise levels and with varying resolution of the input data. The cubic spline method is found to be useful only at very low noise levels, but capable of providing good results with small data sets. The Barr method is computationally the simplest, and is adequate when large data sets are available. For noisy data, the method of Cremers and Birkebak gave the best results.
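For context, a generic spline-based Abel inversion (not necessarily the exact procedure evaluated above) can be sketched as follows: fit a cubic spline to the measured lateral profile, differentiate it, and integrate the inverse Abel formula after a substitution that removes the square-root singularity. A noise-free Gaussian test profile with a known analytic transform is used so the recovered values can be checked.

```python
# Generic cubic-spline Abel inversion sketch. The lateral profile I(y) of a
# Gaussian radial emitter exp(-r^2) is sqrt(pi)*exp(-y^2), so the inversion
# should recover exp(-r^2).
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.integrate import quad

R = 3.0
y = np.linspace(0.0, R, 60)
profile = np.sqrt(np.pi) * np.exp(-y ** 2)      # noise-free lateral profile I(y)
spline = CubicSpline(y, profile)
dI = spline.derivative()

def abel_invert(r):
    """epsilon(r) = -(1/pi) * int_r^R I'(y)/sqrt(y^2 - r^2) dy,
    evaluated with the substitution y = sqrt(r^2 + s^2) to remove the singularity."""
    upper = np.sqrt(R ** 2 - r ** 2)
    integrand = lambda s: dI(np.sqrt(r ** 2 + s ** 2)) / np.sqrt(r ** 2 + s ** 2)
    value, _ = quad(integrand, 0.0, upper)
    return -value / np.pi

for r in (0.0, 0.5, 1.0, 1.5):
    print(f"r={r:.1f}  recovered={abel_invert(r):.3f}  exact={np.exp(-r**2):.3f}")
```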


2008 ◽  
Vol 20 (2) ◽  
pp. 523-554 ◽  
Author(s):  
Geert Gins ◽  
Ilse Y. Smets ◽  
Jan F. Van Impe

Various machine learning problems rely on kernel-based methods. The power of these methods resides in the ability to solve highly nonlinear problems by reformulating them in a linear context. The dominant eigenspace of a (normalized) kernel matrix is often required. Unfortunately, the computational requirements of existing kernel methods restrict their applicability to relatively small data sets. This letter therefore focuses on a kernel-based method for large data sets. More specifically, a numerically stable tracking algorithm for the dominant eigenspace of a normalized kernel matrix is proposed, which proceeds by an updating (the addition of a new data point) followed by a downdating (the exclusion of an old data point) of the kernel matrix. Testing the algorithm on some representative case studies reveals that a very good approximation of the dominant eigenspace is obtained, while only a minimal number of operations and a minimal amount of memory are required per iteration step.
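For contrast with the proposed tracking algorithm, the naive baseline below simply recomputes the dominant eigenspace of the centered RBF kernel matrix from scratch after every update (new point) and downdate (oldest point removed); this per-step full recomputation is precisely what the letter's updating/downdating scheme avoids. Kernel choice, window size, and data are illustrative assumptions.

```python
# Naive sliding-window baseline: full eigendecomposition of the centered RBF
# kernel matrix after each update/downdate, for comparison with an
# incremental tracking scheme.
import numpy as np

def dominant_eigenspace(window, k=5, gamma=0.5):
    """Top-k eigenpairs of the centered RBF kernel matrix of the window."""
    sq = np.sum(window ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * window @ window.T))
    n = len(window)
    H = np.eye(n) - np.ones((n, n)) / n          # centering (normalization)
    vals, vecs = np.linalg.eigh(H @ K @ H)
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]

rng = np.random.default_rng(0)
stream = rng.normal(size=(500, 10))              # toy data stream
window = list(stream[:100])
for x in stream[100:110]:
    window.append(x)                             # update: add new data point
    window.pop(0)                                # downdate: drop oldest point
    vals, vecs = dominant_eigenspace(np.asarray(window))
print("leading eigenvalues:", np.round(vals[:3], 2))
```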


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
N. Dwivedi ◽  
A. K. Ott ◽  
K. Sasikumar ◽  
C. Dou ◽  
R. J. Yeo ◽  
...  

Hard disk drives (HDDs) are used as secondary storage in digital electronic devices owing to their low cost and large data storage capacity. Due to the exponentially increasing amount of data, there is a need to increase areal storage densities beyond ~1 Tb/in², which requires the thickness of carbon overcoats (COCs) to be <2 nm. However, friction, wear, corrosion, and thermal stability are critical concerns below 2 nm, limiting current technology and restricting COC integration with heat-assisted magnetic recording (HAMR) technology. Here we show that graphene-based overcoats can overcome all these limitations, achieving a two-fold reduction in friction and providing better corrosion and wear resistance than state-of-the-art COCs, while withstanding HAMR conditions. We therefore expect that graphene overcoats may enable the development of 4–10 Tb/in² areal density HDDs when employed with suitable recording technologies, such as HAMR and HAMR combined with bit-patterned media.


2021 ◽  
Vol 145 (9) ◽  
pp. 1095-1109
Author(s):  
Kyle Rehder ◽  
Kathryn C. Adair ◽  
J. Bryan Sexton

Context.— Problems with health care worker (HCW) well-being have become a leading concern in medicine, given their severity and robust links to outcomes such as medical error, mortality, and turnover. Objective.— To describe the state of the science regarding HCW well-being, including how it is measured, what outcomes it predicts, and what institutional and individual interventions appear to reduce these problems. Data Sources.— Peer-reviewed articles, as well as multiple large data sets collected by our own research team, are used to describe the nature of burnout, its associations with institutional resources, and individual tools to improve well-being. Conclusions.— Rates of HCW burnout are alarmingly high, placing the health and safety of patients and HCWs at risk. To help address the urgent need to support HCWs, we summarize some of the most promising early interventions and point toward future research that uses standardized metrics to evaluate interventions (with a focus on low-cost institutional and personal interventions).


2013 ◽  
Vol 35 ◽  
pp. 513-523 ◽  
Author(s):  
Jing Wang ◽  
Bobbie-Jo M. Webb-Robertson ◽  
Melissa M. Matzke ◽  
Susan M. Varnum ◽  
Joseph N. Brown ◽  
...  

Background. The availability of large complex data sets generated by high-throughput technologies has enabled the recent proliferation of disease biomarker studies. However, a recurring problem in deriving biological information from large data sets is how to best incorporate expert knowledge into the biomarker selection process. Objective. To develop a generalizable framework that can incorporate expert knowledge into data-driven processes in a semiautomated way while providing a metric for optimization in a biomarker selection scheme. Methods. The framework was implemented as a pipeline consisting of five components for the identification of signatures from integrated clustering (ISIC). Expert knowledge was integrated into the biomarker identification process using the combination of two distinct approaches: a distance-based clustering approach and an expert knowledge-driven functional selection. Results. The utility of the developed framework ISIC was demonstrated on proteomics data from a study of chronic obstructive pulmonary disease (COPD). Biomarker candidates were identified in a mouse model using ISIC and validated in a study of a human cohort. Conclusions. Expert knowledge can be introduced into a biomarker discovery process in different ways to enhance the robustness of selected marker candidates. Developing strategies for extracting orthogonal and robust features from large data sets increases the chances of success in biomarker identification.
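A schematic sketch of the two-step idea, not the ISIC code itself, is given below: candidate features are grouped by distance-based clustering, and expert knowledge (here just a hypothetical set of prioritized protein names) picks one representative marker per cluster.

```python
# Schematic sketch of combining distance-based clustering with expert
# knowledge for marker selection. Protein names, abundances, and the
# priority set are hypothetical placeholders.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
proteins = [f"P{i:02d}" for i in range(20)]
abundance = rng.normal(size=(40, 20))            # 40 samples x 20 proteins (placeholder)
expert_priority = {"P03", "P07", "P15"}          # assumed expert knowledge

# Step 1: distance-based clustering of features (correlation distance).
dist = pdist(abundance.T, metric="correlation")
clusters = fcluster(linkage(dist, method="average"), t=5, criterion="maxclust")

# Step 2: expert-knowledge-driven selection of one candidate per cluster.
selected = []
for c in np.unique(clusters):
    members = [p for p, lab in zip(proteins, clusters) if lab == c]
    preferred = [p for p in members if p in expert_priority]
    selected.append(preferred[0] if preferred else members[0])
print("candidate biomarkers:", selected)
```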

