Measuring the quality of scientific references in Wikipedia: an analysis of more than 115M citations to over 800,000 scientific articles

Author(s):  
Joshua M. Nicholson ◽  
Ashish Uppala ◽  
Matthias Sieber ◽  
Peter Grabitz ◽  
Milo Mordaunt ◽  
...  

Abstract Wikipedia is a widely used online reference work which cites hundreds of thousands of scientific articles across its entries. The quality of these citations has not been previously measured, and such measurements have a bearing on the reliability and quality of the scientific portions of this reference work. Using a novel technique, a massive database of qualitatively described citations, and machine learning algorithms, we analyzed 1,923,575 Wikipedia articles which cited a total of 824,298 scientific articles, and found that most scientific articles (57%) are uncited or untested by subsequent studies, while the remainder show a wide variability in contradicting or supporting evidence (2-41%). Additionally, we analyzed 51,804,643 scientific articles from journals indexed in the Web of Science and found that most (85%) were uncited or untested by subsequent studies, while the remainder show a wide variability in contradicting or supporting evidence (1-14%).
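
As a rough illustration of the kind of citation-level classification the abstract alludes to, the sketch below trains a toy text classifier to sort citation statements into mentioning, supporting, or contradicting categories. The pipeline, features, and example sentences are assumptions for illustration and do not reproduce the authors' system.

```python
# Illustrative sketch: classifying citation contexts as mentioning, supporting,
# or contradicting. The model choice, features, and example sentences are
# assumptions; they do not reproduce the authors' pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled citation statements (0 = mentioning, 1 = supporting, 2 = contradicting)
train_texts = [
    "Several studies have examined this question previously.",
    "Our results are consistent with those reported by Smith et al.",
    "These findings contradict the earlier report of Jones et al.",
]
train_labels = [0, 1, 2]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_labels)

# Classify a new citation statement
print(clf.predict(["Our data support the conclusions of the cited study."]))
```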

2021 ◽  
Vol 218 ◽  
pp. 44-51
Author(s):  
D. Venkata Vara Prasad ◽  
Lokeswari Y. Venkataramana ◽  
P. Senthil Kumar ◽  
G. Prasannamedha ◽  
K. Soumya ◽  
...  

Cancers ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 3817
Author(s):  
Shi-Jer Lou ◽  
Ming-Feng Hou ◽  
Hong-Tai Chang ◽  
Chong-Chi Chiu ◽  
Hao-Hsien Lee ◽  
...  

No studies have discussed machine learning algorithms to predict recurrence within 10 years after breast cancer surgery. This study aimed to compare the accuracy of forecasting models for predicting recurrence within 10 years after breast cancer surgery and to identify significant predictors of recurrence. Registry data for breast cancer surgery patients were allocated to a training dataset (n = 798) for model development, a testing dataset (n = 171) for internal validation, and a validating dataset (n = 171) for external validation. Global sensitivity analysis was then performed to evaluate the significance of the selected predictors. Demographic characteristics, clinical characteristics, quality of care, and preoperative quality of life were significantly associated with recurrence within 10 years after breast cancer surgery (p < 0.05). Artificial neural networks had the highest prediction performance indices. Additionally, surgeon volume was the best predictor of recurrence within 10 years after breast cancer surgery, followed by hospital volume and tumor stage. Accurate prediction of recurrence within 10 years by machine learning algorithms may improve precision in managing patients after breast cancer surgery and improve understanding of the risk factors for recurrence within 10 years after surgery.
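
A minimal sketch of the split-and-train workflow the abstract describes (training set for model development, held-out set for internal validation, and a separate cohort for external validation, with an artificial neural network). The cohort sizes mirror the abstract, but the features, labels, and MLP settings are illustrative assumptions rather than the study's actual model.

```python
# Sketch of a train / internal-validation split followed by an ANN fit.
# Synthetic data; feature names and model settings are assumptions only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(969, 12))      # e.g. demographics, tumor stage, surgeon/hospital volume
y = rng.integers(0, 2, size=969)    # recurrence within 10 years (synthetic labels)

# 798 training / 171 internal-validation split, mirroring the cohort sizes in the abstract;
# an external-validation cohort (another n = 171) would come from a separate registry.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=171, random_state=42)

model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=42)
model.fit(X_train, y_train)
print("internal validation AUC:",
      roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```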


2020 ◽  
Author(s):  
D.C.L. Handler ◽  
P.A. Haynes

Abstract Assessment of replicate quality is an important process for any shotgun proteomics experiment. One fundamental question in proteomics data analysis is whether any specific replicates in a set of analyses are biasing the downstream comparative quantitation. In this paper, we present an experimental method to address this concern. PeptideMind uses a series of clustering machine learning algorithms to assess outliers when comparing proteomics data from two states with six replicates each. The program is a JVM native application written in the Kotlin language with Python sub-process calls to scikit-learn. By permuting the six data replicates provided into four hundred non-redundant pairwise comparisons of triplets, PeptideMind determines whether any one replicate is biasing the downstream quantitation of the states. In addition, PeptideMind generates useful visual representations of the spread of the significance measures, allowing researchers a rapid, effective way to monitor the quality of those identified proteins found to be differentially expressed between sample states.
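
The counting step behind the 400 comparisons can be reproduced directly: choosing three of six replicates from each state gives C(6,3) x C(6,3) = 20 x 20 = 400 triplet-versus-triplet comparisons. The sketch below enumerates them in Python; the variable names and the outlier-scoring idea are assumptions, not PeptideMind's implementation.

```python
# Sketch of the replicate-permutation idea: every choice of three replicates
# from state A paired with three from state B gives C(6,3) * C(6,3) = 400
# triplet-vs-triplet comparisons. Names and scoring are illustrative only.
from itertools import combinations
from collections import Counter

state_a = [f"A{i}" for i in range(1, 7)]
state_b = [f"B{i}" for i in range(1, 7)]

comparisons = [(ta, tb) for ta in combinations(state_a, 3)
                        for tb in combinations(state_b, 3)]
print(len(comparisons))  # 400

# A replicate that appears disproportionately in comparisons flagged as
# anomalous (e.g. by a clustering model) would be a candidate outlier.
usage = Counter(r for ta, tb in comparisons for r in ta + tb)
print(usage.most_common(3))
```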


2021 ◽  
Author(s):  
Ram Sunder Kalyanraman ◽  
Xiaoli Chen ◽  
Po-Yen Wu ◽  
Kevin Constable ◽  
Amit Govil ◽  
...  

Abstract Ultrasonic and sonic logs are increasingly used to evaluate the quality of cement placement in the annulus behind the pipe and its potential to perform as a barrier. Wireline logs are carried out in widely varying conditions and attempt to evaluate a variety of cement formulations in the annulus. The annulus geometry is complex due to pipe standoff and often affects the behavior (properties) of the cement. The transformation of ultrasonic data into meaningful cement evaluation is also a complex task and requires expertise to ensure the processing is carried out correctly as well as interpreted correctly. Cement formulations can vary from heavyweight cement to ultralight foamed cements. The ultrasonic log-based evaluation, using legacy practices, works well for cements that are well behaved and well bonded to casing. In such cases, a lightweight cement and a heavyweight cement, when bonded, can be easily discriminated from gas or liquid (mud) through simple quantitative thresholds, resulting in a Solid (S) - Liquid (L) - Gas (G) map. However, ultralight and foamed cements may overlap with mud in quantitative terms. Cements may debond from casing with a gap (either wet or dry), resulting in a very complex log response that may not be amenable to simple threshold-based discrimination of S-L-G. Cement sheath evaluation, and the inference that the cement sheath can serve as a barrier, is complex. It is therefore imperative that adequate processes mitigate errors in processing and interpretation and bring in reliability and consistency. Processing inconsistencies arise when we are unable to correctly characterize the borehole properties, either due to suboptimal measurements or due to assumptions about the borehole environment. Experts can and do recognize inconsistencies in processing and can advise on the appropriate resolution to ensure correct processing. The same decision-making criteria that experts follow can be implemented through autonomous workflows. The ability for software to autocorrect is not only possible but also significantly enhances the reliability of the product for wellsite decisions. In complex situations involving debonded cements and ultralight cements, we may need to approach the interpretation from a data behavior-based approach, which can be explained by physics and modeling or through observations in the field by experts. This leads to a novel seven-class annulus characterization [5S-L-G], which we expect will bring improved clarity on the annulus behavior. We explain the rationale for this approach by providing a catalog of log responses for the seven classes. In addition, we introduce the ability to carry out such analysis autonomously through machine learning. Such machine learning algorithms are best applied after ensuring the data is correctly processed. We demonstrate the capability through a few field examples. The ability to emulate an "expert" through software can lead to an ability to autonomously correct processing inconsistencies prior to an autonomous interpretation, thereby significantly enhancing the reliability and consistency of cement evaluation and ruling out issues related to subjectivity, training, and competency.
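
As a loose illustration of replacing fixed S-L-G thresholds with a data-driven, multi-class annulus classifier, the sketch below fits a supervised model to synthetic log features. The feature names, the seven class labels, and the model choice are assumptions for illustration only and are not the workflow described in the paper.

```python
# Illustrative sketch: a supervised model mapping ultrasonic/sonic log features
# to discrete annulus classes instead of fixed S-L-G cutoffs. Features, the
# seven class labels, and the data are assumptions, not the authors' workflow.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Hypothetical per-depth features: acoustic impedance, flexural attenuation,
# third-interface echo strength (synthetic values).
X = rng.normal(size=(5000, 3))
# Seven illustrative annulus classes (five solid/cement variants, liquid, gas).
classes = ["bonded_cement", "ultralight_cement", "debonded_dry", "debonded_wet",
           "contaminated_cement", "liquid", "gas"]
y = rng.integers(0, len(classes), size=5000)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(classes[clf.predict(X[:1])[0]])
```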


2018 ◽  
Vol 7 (1.7) ◽  
pp. 179
Author(s):  
Nivedhitha G ◽  
Carmel Mary Belinda M.J ◽  
Rupavathy N

The growth of phishing sites is, by all accounts, remarkable. Even though web users are aware of these kinds of phishing attacks, many still fall victim to them. Numerous attacks are launched with the aim of making web users believe that they are communicating with a trusted entity, and phishing is one of them. Phishing continues to grow because it is easy to copy an entire website using its HTML source code; by making slight changes to the source code, it is possible to direct the victim to a phishing site. Phishers use a variety of techniques to lure unsuspecting web users. Consequently, an efficient mechanism is required to distinguish phishing sites from legitimate sites in order to protect credential data. To detect phishing websites and identify them as information-leaking sites, the system applies data mining algorithms. In this paper, machine learning algorithms have been used to model the prediction task. The processes of identity extraction and feature extraction are discussed, and the various experiments carried out to assess the performance of the models are demonstrated.
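
A minimal sketch of the feature-extraction-plus-classification approach described above, assuming a handful of simple URL-based features and toy labels; the feature set and model are illustrative and do not correspond to the identity-extraction method used in the paper.

```python
# Sketch: extract simple URL features and train a classifier to separate
# phishing from legitimate sites. Features, sample URLs, and labels are
# assumptions chosen for illustration only.
from urllib.parse import urlparse
from sklearn.ensemble import RandomForestClassifier

def url_features(url: str) -> list:
    parsed = urlparse(url)
    host = parsed.netloc
    return [
        len(url),                              # phishing URLs are often unusually long
        host.count("."),                       # many subdomains can indicate spoofing
        int("@" in url),                       # '@' obscures the real host
        int(any(c.isdigit() for c in host)),   # digits / raw IPs in the host
        int(parsed.scheme != "https"),         # missing TLS
    ]

urls = ["https://www.example.com/login",
        "http://192.168.0.1.secure-update.example.tk/@verify"]
labels = [0, 1]  # 0 = legitimate, 1 = phishing (toy labels)

clf = RandomForestClassifier(random_state=0).fit([url_features(u) for u in urls], labels)
print(clf.predict([url_features("http://paypal.account.verify.example.xyz/@login")]))
```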


Author(s):  
Mahalaxmi P P ◽  
Kavita D. Hanabaratti

This review paper discusses recent techniques and methods used for grain classification and grading. Grains are an important source of nutrients and play an important role in a healthy diet. Annual worldwide grain production is counted in the hundreds of millions. The common method for classifying this enormous output is manual inspection, which is tedious and inaccurate. An automated system is therefore required that can classify the varieties and predict the quality (i.e., grade A, grade B) of grain quickly and accurately. As machine learning has made many difficult tasks easier, many machine learning algorithms can be used to classify grains and predict their quality. Such systems use colour and geometrical features, such as the size and area of grains, as attributes for classification and quality prediction. Here, several image processing methods and machine learning algorithms are reviewed.
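
A small sketch of the colour-and-geometry feature idea reviewed above: compute a grain's pixel area and mean colour from a segmented mask and feed them to a classifier. The synthetic image, mask, and grade label are assumptions for illustration.

```python
# Sketch: colour + geometric features (mean RGB, area) for grain grading.
# The synthetic image, mask, and grade label are illustrative assumptions.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def grain_features(rgb_image: np.ndarray, mask: np.ndarray) -> list:
    area = int(mask.sum())                      # geometric feature: grain area in pixels
    mean_colour = rgb_image[mask].mean(axis=0)  # colour feature: mean R, G, B inside the grain
    return [area, *mean_colour]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3))    # synthetic RGB image
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:35] = True                       # a single segmented grain

X = [grain_features(img, mask)]
y = ["grade A"]                                 # toy label
model = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(model.predict(X))
```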


2019 ◽  
Vol 8 (4) ◽  
pp. 1426-1430

Continuous Integration and Continuous Deployment (CICD) is a trending practice in agile software development. Using Continuous Integration helps developers find bugs before they reach production by running unit tests, smoke tests, and so on. Deploying the components of the application to production using Continuous Deployment means that a new release of the application reaches the client faster. Continuous Security makes sure that the application is less prone to vulnerabilities by running static scans on the code and dynamic scans on the deployed releases. The goal of this study is to identify the benefits of adopting Continuous Integration and Continuous Deployment in application software. The pipeline involves implementing CI-CS-CD on a web application, ClubSoc, which is a club management application, and using unsupervised machine learning algorithms to detect anomalies in the CI-CS-CD process. Continuous Integration is implemented using the Jenkins CI tool, Continuous Security is implemented using the Veracode tool, and Continuous Deployment is done using Docker and Jenkins. The results have shown that by adopting this methodology, the author is able to improve the quality of the code, find vulnerabilities using static scans, gain portability, and save time through automated deployment of applications using Docker and Jenkins. Applying machine learning helps in predicting defects, failures, and trends in the Continuous Integration pipeline, and it can help in predicting the business impact in Continuous Delivery. Unsupervised learning algorithms such as K-means clustering, Symbolic Aggregate Approximation (SAX), and Markov models are used for quality and performance regression analysis in the CICD model. Using the CICD model, developers can fix bugs pre-release, which benefits the company as a whole by raising profit and attracting more customers. The updated application reaches the client faster using Continuous Deployment. By analyzing failure trends with unsupervised machine learning, developers may be able to predict where the next error is likely to happen and prevent it in the pre-build stage.
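
A brief sketch of using K-means to flag anomalous pipeline runs, in the spirit of the unsupervised quality and performance analysis described above; the build metrics, synthetic data, and anomaly threshold are assumptions rather than the author's actual setup.

```python
# Sketch: K-means based anomaly flagging for CI pipeline runs. The per-build
# metrics, synthetic data, and threshold are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical per-build metrics: build duration (s), failed tests, static-scan findings
builds = rng.normal(loc=[300, 2, 5], scale=[30, 1, 2], size=(200, 3))
builds = np.vstack([builds, [[900, 25, 40]]])    # one clearly anomalous build

X = StandardScaler().fit_transform(builds)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Distance to the nearest cluster centre serves as an anomaly score
dist = np.min(km.transform(X), axis=1)
threshold = dist.mean() + 3 * dist.std()
print("anomalous builds:", np.where(dist > threshold)[0])
```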

