Estimating the deep replicability of scientific findings using human and artificial intelligence

2020 ◽  
Vol 117 (20) ◽  
pp. 10762-10768
Author(s):  
Yang Yang ◽  
Wu Youyou ◽  
Brian Uzzi

Replicability tests of scientific papers show that the majority of papers fail replication. Moreover, failed papers circulate through the literature as quickly as replicating papers. This dynamic weakens the literature, raises research costs, and demonstrates the need for new approaches for estimating a study’s replicability. Here, we trained an artificial intelligence model to estimate a paper’s replicability using ground truth data on studies that had passed or failed manual replication tests, and then tested the model’s generalizability on an extensive set of out-of-sample studies. The model predicts replicability better than the base rate of reviewers and comparably to prediction markets, the best present-day method for predicting replicability. In out-of-sample tests on manually replicated papers from diverse disciplines and methods, the model had strong accuracy levels of 0.65 to 0.78. Exploring the reasons behind the model’s predictions, we found no evidence for bias based on topics, journals, disciplines, base rates of failure, persuasion words, or novelty words like “remarkable” or “unexpected.” We did find that the model’s accuracy is higher when trained on a paper’s text rather than its reported statistics, and that n-grams, higher-order word combinations that humans have difficulty processing, correlate with replication. We discuss how combining human and machine intelligence can raise confidence in research, provide research self-assessment techniques, and create methods that are scalable and efficient enough to review the ever-growing numbers of publications—a task that entails extensive human resources to accomplish with prediction markets and manual replication alone.
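The abstract attributes predictive power to n-gram features of a paper's text. The authors' actual model is not described here, so the following is a minimal illustrative sketch, on toy data, of how word-bigram features can drive a nearest-centroid text classifier; it is not the published model, and the labels and training texts are invented for illustration.

```python
from collections import Counter
import math

def ngrams(text, n=2):
    """Lowercase word n-grams, the kind of text feature the paper reports as predictive."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict(train, text):
    """Nearest-centroid prediction: return the label whose aggregated
    n-gram profile is most similar to the query text."""
    profiles = {}
    for label, doc in train:
        profiles.setdefault(label, Counter()).update(ngrams(doc))
    feats = ngrams(text)
    return max(profiles, key=lambda lab: cosine(profiles[lab], feats))
```

A real system would use many more features and a trained classifier; the sketch only shows why higher-order word combinations give the model something humans do not easily track.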

2020 ◽  
Author(s):  
Till A Dembek ◽  
Carlos Baldermann ◽  
Jan-Niklas Petry-Schmelzer ◽  
Hannah Jergas ◽  
Harald Treuer ◽  
...  

Objective: Open questions remain regarding the optimal target, or sweetspot, for deep brain stimulation (DBS) in, for example, Parkinson's disease. Previous studies introduced different methods of mapping DBS effects to determine sweetspots. Although these methods have a direct impact on surgical targeting and postoperative programming in DBS, they have so far not been validated against ground truth data. Materials & Methods: This study investigated five previously published DBS mapping methods regarding their potential to correctly identify a ground truth sweetspot. Methods were investigated in silico in eight different use-case scenarios, which incorporated different types of clinical data, noise, and differences in underlying neuroanatomy. Dice coefficients were calculated to determine the overlap between identified sweetspots and the ground truth. Additionally, out-of-sample predictive capabilities were assessed using the explained variance (R-squared). Results: The five investigated methods resulted in highly variable sweetspots. Methods based on voxel-wise statistics against average outcomes showed the best performance overall. While predictive capabilities were high, even in the best cases Dice coefficients remained limited to values around 0.5, highlighting the overall limitations of sweetspot identification. Conclusions: This study highlights the strengths and limitations of current approaches to DBS sweetspot mapping. Those limitations need to be taken into account when considering the clinical implications. All future approaches should be investigated in silico before being applied to clinical data.
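The Dice coefficient used to score sweetspot overlap has a standard closed form, 2|A∩B| / (|A| + |B|); a minimal sketch over binary voxel masks (here flattened to 1-D sequences for simplicity):

```python
def dice(a, b):
    """Dice overlap between two binary voxel masks: 2|A ∩ B| / (|A| + |B|).
    Returns 1.0 for two empty masks by convention."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 2.0 * inter / total if total else 1.0
```

A value around 0.5, as reported in the study, means the identified sweetspot and the ground truth share only about half their combined volume.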


2020 ◽  
Vol 26 (5) ◽  
pp. 604-623
Author(s):  
João Gabriel Lopes De Oliveira ◽  
Pedro Moreira Menezes Da Costa ◽  
Flavio De Mello

Artificial Intelligence (AI) pervades industry, entertainment, transportation, finance, and health. It seems to be in a kind of golden age, but today AI is based on the strength of techniques that bear little relation to the thought mechanism. Contemporary techniques of machine learning, deep learning, and case-based reasoning seem to be occupied with delivering functional and optimized solutions, leaving aside the core reasons why such solutions work. This paper, in turn, proposes a theoretical study of perception, a key issue for knowledge acquisition and intelligence construction. Its main concern is the formal representation of a perceived phenomenon by a casual observer and its relationship with machine intelligence. This work is based on a recently proposed geometric theory, and represents an approach that is able to describe the influence of scope, development paradigms, matching process, and ground truth on phenomenon perception. As a result, it enumerates the perception variables and describes the implications for AI.


Author(s):  
Xiaopeng Liu ◽  
Cong Liu ◽  
Xiaochen Liu

Due to scattering and absorption effects in the undersea environment, underwater image enhancement is a challenging problem. Obtaining ground-truth data for training is also an open problem, so a supervised learning process is unavailable. In this paper, we propose a Low-Rank Nonnegative Matrix Factorization (LR-NMF) method, which uses only the degraded underwater image as input to generate a clearer and more realistic image. According to the underwater image formation model, the degraded underwater image can be separated into three parts: the directed component and the backscattering and forward-scattering components. The latter two parts can be considered as scattering. The directed component is constrained to have a low rank. After that, the restored underwater image is obtained. Quantitative and qualitative analyses illustrate that the proposed method performs equivalently to or better than the state-of-the-art methods. Moreover, it is simple to implement and requires no training process.
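The full LR-NMF optimization is not reproduced in the abstract. As a hedged illustration of the underlying idea only — constraining the directed component to be low-rank and treating the residual as combined scattering — the following sketch substitutes a truncated SVD for the actual nonnegative factorization, so it is a stand-in, not the authors' method:

```python
import numpy as np

def separate_low_rank(image, rank=1):
    """Toy decomposition of a single-channel image matrix: keep a rank-r
    approximation as the 'directed' component and treat the residual as
    the combined back/forward scattering, echoing the low-rank constraint
    LR-NMF places on the direct signal."""
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    s_trunc = np.zeros_like(s)
    s_trunc[:rank] = s[:rank]          # retain only the top-r singular values
    direct = (u * s_trunc) @ vt        # low-rank 'directed' component
    scattering = image - direct        # residual attributed to scattering
    return direct, scattering
```

On an exactly rank-1 input the residual vanishes, which is the degenerate sanity check; on a real degraded image the residual would carry the scattering estimate.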


AI ◽  
2021 ◽  
Vol 2 (4) ◽  
pp. 684-704
Author(s):  
Karen Panetta ◽  
Landry Kezebou ◽  
Victor Oludare ◽  
James Intriligator ◽  
Sos Agaian

The concept of searching and localizing vehicles from live traffic videos based on descriptive textual input has yet to be explored in the scholarly literature. Endowing Intelligent Transportation Systems (ITS) with such a capability could help solve crimes on roadways. One major impediment to the advancement of fine-grain vehicle recognition models is the lack of video testbench datasets with annotated ground truth data. Additionally, to the best of our knowledge, no metrics currently exist for evaluating the robustness and performance efficiency of a vehicle recognition model on live videos, and even less so for vehicle search and localization models. In this paper, we address these challenges by proposing V-Localize, a novel artificial intelligence framework for vehicle search and continuous localization in live traffic videos based on input textual descriptions. An efficient hashgraph algorithm is introduced to compute valid target information from textual input. This work further introduces two novel datasets to advance AI research in these challenging areas. These datasets include (a) the most diverse and large-scale Vehicle Color Recognition (VCoR) dataset with 15 color classes—twice as many as the number of color classes in the largest existing such dataset—to facilitate finer-grain recognition with color information; and (b) a Vehicle Recognition in Video (VRiV) dataset, a first-of-its-kind video testbench dataset for evaluating the performance of vehicle recognition models in live videos rather than still image data. The VRiV dataset will open new avenues for AI researchers to investigate innovative approaches that were previously intractable due to the lack of annotated traffic vehicle recognition video testbench datasets. Finally, to address the gap in the field, five novel metrics are introduced in this paper for adequately assessing the performance of vehicle recognition models in live videos. 
Ultimately, the proposed metrics could also prove intuitively effective at quantitative model evaluation in other video recognition applications. One major advantage of the proposed vehicle search and continuous localization framework is that it could be integrated into ITS software solutions to aid law enforcement, especially in critical cases such as Amber Alerts or hit-and-run incidents.
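The paper's five metrics are not detailed in this abstract. As a purely hypothetical illustration of the kind of frame-level localization score such metrics might include — the function names and the 0.5 threshold are assumptions, not the paper's definitions — the following computes the fraction of frames whose predicted box overlaps ground truth above an IoU threshold:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union else 0.0

def localization_rate(preds, truths, thresh=0.5):
    """Fraction of frames in which the predicted box overlaps the ground
    truth box by at least `thresh` IoU — one plausible per-video score."""
    hits = sum(iou(p, t) >= thresh for p, t in zip(preds, truths))
    return hits / len(truths)
```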


2021 ◽  
Vol 56 (3) ◽  
pp. 43-52
Author(s):  
Andi Sunyoto

The computer vision approach is most widely used for research related to hand gesture recognition. Detecting the image orientation has been found to be one of the keys to determining its success. The degrees of freedom of a hand determine the shape and orientation of a gesture, which in turn poses a problem for recognition methods. This article proposes an evaluation of orientation detection for silhouette static hand gestures with different poses and orientations, without considering the forearm. The longest chord and the ellipse were the two popular methods compared. The angles formed from two wrist points, measured from the horizontal axis, were selected as ground truth data. Performance was analyzed using the error values obtained as the difference between the ground truth angles and each method's results; methods with errors closer to zero were rated better. The methods were evaluated using 1187 images, divided into four groups based on forearm presence, and the results showed the forearm's effect on orientation detection. It was also discovered that the ellipse method was better than the longest chord. This study's results can be used to select a hand gesture orientation detection method to increase accuracy in the hand gesture recognition process.
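The ellipse method is commonly implemented via second-order image moments; as a minimal sketch (assuming the silhouette is given as a set of pixel coordinates, which is an assumption about the paper's input format), the moment-based orientation estimate below yields an angle in degrees from the horizontal that can be compared against the wrist-point ground truth angle:

```python
import math

def ellipse_orientation(points):
    """Orientation (degrees from the horizontal axis) of the best-fit
    ellipse of a point set, computed from second-order central moments:
    theta = 0.5 * atan2(2*mu11, mu20 - mu02)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))
```

The evaluation described in the abstract would then reduce to the absolute difference between this angle and the ground truth angle for each image.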


2021 ◽  
Vol 15 (Supplement_1) ◽  
pp. S051-S052
Author(s):  
M Byrne ◽  
J East ◽  
M Iacucci ◽  
R Panaccione ◽  
R Kalapala ◽  
...  

Abstract Background Computer vision and deep learning (DL) to assess and help with tissue characterization of disease activity in Ulcerative Colitis (UC) through the Mayo Endoscopic Subscore (MES) show good results in central reading for clinical trials. The UCEIS (Ulcerative Colitis Endoscopic Index of Severity), being a more granular index, may be more reflective of disease activity and more primed for artificial intelligence (AI). We set out to create UC detection and scoring in a single tool and graphic user interface (GUI), improving the accuracy and precision of MES and UCEIS scores and reducing the time elapsed between video collection, quality assurance, and final scoring. We apply DL models to detect and filter scorable frames, assess the quality of endoscopic recordings, and predict MES and UCEIS scores in videos of patients with UC. Methods We leveraged more than 375,000 frames from endoscopy cases using Olympus scopes (190 and 180 Series). Experienced endoscopists and 9 labellers tagged ~22,000 (6%) images showing normal frames, disease states (MES or UCEIS subscores), and non-scorable frames. We separated the total frames into 3 categories: training (60%), testing (20%), and validation (20%). We used a Convolutional Neural Network (CNN), Inception V3, including a biopsy and post-biopsy detector, an out-of-the-body framework, and a blue-light algorithm. A similar architecture is used for scoring, with multiple separate units and corresponding dense layers on top of the CNN providing continuous scores for 5 separate outputs: MES, aggregate UCEIS, and the individual components Vascular Pattern, Bleeding, and Ulcers. Results Multiple metrics evaluate the detection models. Overall performance has an accuracy of ~88% and similar precision and recall for all classes. MAE (distance from ground truth) and mean bias (over/under-prediction tendency) are used to assess the performance of the scoring model. Our model performs well, as predicted distributions are relatively close to the labelled ground truth data, and MAE and bias for all frames are relatively low considering the magnitude of the scoring scale. 
To leverage all our models, we developed a practical tool to improve the efficiency and accuracy of the reading and scoring process for UC at different stages of the clinical journey. Conclusion We propose a DL approach based on labelled images to automate a workflow for improving and accelerating UC disease detection and scoring using MES and UCEIS scores. Our deep learning model shows relevant feature identification for scoring disease activity in UC patients, aligns well with both scoring guidelines and the performance of experts, and demonstrates strong promise for generalization. Going forward, we aim to continue developing our detection and scoring tool. With our detailed workflow supported by deep learning models, we have a driving function to create a precise and potentially superhuman-level AI to score disease activity.
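The MAE and mean bias used to assess the scoring model have simple definitions — mean absolute error as distance from ground truth, and signed mean error as over/under-prediction tendency; a minimal sketch over frame-level scores:

```python
def mae_and_bias(predicted, ground_truth):
    """Mean absolute error (average distance from ground truth) and mean
    bias (signed average error: positive = over-prediction tendency)."""
    diffs = [p - g for p, g in zip(predicted, ground_truth)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    bias = sum(diffs) / len(diffs)
    return mae, bias
```

A model can have low bias (over- and under-predictions cancel) while still having a high MAE, which is why the abstract reports both.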


Author(s):  
Mahesh K. Joshi ◽  
J.R. Klein

New technologies like artificial intelligence, robotics, machine intelligence, and the Internet of Things are seeing repetitive tasks move away from humans to machines. Humans cannot become machines, but machines can become more human-like. The traditional model of educating workers for the workforce is fast becoming irrelevant. There is a massive need for the retooling of human workers. Humans need to be trained to remain focused in a society which is constantly getting bombarded with information. The two basic elements of physical and mental capacity are slowly being taken over by machines and artificial intelligence. This changes the fundamental role of the global workforce.


Author(s):  
Mahesh K. Joshi ◽  
J.R. Klein

The world of work has been impacted by technology. Work is different than it was in the past due to digital innovation. Labor market opportunities are becoming polarized between high-end and low-end skilled jobs. Migration and its effects on employment have become a sensitive political issue. From Buffalo to Beijing public debates are raging about the future of work. Developments like artificial intelligence and machine intelligence are contributing to productivity, efficiency, safety, and convenience but are also having an impact on jobs, skills, wages, and the nature of work. The “undiscovered country” of the workplace today is the combination of the changing landscape of work itself and the availability of ill-fitting tools, platforms, and knowledge to train for the requirements, skills, and structure of this new age.


Author(s):  
William B. Rouse

This book discusses the use of models and interactive visualizations to explore designs of systems and policies in determining whether such designs would be effective. Executives and senior managers are very interested in what “data analytics” can do for them and, quite recently, what the prospects are for artificial intelligence and machine learning. They want to understand and then invest wisely. They are reasonably skeptical, having experienced overselling and under-delivery. They ask about reasonable and realistic expectations. Their concern is with the futurity of decisions they are currently entertaining. They cannot fully address this concern empirically. Thus, they need some way to make predictions. The problem is that one rarely can predict exactly what will happen, only what might happen. To overcome this limitation, executives can be provided predictions of possible futures and the conditions under which each scenario is likely to emerge. Models can help them to understand these possible futures. Most executives find such candor refreshing, perhaps even liberating. Their job becomes one of imagining and designing a portfolio of possible futures, assisted by interactive computational models. Understanding and managing uncertainty is central to their job. Indeed, doing this better than competitors is a hallmark of success. This book is intended to help them understand what fundamentally needs to be done, why it needs to be done, and how to do it. The hope is that readers will discuss this book and develop a “shared mental model” of computational modeling in the process, which will greatly enhance their chances of success.


2021 ◽  
Vol 13 (10) ◽  
pp. 1966
Author(s):  
Christopher W Smith ◽  
Santosh K Panda ◽  
Uma S Bhatt ◽  
Franz J Meyer ◽  
Anushree Badola ◽  
...  

In recent years, there have been rapid improvements in both remote sensing methods and satellite image availability that have the potential to massively improve burn severity assessments of the Alaskan boreal forest. In this study, we utilized recent pre- and post-fire Sentinel-2 satellite imagery of the 2019 Nugget Creek and Shovel Creek burn scars located in Interior Alaska to both assess burn severity across the burn scars and test the effectiveness of several remote sensing methods for generating accurate map products: Normalized Difference Vegetation Index (NDVI), Normalized Burn Ratio (NBR), and Random Forest (RF) and Support Vector Machine (SVM) supervised classification. We used 52 Composite Burn Index (CBI) plots from the Shovel Creek burn scar and 28 from the Nugget Creek burn scar for training classifiers and product validation. For the Shovel Creek burn scar, the RF and SVM machine learning (ML) classification methods outperformed the traditional spectral indices that use linear regression to separate burn severity classes (RF and SVM accuracy, 83.33%, versus NBR accuracy, 73.08%). However, for the Nugget Creek burn scar, the NDVI product (accuracy: 96%) outperformed the other indices and ML classifiers. In this study, we demonstrated that when sufficient ground truth data are available, the ML classifiers can be very effective for reliable mapping of burn severity in the Alaskan boreal forest. Because the performance of ML classifiers depends on the quantity of ground truth data, the ML classification methods are better suited when ample ground truth data are available, whereas with limited ground truth data the traditional spectral indices are better suited. We also examined the relationship between burn severity, fuel type, and topography (aspect and slope) and found that the relationship is site-dependent.
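The two spectral indices have standard band-ratio forms; a minimal sketch computing them from per-pixel reflectances (for Sentinel-2, NIR is commonly taken from band 8, red from band 4, and SWIR from band 12, but treat those band choices as assumptions rather than this study's exact configuration):

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    return (nir - red) / (nir + red)

def nbr(nir, swir):
    """Normalized Burn Ratio: (NIR - SWIR) / (NIR + SWIR)."""
    return (nir - swir) / (nir + swir)

def dnbr(nbr_pre, nbr_post):
    """Differenced NBR: pre-fire minus post-fire NBR, a common
    burn-severity measure — higher values indicate more severe burns."""
    return nbr_pre - nbr_post
```

Burn severity classes are then typically obtained by thresholding the pre/post-fire change in these indices, which is the step the study compares against the RF and SVM classifiers.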

