Discovery of complex oxides via automated experiments and data science

2021 ◽  
Vol 118 (37) ◽  
pp. e2106042118
Author(s):  
Lusann Yang ◽  
Joel A. Haber ◽  
Zan Armstrong ◽  
Samuel J. Yang ◽  
Kevin Kan ◽  
...  

The quest to identify materials with tailored properties is increasingly expanding into high-order composition spaces, with a corresponding combinatorial explosion in the number of candidate materials. A key challenge is to discover regions in composition space where materials have novel properties. Traditional predictive models for material properties are not accurate enough to guide the search. Herein, we use high-throughput measurements of optical properties to identify novel regions in three-cation metal oxide composition spaces by identifying compositions whose optical trends cannot be explained by simple phase mixtures. We screen 376,752 distinct compositions from 108 three-cation oxide systems based on the cation elements Mg, Fe, Co, Ni, Cu, Y, In, Sn, Ce, and Ta. Data models for candidate phase diagrams and three-cation compositions with emergent optical properties guide the discovery of materials with complex phase-dependent properties, as demonstrated by the discovery of a Co-Ta-Sn substitutional alloy oxide with tunable transparency, catalytic activity, and stability in strong acid electrolytes. These results required close coupling of data validation to experiment design to generate a reliable end-to-end high-throughput workflow for accelerating scientific discovery.

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Ali Rohani ◽  
Jennifer A. Kashatus ◽  
Dane T. Sessions ◽  
Salma Sharmin ◽  
David F. Kashatus

Abstract Mitochondria are highly dynamic organelles that can exhibit a wide range of morphologies. Mitochondrial morphology can differ significantly across cell types, reflecting different physiological needs, but can also change rapidly in response to stress or the activation of signaling pathways. Understanding both the cause and consequences of these morphological changes is critical to fully understanding how mitochondrial function contributes to both normal and pathological physiology. However, while robust and quantitative analysis of mitochondrial morphology has become increasingly accessible, there is a need for new tools to generate and analyze large data sets of mitochondrial images in high throughput. The generation of such datasets is critical to fully benefit from rapidly evolving methods in data science, such as neural networks, that have shown tremendous value in extracting novel biological insights and generating new hypotheses. Here we describe a set of three computational tools, Cell Catcher, Mito Catcher and MiA, that we have developed to extract extensive mitochondrial network data on a single-cell level from multi-cell fluorescence images. Cell Catcher automatically separates and isolates individual cells from multi-cell images; Mito Catcher uses the statistical distribution of pixel intensities across the mitochondrial network to detect and remove background noise from the cell and segment the mitochondrial network; MiA uses the binarized mitochondrial network to perform more than 100 mitochondria-level and cell-level morphometric measurements. To validate the utility of this set of tools, we generated a database of morphological features for 630 individual cells that encode 0, 1 or 2 alleles of the mitochondrial fission GTPase Drp1 and demonstrate that these mitochondrial data could be used to predict Drp1 genotype with 87% accuracy. 
Together, this suite of tools enables the high-throughput and automated collection of detailed and quantitative mitochondrial structural information at a single-cell level. Furthermore, the data generated with these tools, when combined with advanced data science approaches, can be used to generate novel biological insights.


JAMIA Open ◽  
2018 ◽  
Vol 1 (2) ◽  
pp. 136-141 ◽  
Author(s):  
Philip R O Payne ◽  
Elmer V Bernstam ◽  
Justin B Starren

Abstract There are an ever-increasing number of reports and commentaries that describe the challenges and opportunities associated with the use of big data and data science (DS) in the context of biomedical education, research, and practice. These publications argue that there are substantial benefits resulting from the use of data-centric approaches to solve complex biomedical problems, including an acceleration in the rate of scientific discovery, improved clinical decision making, and the ability to promote healthy behaviors at a population level. In addition, there is an aligned and emerging body of literature that describes the ethical, legal, and social issues that must be addressed to responsibly use big data in such contexts. At the same time, there has been growing recognition that the challenges and opportunities being attributed to the expansion in DS often parallel those experienced by the biomedical informatics community. Indeed, many informaticians would consider some of these issues relevant to the core theories and methods incumbent to the field of biomedical informatics science and practice. In response to this topic area, during the 2016 American College of Medical Informatics Winter Symposium, a series of presentations and focus group discussions intended to define the current state and identify future directions for interaction and collaboration between people who identify themselves as working on big data, DS, and biomedical informatics were conducted. We provide a perspective concerning these discussions and the outcomes of that meeting, and also present a set of recommendations that we have generated in response to a thematic analysis of those same outcomes. 
Ultimately, this report is intended to: (1) summarize the key issues currently being discussed by the biomedical informatics community as it seeks to better understand how to constructively interact with the emerging biomedical big data and DS fields; and (2) propose a framework and agenda that can serve to advance this type of constructive interaction, with mutual benefit accruing to both fields.


Molecules ◽  
2018 ◽  
Vol 23 (7) ◽  
pp. 1729
Author(s):  
Yinghan Hong ◽  
Zhifeng Hao ◽  
Guizhen Mai ◽  
Han Huang ◽  
Arun Kumar Sangaiah

Exploring and detecting causal relations among variables has shown great practical value in recent years, with numerous opportunities for scientific discovery, and is commonly seen as the core of data science. Among causal discovery methods, constraint-based approaches can recover causal structures from passive observational data in general cases and have shown extensive prospects in numerous real-world applications. However, they do not work well when the graph is sufficiently large. To alleviate this problem, this paper presents an improved causal structure learning algorithm, K2-BSO, which combines K2 with brain storm optimization (BSO). Here BSO is used to search for the optimal topological order of nodes instead of searching the graph space. This paper assumes that the dataset is generated by a causal diagram in which each variable is generated from its parents by a causal mechanism. We designed an elaborate distance function for the clustering step in BSO according to the mechanism of K2. The graph space is therefore reduced to a smaller topological order space, and the order space can be further reduced by an efficient clustering method. Experimental results on various real-world datasets show that our methods outperform traditional search-and-score methods and state-of-the-art genetic algorithm-based methods.
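
To illustrate why a fixed topological order shrinks the search, here is a minimal sketch of the classic K2 step (greedy parent selection scored with the Cooper-Herskovits metric) for discrete data. It is not the paper's K2-BSO: the BSO search over orderings and its distance function are not shown. The names `k2_log_score` and `k2_search`, the `max_parents` limit, and the row-of-integers data layout are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def k2_log_score(data, child, parents, arity):
    """Log K2 score of `child` given `parents` (hypothetical helper).

    `data` is a list of rows; row[v] is the integer state of variable v.
    `arity[v]` is the number of states of variable v.
    """
    r = arity[child]
    # Count child states for every observed parent configuration.
    counts = defaultdict(lambda: [0] * r)
    for row in data:
        key = tuple(row[p] for p in parents)
        counts[key][row[child]] += 1
    score = 0.0
    # Unobserved parent configurations contribute zero, so they are skipped.
    for n_ijk in counts.values():
        n_ij = sum(n_ijk)
        score += math.lgamma(r) - math.lgamma(n_ij + r)
        score += sum(math.lgamma(n + 1) for n in n_ijk)
    return score


def k2_search(data, order, arity, max_parents=2):
    """Greedy K2: each node may only take parents that precede it in `order`."""
    parents = {v: [] for v in order}
    for idx, child in enumerate(order):
        best = k2_log_score(data, child, parents[child], arity)
        candidates = set(order[:idx])
        while candidates and len(parents[child]) < max_parents:
            scored = [
                (k2_log_score(data, child, parents[child] + [c], arity), c)
                for c in sorted(candidates)
            ]
            top_score, top = max(scored)
            if top_score <= best:
                break  # no candidate improves the score
            parents[child].append(top)
            candidates.remove(top)
            best = top_score
    return parents


# Toy data: variable 1 is an exact copy of variable 0.
data = [(0, 0), (1, 1)] * 20
arity = {0: 2, 1: 2}
print(k2_search(data, [0, 1], arity))
print(k2_search(data, [1, 0], arity))
```

Because each node only considers predecessors in the order, the result is always acyclic and the order alone decides edge direction in this toy case, which is why searching over orderings is a smaller problem than searching over graphs.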


2020 ◽  
Author(s):  
Emily Law ◽  
Brian Day ◽  

NASA’s Solar System Treks program produces a suite of interactive visualization and AI/data science analysis tools. These tools enable mission planners, planetary scientists, and engineers to access geospatial data products derived from big data returned from a wide range of instruments aboard a variety of past and current missions, for a growing number of planetary bodies.

The portals provide easy-to-use tools for browsing and searching, and the ability to overlay a large and growing range of value-added data products. Data products can be viewed in 2D, 3D, and VR, and can be easily integrated by stacking and blending them together for optimal visualization. Data sets can be plotted and compared against each other. Standard gaming and 3D mouse controllers allow users to maneuver first-person visualizations of flying across planetary surfaces.

The portals provide a set of advanced analysis tools that employ AI and data science methods. The tools facilitate measurement and study of terrain, including distance, height, and depth of surface features. They allow users to perform analyses such as lighting and local hazard assessments, including slope, surface roughness, crater/boulder distribution, rockfall distribution, and surface electrostatic potential. These tools facilitate a wide range of activities, including the planning, design, development, test and operations associated with lunar sortie missions; robotic (and potentially crewed) operations on the surface; planning tasks in the areas of landing site evaluation and selection; design and placement of landers and other stationary assets; design of rovers and other mobile assets; developing terrain-relative navigation (TRN) capabilities; deorbit/impact site visualization; and assessment and planning of science traverses. Additional tools useful for scientific research, such as line-of-sight calculation, are under development.

Seven portals are publicly available to explore the Moon, Mars, Vesta, Ceres, Titan, IcyMoons, and Mercury, with more portals in development and planning stages.

This presentation will provide an overview of the Solar System Treks and highlight its innovative visualization and analysis capabilities that advance scientific discovery. The information system and science communities are invited to provide suggestions and requests as the development team continues to expand the portals’ tool suite to maximize scientific research.

Lastly, the authors would like to thank the Planetary Science Division of NASA’s Science Mission Directorate, NASA’s SMD Science Engagement and Partnerships, the Advanced Explorations Systems Program of NASA’s Human Exploration Operations Directorate, and the Moons to Mars Mission Directorate for their support and guidance in the development of the Solar System Treks.


2015 ◽  
Vol 71 (5) ◽  
pp. 1059-1067 ◽  
Author(s):  
Markus-Frederik Bohn ◽  
Celia A. Schiffer

High-throughput crystallographic approaches require integrated software solutions to minimize the need for manual effort. REdiii is a system that allows fully automated crystallographic structure solution by integrating existing crystallographic software into an adaptive and partly autonomous workflow engine. The program can be initiated after collecting the first frame of diffraction data and is able to perform processing, molecular-replacement phasing, chain tracing, ligand fitting and refinement without further user intervention. Preset values for each software component allow efficient progress with high-quality data and known parameters. The adaptive workflow engine can determine whether some parameters require modifications and choose alternative software strategies in case the preconfigured solution is inadequate. This integrated pipeline is targeted at providing a comprehensive and efficient approach to screening for ligand-bound co-crystal structures while minimizing repetitiveness and allowing a high-throughput scientific discovery process.


2021 ◽  
Author(s):  
Chaolemen Borjigin ◽  
Chen Zhang

Abstract Data Science is one of today’s most rapidly growing academic fields and has significant implications for all conventional scientific studies. However, most of the relevant studies so far have been limited to one or several facets of Data Science from a specific application domain perspective and fail to discuss its theoretical framework. Data Science is a novel science in that its research goals, perspectives, and body of knowledge are distinct from those of other sciences. The core theories of Data Science are the DIKW pyramid, data-intensive scientific discovery, the data science lifecycle, data wrangling or munging, big data analytics, data management and governance, data products development, and big data visualization. Six main trends characterize the recent theoretical studies on Data Science: growing significance of DataOps, the rise of citizen data scientists, enabling augmented data science, diversity of domain-specific data science, and implementing data stories as data products. The further development of Data Science should prioritize four ways of turning challenges into opportunities: accelerating theoretical studies of data science, the trade-off between explainability and performance, achieving data ethics, privacy and trust, and aligning academic curricula to industrial needs.


2018 ◽  
Vol 22 (11) ◽  
pp. 5639-5656 ◽  
Author(s):  
Chaopeng Shen ◽  
Eric Laloy ◽  
Amin Elshorbagy ◽  
Adrian Albert ◽  
Jerad Bales ◽  
...  

Abstract. Recently, deep learning (DL) has emerged as a revolutionary and versatile tool transforming industry applications and generating new and improved capabilities for scientific discovery and model building. The adoption of DL in hydrology has so far been gradual, but the field is now ripe for breakthroughs. This paper suggests that DL-based methods can open up a complementary avenue toward knowledge discovery in hydrologic sciences. In the new avenue, machine-learning algorithms present competing hypotheses that are consistent with data. Interrogative methods are then invoked to interpret DL models for scientists to further evaluate. However, hydrology presents many challenges for DL methods, such as data limitations, heterogeneity and co-evolution, and the general inexperience of the hydrologic field with DL. The roadmap toward DL-powered scientific advances will require the coordinated effort from a large community involving scientists and citizens. Integrating process-based models with DL models will help alleviate data limitations. The sharing of data and baseline models will improve the efficiency of the community as a whole. Open competitions could serve as the organizing events to greatly propel growth and nurture data science education in hydrology, which demands a grassroots collaboration. The area of hydrologic DL presents numerous research opportunities that could, in turn, stimulate advances in machine learning as well.

