The Digital Extended Specimen will Enable New Science and Applications

Specimens have long been viewed as critical to research in the natural sciences because each specimen captures the phenotype (and often the genotype) of a particular individual at a particular point in space and time. In recent years there has been considerable focus on digitizing the many physical specimens currently held in the world’s natural history research collections. As a result, a growing number of specimens are now each represented by their own “digital specimen”, that is, a findable, accessible, interoperable and reusable (FAIR) digital representation of the physical specimen that contains data about it. At the same time, there has been growing recognition that each digital specimen can be extended, and made more valuable for research, by linking it to data and samples derived from the curated physical specimen itself (e.g., computed tomography (CT) scan imagery, DNA sequences or tissue samples), to directly related specimens or data about the organism's life (e.g., specimens of parasites collected from it, photos or recordings of the organism in life, its immediate surrounding ecological community), and to the wide range of associated specimen-independent datasets and model-based contextualisations (e.g., taxonomic information, conservation status, bioclimatological region, remote sensing images, environmental-climatological data, traditional knowledge, genome annotations). The resulting connected network of extended digital specimens will enable new research on a number of fronts, and indeed this has already begun. The new types of research enabled fall into four distinct but overlapping categories. First, because the digital specimen is a surrogate that acts on the Internet for a physical specimen in a natural science collection, it is amenable to analytical approaches that are simply not possible with physical specimens.
For example, digital specimens can serve as training, validation and test sets for predictive process-based or machine learning algorithms, which are opening new doors to discovery and forecasting. Such sophisticated and powerful analytical approaches depend on FAIR extended digital specimen data that are as open as possible. These analytical approaches support the biodiversity monitoring outputs that are critically needed by the biodiversity community because they are central to conservation efforts at all levels of analysis, from genetic to species to ecosystem diversity. Second, linking specimens to closely associated specimens (potentially across multiple disparate collections) allows for the coordinated co-analysis of those specimens. For example, linking specimens of parasites or pathogens to specimens of the hosts from which they were collected allows for a powerful new understanding of coevolution, including pathogen range expansion and shifts to new hosts. Similarly, linking specimens of pollinators, their food plants and their predators can help untangle complex food webs and multi-trophic interactions. Third, linking derived data to their associated voucher specimens increases information richness, density and robustness, thereby allowing for novel types of analyses, strengthening validation through linked independent data and thus improving confidence levels and risk assessment. For example, digital representations of specimens that incorporate images, CT scans or vocalizations may capture important information that is otherwise lost during preservation, such as coloration or behavior. In addition, permanently linking genetic and genomic data to the specimen of the individual from which they were derived (something that is currently done inconsistently) allows for detailed studies of the connections between genotype and phenotype.
Furthermore, persistent links between physical specimens and additional information and associated transactions are the building blocks for documenting and preserving chains of custody. The links will also facilitate the cleaning, updating and maintenance of digital specimens and their derived and associated datasets as ever-expanding research questions and applied uses materialize over time. The resulting high-quality data resources are needed for fact-based decision-making and forecasting, based on monitoring, forensics and prediction workflows in conservation, sustainable management and policy-making. Finally, linking specimens to diverse but associated datasets allows for detailed, often transdisciplinary, studies of topics ranging from local adaptation, through the forces driving range expansion and contraction (critically important to our understanding of the consequences of climate change), to social vectors in disease transmission. A network of extended digital specimens will enable new and critically important research and applications in all of these categories, as well as science and uses that we cannot yet envision.
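The linking model described above (derived data, directly related specimens and associated specimen-independent data attached to one digital specimen) can be pictured as a small typed-link data structure. The Python sketch below is purely illustrative: the class names, the three link categories and the identifier strings are hypothetical and do not correspond to any published Digital Extended Specimen schema. It shows how typed links make a co-analysis query, such as pairing a host with a parasite collected from it, straightforward.

```python
from dataclasses import dataclass, field
from enum import Enum


class LinkType(Enum):
    """Hypothetical link categories mirroring the extensions described in the text."""
    DERIVED = "derived"        # e.g. CT scans, DNA sequences, tissue samples
    RELATED = "related"        # e.g. parasites collected from the organism
    ASSOCIATED = "associated"  # e.g. conservation status, climate data


@dataclass(frozen=True)
class Link:
    link_type: LinkType
    target_id: str    # persistent identifier of the linked object (illustrative format)
    description: str


@dataclass
class DigitalSpecimen:
    specimen_id: str  # persistent identifier of this digital specimen (illustrative)
    taxon: str
    links: list[Link] = field(default_factory=list)

    def add_link(self, link_type: LinkType, target_id: str, description: str) -> None:
        link = Link(link_type, target_id, description)
        if link not in self.links:  # skip exact duplicates
            self.links.append(link)

    def linked_ids(self, link_type: LinkType) -> list[str]:
        """Identifiers of all objects linked with the given type."""
        return [link.target_id for link in self.links if link.link_type is link_type]


def co_analysis_pairs(specimens: list[DigitalSpecimen]) -> list[tuple[str, str]]:
    """(specimen, related specimen) pairs where both are present in the given set."""
    known = {s.specimen_id for s in specimens}
    return [
        (s.specimen_id, target)
        for s in specimens
        for target in s.linked_ids(LinkType.RELATED)
        if target in known
    ]


if __name__ == "__main__":
    host = DigitalSpecimen("ds:host-001", "Myodes glareolus")
    tick = DigitalSpecimen("ds:para-042", "Ixodes ricinus")
    host.add_link(LinkType.DERIVED, "seq:EXAMPLE1", "COI sequence from tissue")
    host.add_link(LinkType.RELATED, "ds:para-042", "tick collected from this host")
    host.add_link(LinkType.RELATED, "ds:para-042", "tick collected from this host")  # duplicate, ignored
    print(co_analysis_pairs([host, tick]))  # [('ds:host-001', 'ds:para-042')]
```

Keeping the link type explicit is one design choice among many; a real implementation would more likely rely on an agreed vocabulary of relationship terms and resolvable persistent identifiers.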

The Digital Analytic Patient Reviewer (DAPR) for COVID-19 Data Mart Validation

2021. DOI: 10.1101/2021.05.30.21257945

Author(s): Heekyong Park, Taowei David Wang, Nich Wattanasin, Victor M. Castro, Vivian Gainer, ...

Keyword(s): Chart Review, Relevant Information, Machine Learning Algorithms, Quality Data, Patient Specific, Privacy And Security, Data Mart, Clinical Indicators, Security Issues, Wide Range

Objective: To provide high-quality data for COVID-19 research, we validated COVID-19 clinical indicators and 22 associated computed phenotypes, derived by machine learning algorithms, in the Mass General Brigham (MGB) COVID-19 Data Mart. Materials and Methods: Fifteen reviewers performed a manual chart review of 150 COVID-19-positive patients in the data mart. To support rapid chart review for a wide range of target data, we provided the Digital Analytic Patient Reviewer (DAPR). DAPR is a web-based chart review tool that integrates patient notes and provides note search functionality and a patient-specific summary view linked to relevant notes. Within DAPR, we developed a view oriented to the COVID-19 validation task and the corresponding information extraction logic, enabled fast access to data, and addressed privacy and security issues. Results: The concepts for COVID-19-positive cohort, COVID-19 index date, COVID-19-related admission and admission date showed high values in all evaluation metrics. For the phenotypes, overall specificity, positive predictive value (PPV) and negative predictive value (NPV) were high, but sensitivity was relatively low. Based on these results, we removed 3 phenotypes from our data mart. In a survey about using the tool, participants expressed positive attitudes towards using DAPR for chart review. They reported that the validation was easy and that DAPR helped them find relevant information. Some validation difficulties were also discussed. Discussion and Conclusion: DAPR's patient summary view accelerated the validation process. We are in the process of automating the workflow for using DAPR in chart reviews, and we will extend its use to other domains.
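The four evaluation metrics named in the Results (sensitivity, specificity, PPV and NPV) all follow from a confusion matrix that compares each computed phenotype with the chart-review result. The following is a minimal Python sketch, assuming chart review is treated as the reference standard; the class and function names are illustrative, and the counts in the example are invented to mirror the reported pattern (high specificity, PPV and NPV, lower sensitivity), not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing a computed phenotype with chart review (taken as truth)."""
    tp: int  # phenotype present in the data mart and confirmed by review
    fp: int  # present in the data mart, not confirmed by review
    tn: int  # absent in the data mart and absent on review
    fn: int  # absent in the data mart but found on review


def _ratio(num: int, den: int) -> float:
    # NaN instead of ZeroDivisionError when a metric is undefined.
    return num / den if den else float("nan")


def validation_metrics(c: Confusion) -> dict[str, float]:
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


def confusion_from_labels(computed: list[bool], reviewed: list[bool]) -> Confusion:
    """Tally per-patient agreement between computed labels and reviewer labels."""
    if len(computed) != len(reviewed):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(computed, reviewed))
    return Confusion(
        tp=sum(c and r for c, r in pairs),
        fp=sum(c and not r for c, r in pairs),
        tn=sum(not c and not r for c, r in pairs),
        fn=sum(not c and r for c, r in pairs),
    )


if __name__ == "__main__":
    # Invented counts for 150 reviewed patients, for illustration only.
    metrics = validation_metrics(Confusion(tp=20, fp=2, tn=120, fn=8))
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With these invented counts, the eight missed positives pull sensitivity down to about 0.71 while the other three metrics stay above 0.9, which is the kind of pattern the abstract describes.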

From Single Nanowires to Smart Systems: Different Ways to Assess Food Quality

Chemistry Proceedings, Vol 5 (1), pp. 29, 2021. DOI: 10.3390/csac2021-10605

Author(s): Matteo Tonezzer, Franco Biasioli, Flavia Gasperi

Keyword(s): Food Quality, Intelligent System, Limit Of Detection, Food Poisoning, Building Blocks, Viable Count, Machine Learning Algorithms, Smart Systems, Wide Range, Fish Samples

Recently, low-dimensional (1D, 2D) nanostructured materials have attracted growing interest as building blocks for innovative systems. Metal oxide nanowires are among the most widely used materials for solid-state gas sensors, as they are simple to make, inexpensive and sensitive to a wide range of gases and volatiles. Unfortunately, this broad sensitivity comes at a price: very low selectivity. Fortunately, this flaw is not a problem for all applications. Where the boundary conditions are well defined and “simple” (only a target gas is expected, with no interfering gases), a single traditional chemiresistor may be the best choice, whereas when there are many variables it is better to use an intelligent system. In this paper, we present a resistive sensor based on a single SnO₂ nanowire that, operating at three temperatures (200, 250 and 300 °C), can detect tens of ppb of ammonia (30 ppb at 300 °C). The limit of detection (LoD) was calculated as 3N/S, where N is the standard deviation of the sensor signal in air and S is the sensor sensitivity. We show that this nanosensor performs excellently and can be used in various applications, including agri-food quality monitoring. We demonstrate that, thanks to machine learning algorithms, the SnO₂ nanowire in a thermal gradient can act as a nano-electronic nose. The single-nanowire sensor can estimate the total viable count with an error of 2.32% on mackerel samples stored at room temperature (25 °C) and in a fridge (4 °C). Integrating such a small (less than 1 mm²) and inexpensive device into the food supply chain would greatly reduce waste and the frequency of food poisoning.
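The LoD definition above can be computed directly once N and S are known. A minimal sketch, assuming that S is taken as the slope of a linear calibration curve (sensor response versus ammonia concentration in ppb) and N as the sample standard deviation of repeated responses recorded in clean air; the abstract does not specify these details, and the readings below are invented for illustration, not the paper's data.

```python
import statistics


def calibration_slope(conc_ppb: list[float], response: list[float]) -> float:
    """Least-squares slope of response vs. concentration (assumed sensitivity S, per ppb)."""
    if len(conc_ppb) != len(response) or len(conc_ppb) < 2:
        raise ValueError("need at least two paired calibration points")
    mx = statistics.fmean(conc_ppb)
    my = statistics.fmean(response)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc_ppb, response))
    sxx = sum((x - mx) ** 2 for x in conc_ppb)
    return sxy / sxx


def limit_of_detection(air_response: list[float], conc_ppb: list[float],
                       response: list[float]) -> float:
    """LoD = 3N/S, in the same concentration units as conc_ppb."""
    noise = statistics.stdev(air_response)            # N: noise of the response in air
    sensitivity = calibration_slope(conc_ppb, response)  # S
    return 3 * noise / sensitivity


if __name__ == "__main__":
    # Hypothetical responses (change relative to the air baseline), illustration only.
    air = [0.000, 0.002, -0.002, 0.001, -0.001]
    conc = [100.0, 200.0, 400.0]
    resp = [0.05, 0.10, 0.20]
    print(f"LoD = {limit_of_detection(air, conc, resp):.1f} ppb")  # LoD = 9.5 ppb
```

Whether N uses the sample or the population standard deviation, and whether S is a fitted slope or a single-point ratio, varies between papers; this sketch picks the sample standard deviation and a fitted slope.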

Factors Affecting Molecular Self-Assembly and Its Mechanism

Scientific Research Journal, Vol 9 (1), pp. 43, 2012. DOI: 10.24191/srj.v9i1.5385

Author(s): Hueyling Tan

Keyword(s): Self Assembly, Building Blocks, Molecular Engineering, Molecular Structures, Polymer Science, New Approach, Factors Affecting, DNA Structures, Self Assembling, Wide Range

Molecular self-assembly is ubiquitous in nature and has emerged as a new approach to producing new materials in chemistry, engineering, nanotechnology, polymer science and materials science. Molecular self-assembly has attracted increasing interest from the scientific community in recent years because of its importance in understanding biology and a variety of diseases at the molecular level. In the last few years, considerable advances have been made in the use of peptides as building blocks to produce biological materials for a wide range of applications, including the fabrication of novel supramolecular structures and scaffolds for tissue repair. The study of biological self-assembly systems represents a significant advance in molecular engineering and is a rapidly growing scientific and engineering field that crosses the boundaries of existing disciplines. Self-assembling systems range from bi- and tri-block copolymers to DNA structures as well as simple and complex proteins and peptides. The ultimate goal is to harness molecular self-assembly so that the design and control of bottom-up processes is achieved, thereby enabling the exploitation of structures developed at the meso- and macroscopic scales for life and non-life science applications. Such aspirations can be achieved by understanding the fundamental principles behind the self-organisation and self-synthesis processes exhibited by biological systems.

Stereospecific, Palladium-catalyzed C(sp³)–H Alkenylation and Alkynylation of a Proline Derivative Enabled by 8-Aminoquinoline as a Directing Group

2020. DOI: 10.26434/chemrxiv.12034743

Author(s): Aleksandra Balliu, Aaltje Roelofje Femmigje Strijker, Michael Oschmann, Monireh Pourghasemi Lati, Oscar Verho

Keyword(s): Carboxylic Acid, Building Blocks, Wide Range, Palladium Catalyzed, Directing Group, High Yields, Vinyl Iodides, Initial Results

In this preprint, we present our initial results on a stereospecific Pd-catalyzed protocol for the C3 alkenylation and alkynylation of a proline derivative carrying the widely used 8-aminoquinoline directing group. Efficient C–H alkenylation was achieved with a wide range of vinyl iodides bearing different aliphatic, aromatic and heteroaromatic substituents, furnishing the corresponding C3-alkenylated products in good to high yields. In addition, we were able to show that this protocol can also be used to install an alkynyl group on the pyrrolidine scaffold when a TIPS-protected alkynyl bromide is used as the reaction partner. Furthermore, two different methods for removing the 8-aminoquinoline auxiliary are reported, which can give access to both cis- and trans-configured carboxylic acid building blocks from the C–H alkenylation products.

Efficient Prediction of Structural and Electronic Properties of Hybrid 2D Materials Using DFT and Machine Learning

2018. DOI: 10.26434/chemrxiv.6254756.v1

Author(s): Sherif Tawfik, Olexandr Isayev, Catherine Stampfl, Joseph Shapter, David Winkler, ...

Keyword(s): Machine Learning, Band Gap, Density Functional, 2D Materials, Van Der Waals, Building Blocks, Machine Learning Techniques, Interlayer Distance, Computational Screening, Wide Range

Materials constructed from different van der Waals two-dimensional (2D) heterostructures offer a wide range of benefits, but these systems have been little studied because of their experimental and computational complexity and because of the very large number of possible combinations of 2D building blocks. Simulating the interface between two different 2D materials is computationally challenging because of the lattice mismatch problem, which sometimes necessitates very large simulation cells for density-functional theory (DFT) calculations. Here we use a combination of DFT, linear regression and machine learning techniques to rapidly determine the interlayer distance between two different 2D materials stacked in a bilayer heterostructure, as well as the band gap of the bilayer. Our work provides an excellent proof of concept by quickly and accurately predicting a structural property (the interlayer distance) and an electronic property (the band gap) for a large number of hybrid 2D materials. This work paves the way for rapid computational screening of the vast parameter space of van der Waals heterostructures to identify new hybrid materials with useful and interesting properties.
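The linear-regression step mentioned above can be illustrated with an ordinary least-squares fit that maps per-bilayer descriptors to a target such as the interlayer distance. This is a minimal sketch under stated assumptions: the two descriptors and the training values are hypothetical placeholders (the abstract does not say which features were used), and in practice the targets would come from DFT calculations. The fit solves the normal equations by Gaussian elimination so that it needs only the standard library.

```python
def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(features: list[tuple[float, ...]], targets: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations; returns [intercept, w1, w2, ...]."""
    rows = [[1.0, *f] for f in features]  # prepend a constant column for the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(coef: list[float], f: tuple[float, ...]) -> float:
    return coef[0] + sum(w * x for w, x in zip(coef[1:], f))


if __name__ == "__main__":
    # Hypothetical descriptors (x1, x2) per bilayer; targets generated from
    # d = 3.0 + 0.5*x1 + 0.2*x2 (angstrom). Real targets would come from DFT.
    X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
    d = [3.0 + 0.5 * x1 + 0.2 * x2 for x1, x2 in X]
    coef = fit_ols(X, d)
    print([round(c, 6) for c in coef])          # approximately [3.0, 0.5, 0.2]
    print(round(predict(coef, (2.5, 2.5)), 6))  # 4.75
```

The normal equations are fine for a handful of well-conditioned descriptors; with many correlated features, a QR-based solver or a regularized model would be the safer choice.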

Application of Nitrogen Heterocyclic Carbenes in Organocatalysis

Current Catalysis, Vol 09, 2020. DOI: 10.2174/2211544709999201201120617

Author(s): Minita Ojha, R. K. Bansal

Keyword(s): Asymmetric Catalysis, Building Blocks, Structural Features, Heterocyclic Carbenes, Stetter Reaction, Metal Pollutants, Synthetic Strategy, Wide Range, Benzoin Condensation, Nitrogen Heterocyclic

Background: Over the last two decades, the horizon of research on Nitrogen Heterocyclic Carbenes (NHCs) has widened remarkably. NHCs have emerged as ubiquitous species with applications in a broad range of fields, including organocatalysis and organometallic chemistry. NHC-induced non-asymmetric catalysis has proved to be a particularly fruitful area of research in recent years. Methods: By manipulating structural features and selecting appropriate substituent groups, it has been possible to control the kinetic and thermodynamic stability of a wide range of NHCs, which can tolerate a variety of functional groups and can be used under mild conditions. NHCs are produced by different methods, such as deprotonation of N-alkylheterocyclic salts, transmetallation, decarboxylation and electrochemical reduction. Results: NHCs have been used successfully as catalysts for a wide range of reactions, making a large number of building blocks and other useful compounds accessible. Examples include benzoin condensation, the Stetter reaction, the Michael reaction, esterification, activation of esters, activation of isocyanides, polymerization, various cycloaddition reactions and isomerization. This review covers all such examples published during the last 10 years, i.e., from 2010 to date. Conclusion: NHCs have emerged as versatile and powerful organocatalysts in synthetic organic chemistry. They provide a synthetic strategy that does not burden the environment with metal pollutants and thus fits within green chemistry.

RepPer: Perception of Psychiatric Disorders on Twitter in French (Preprint)

2020. DOI: 10.2196/preprints.18539

Author(s): Sarah Delanys, Farah Benamara, Véronique Moriceau, François Olivier, Josiane Mothe

Keyword(s): Social Media, Psychiatric Disorders, Digital Technology, Psychotic Disorders, Negative Polarity, Machine Learning Algorithms, Annotation Scheme, Word Use, Wide Range, Initial Dataset

BACKGROUND With the advent of digital technology, and specifically user-generated content on social media, new ways have emerged to study the possible stigmatization of people in relation to mental health. Several studies have analyzed the discourse about psychiatric pathologies on Twitter, mostly considering tweets in English and a limited number of psychiatric disorder terms. This paper proposes the first study to analyze the use of a wide range of psychiatric terms in tweets in French. OBJECTIVE Our aim is to study how generic, nosographic and therapeutic psychiatric terms are used on Twitter in French. More specifically, our study has three complementary goals: (1) to analyze the types of psychiatric word use, namely medical, misuse and irrelevant; (2) to analyze the polarity conveyed in the tweets that use these terms (positive/negative/neutral); and (3) to compare the frequency of these terms with those observed in related work (mainly in English). METHODS Our study was conducted on a corpus of tweets in French posted between 01/01/2016 and 12/31/2018 and collected using dedicated keywords. The corpus was manually annotated by clinical psychiatrists following a multilayer annotation scheme that includes the type of word use and the opinion orientation of the tweet. Two analyses were performed: first, a qualitative analysis to measure the reliability of the manual annotation; then, a quantitative analysis mainly considering term frequency in each layer and exploring the interactions between layers. RESULTS The first result is a resource in the form of an annotated dataset. The initial dataset comprises 22,579 tweets in French containing at least one of the selected psychiatric terms. From this set, psychiatry experts annotated a random sample of 3,040 tweets, which constitutes the resource resulting from our work.
The second result is the analysis of the annotations, which shows that terms are misused in 45.3% of the tweets and that their associated polarity is negative in 86.2% of the cases. When all three types of term use are considered, 59.5% of the tweets are associated with a negative polarity. Misused terms related to psychotic disorders (55.5%) are more frequent than those related to mood disorders (26.5%). CONCLUSIONS Some psychiatric terms are misused in the corpora we studied, which is consistent with the results reported in related work in other languages. Thanks to the great diversity of the terms studied, this work highlighted a disparity in the representations and ways of using psychiatric terms. Moreover, our study can help psychiatrists become aware of how these terms are used in new communication media such as widely used social networks. A major advantage of this study is that it is reproducible, thanks to the framework and guidelines we produced, so it could be repeated to analyze the evolution of term usage. While the newly built dataset is a valuable resource for other analytical studies, it could also serve to train machine learning algorithms to automatically identify stigma in social media.
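The abstract does not say which statistic was used to measure annotation reliability. A common choice for two annotators assigning categorical labels (such as medical / misuse / irrelevant) is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below implements that generic statistic, not necessarily the authors' procedure, and the labels in the example are invented.

```python
from collections import Counter


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single, identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Invented word-use labels for six tweets, for illustration only.
    ann1 = ["misuse", "misuse", "medical", "irrelevant", "misuse", "medical"]
    ann2 = ["misuse", "medical", "medical", "irrelevant", "misuse", "medical"]
    print(f"kappa = {cohen_kappa(ann1, ann2):.3f}")  # kappa = 0.739 (i.e. 17/23)
```

Here the annotators agree on 5 of 6 tweets, but because "misuse" and "medical" are frequent in both label sets, kappa is noticeably lower than the raw 83% agreement.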

MXenes for future nanophotonic device applications

Nanophotonics, Vol 9 (7), pp. 1831-1853, 2020. DOI: 10.1515/nanoph-2020-0060

Author(s): Jaeho Jeon, Yajie Yang, Haeju Choi, Jin-Hong Park, Byoung Hun Lee, ...

Keyword(s): Solar Cells, Transition Metal, Wide Spectrum, Building Blocks, Optoelectronic Device, High Yield, Saturable Absorption, Nanophotonic Device, Wide Range, Device Applications

Two-dimensional (2D) layers of transition metal carbides, nitrides or carbonitrides, collectively referred to as MXenes, are considered a new family of 2D materials for developing functional building blocks for optoelectronic and photonic device applications. Their advantages stem from their unique and tunable electronic and optical properties, which depend on the choice of transition metal elements or surface functional groups. In this paper, we present a comprehensive review of MXenes and offer a perspective on future nanophotonic and optoelectronic device applications based on advanced synthesis processes and theoretically predicted or experimentally verified material properties. Recently developed optoelectronic and photonic devices, such as photodetectors, solar cells, fiber lasers and light-emitting diodes, are summarized in this review. Wide-spectrum photodetection with high photoresponsivity, high-yield solar cells and effective saturable absorption have been achieved by exploiting different MXenes. Furthermore, MXenes are predicted to have great potential as electrode materials, with a work function controllable over a wide range (1.6–8 eV) and high conductivity (~10⁴ S/cm), and their potential as active channel materials with a tunable energy bandgap is likewise shown. MXenes can provide new functional building blocks for future generations of nanophotonic devices.

IIMLP: integrated information-entropy-based method for LncRNA prediction

BMC Bioinformatics, Vol 22 (S3), 2021. DOI: 10.1186/s12859-020-03884-w

Author(s): Junyi Li, Huinian Li, Xiao Ye, Li Zhang, Qingzhe Xu, ...

Keyword(s): Machine Learning, DNA Sequences, Information Entropy, Area Under The Curve, Prediction Method, Machine Learning Algorithms, Reading Frame, Non-Coding RNA, The One, Long Non-Coding RNA

Background: The prediction of long non-coding RNAs (lncRNAs) has attracted great attention from researchers, as growing evidence indicates that various complex human diseases are closely related to lncRNAs. In the era of biomedical big data, in addition to identifying lncRNAs by biological experimental methods, many computational methods based on machine learning have been proposed to make better use of lncRNA sequence resources. Results: We developed an lncRNA prediction method that integrates information-entropy-based features with machine learning algorithms. We calculate generalized topological entropy and generate 6 novel features for lncRNA sequences. Using these 6 features together with other features, such as the open reading frame, we apply support vector machine, XGBoost and random forest algorithms to distinguish human lncRNAs. We compared our method with one that uses more k-mer features, and the results show that our method achieves a higher area under the curve (AUC), of up to 99.7905%. Conclusions: We developed an accurate and efficient method that uses novel information entropy features to analyze and classify lncRNAs. Our method can also be extended to research on other functional elements in DNA sequences.
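The abstract does not define generalized topological entropy, so the sketch below uses a simpler, well-known stand-in to illustrate the general idea of entropy-based sequence features: the Shannon entropy of overlapping k-mer frequencies, computed for several values of k to form a small fixed-length feature vector. It is not the IIMLP feature set, and the function names are hypothetical.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq."""
    seq = seq.upper()
    if k < 1 or len(seq) < k:
        raise ValueError("need 1 <= k <= len(seq)")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_features(seq: str, ks: tuple[int, ...] = (1, 2, 3)) -> list[float]:
    """One entropy value per k, forming a small fixed-length feature vector."""
    return [kmer_entropy(seq, k) for k in ks]


if __name__ == "__main__":
    print(kmer_entropy("ACGT", 1))  # 2.0: four equally frequent bases
    print(entropy_features("AUGGCUACGAUCGAUUAGC"))
```

In a classifier pipeline, vectors like these would typically be concatenated with other features (for example, open reading frame length) before training a model such as a support vector machine or random forest.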

Efficient Secure Building Blocks with Application to Privacy Preserving Machine Learning Algorithms

IEEE Access, pp. 1-1, 2021. DOI: 10.1109/access.2021.3049216

Author(s): Artrim Kjamilji, Erkay Savas, Albert Levi

Keyword(s): Machine Learning, Learning Algorithms, Building Blocks, Privacy Preserving, Machine Learning Algorithms
