Predicting species richness and abundance of tropical post-larval fish using machine learning

Post-larval prediction is important, as post-larval supply allows us to understand juvenile fish populations. No previous studies have predicted post-larval fish species richness and abundance combining molecular tools, machine learning, and past-days remotely sensed oceanic conditions (RSOCs) obtained in the days just prior to sampling at different scales. Previous studies aimed at modeling species richness and abundance of marine fishes have mainly used environmental variables recorded locally during sampling and have merely focused on juvenile and adult fishes due to the difficulty of obtaining accurate species richness estimates for post-larvae. The present work predicted post-larval species richness (identified using DNA barcoding) and abundance at 2 coastal sites in SW Madagascar using random forest (RF) models. RFs were fitted using combinations of local variables and RSOCs at a small scale (8 d prior to fish sampling in a 50 × 120 km² area), meso-scale (16 d prior; 100 × 200 km²), and large scale (24 d prior; 200 × 300 km²). RF models combining local and small-scale RSOC variables predicted species richness and abundance best, with accuracy around 70 and 60%, respectively. We observed a small variation of RF model performance in predicting species richness and abundance among all sites, highlighting the consistency of the predictive RF model. Moreover, partial dependence plots showed that high species richness and abundance were predicted for sea surface temperatures <27.0°C and chlorophyll a concentrations <0.22 mg m⁻³. With respect to temporal changes, these thresholds were solely observed from November to December. Our results suggest that, in SW Madagascar, species richness and abundance of post-larval fish may only be predicted prior to the ecological impacts of tropical storms on larval settlement success.

Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data

Applied Sciences ◽

10.3390/app11020472 ◽

2021 ◽

Vol 11 (2) ◽

pp. 472

Author(s):

Hyeongmin Cho ◽

Sangkyun Lee

Keyword(s):

Machine Learning ◽

Data Quality ◽

Large Scale ◽

High Dimensional Data ◽

Quality Measures ◽

Training Data ◽

Measure Data ◽

High Dimensional ◽

Small Scale ◽

Class Separability

Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer or manager point of view, measuring data quality is an important first step in the learning process. We need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially when it comes to large-scale high-dimensional data, such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, the two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.

Automatic detection of Long Method and God Class code smells through neural source code embeddings

10.36227/techrxiv.17206010.v1 ◽

2021 ◽

Author(s):

Aleksandar Kovačević ◽

Jelena Slivka ◽

Dragan Vidaković ◽

Katarina-Glorija Grujić ◽

Nikola Luburić ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Negative Impact ◽

Source Code ◽

Systematic Evaluation ◽

Small Scale ◽

Code Smells ◽

Code Metrics ◽

Code Smell ◽

F Measure

Code smells are structures in code that often have a negative impact on its quality. Manually detecting code smells is challenging and researchers proposed many automatic code smell detectors. Most of the studies propose detectors based on code metrics and heuristics. However, these studies have several limitations, including evaluating the detectors using small-scale case studies and an inconsistent experimental setting. Furthermore, heuristic-based detectors suffer from limitations that hinder their adoption in practice. Thus, researchers have recently started experimenting with machine learning (ML) based code smell detection. This paper compares the performance of multiple ML-based code smell detection models against multiple traditionally employed metric-based heuristics for detection of God Class and Long Method code smells. We evaluate the effectiveness of different source code representations for machine learning: traditionally used code metrics and code embeddings (code2vec, code2seq, and CuBERT). We perform our experiments on the large-scale, manually labeled MLCQ dataset. We consider the binary classification problem – we classify the code samples as smelly or non-smelly and use the F1-measure of the minority (smell) class as a measure of performance. In our experiments, the ML classifier trained using CuBERT source code embeddings achieved the best performance for both God Class (F-measure of 0.53) and Long Method detection (F-measure of 0.75). With the help of a domain expert, we perform the error analysis to discuss the advantages of the CuBERT approach. This study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection to the best of our knowledge. A secondary contribution of our study is the systematic evaluation of the effectiveness of multiple heuristic-based approaches on the same large-scale, manually labeled MLCQ dataset.
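
The evaluation metric named in this abstract, the F1-measure of the minority (smell) class, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `f1_for_class` and the 0/1 label encoding (1 = smelly, 0 = non-smelly) are assumptions made for illustration only.

```python
def f1_for_class(y_true, y_pred, positive=1):
    """F1 score for one class, treating `positive` as the class of interest.

    Hypothetical helper: labels are assumed to be 1 (smelly) and 0 (non-smelly).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 3 smelly samples out of 10 (the minority class).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
score = f1_for_class(y_true, y_pred, positive=1)
```

Scoring only the minority class keeps a classifier that labels everything "non-smelly" from looking good: its accuracy on this toy data would be 0.7, but its F1 for the smell class would be 0.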

Effect of indaziflam on native species in natural areas and rangeland

Invasive Plant Science and Management ◽

10.1017/inp.2019.4 ◽

2019 ◽

Vol 12 (1) ◽

pp. 60-67 ◽

Cited By ~ 2

Author(s):

Shannon L. Clark ◽

Derek J. Sebastian ◽

Scott J. Nissen ◽

James R. Sebastian

Keyword(s):

Species Richness ◽

Weed Management ◽

Native Species ◽

Species Abundance ◽

Unintended Consequences ◽

Ecological Impacts ◽

Limited Information ◽

Perennial Species ◽

Temporary Reduction ◽

Species Richness And Abundance

Minimizing the negative ecological impacts of exotic plant invasions is one goal of land management. Using selective herbicides is one strategy to achieve this goal; however, the unintended consequences of this strategy are not always fully understood. The recently introduced herbicide indaziflam has a mode of action not previously used in non-crop weed management. Thus, there is limited information about the impacts of this active ingredient when applied alone or in combination with other non-crop herbicides. The objective of this research was to evaluate native species tolerance to indaziflam and imazapic applied alone and with other broadleaf herbicides. Replicated field plots were established at two locations in Colorado with a diverse mix of native forbs and grasses. Species richness and abundance were compared between the nontreated control plots and plots where indaziflam and imazapic were applied alone and in combination with picloram and aminocyclopyrachlor. Species richness and abundance did not decrease when indaziflam or imazapic were applied alone; however, species abundance was reduced by treatments containing picloram and aminocyclopyrachlor. Species richness was only impacted at one site 1 yr after treatment (YAT) by these broadleaf herbicides. Decreases in abundance were mainly due to reductions in forbs that resulted in a corresponding increase in grass cover. Our data suggest that indaziflam will control downy brome (Bromus tectorum L.) for multiple years without reduction in perennial species richness or abundance. If B. tectorum is present with perennial broadleaf weeds requiring the addition of herbicides like picloram or aminocyclopyrachlor, forb abundance could be reduced, and in some cases there could be a temporary reduction in perennial species richness.

Testing biodiversity theory using species richness of reef-building corals across a depth gradient

Biology Letters ◽

10.1098/rsbl.2019.0493 ◽

2019 ◽

Vol 15 (10) ◽

pp. 20190493 ◽

Cited By ~ 2

Author(s):

T. Edward Roberts ◽

Sally A. Keith ◽

Carsten Rahbek ◽

Tom C. L. Bridge ◽

M. Julian Caley ◽

...

Keyword(s):

Species Richness ◽

Large Scale ◽

Coral Species ◽

Environmental Gradients ◽

Abiotic Factors ◽

Empirical Support ◽

Null Model ◽

Local Scale ◽

Small Scale ◽

Energy Availability

Natural environmental gradients encompass systematic variation in abiotic factors that can be exploited to test competing explanations of biodiversity patterns. The species–energy (SE) hypothesis attempts to explain species richness gradients as a function of energy availability. However, limited empirical support for SE is often attributed to idiosyncratic, local-scale processes distorting the underlying SE relationship. Meanwhile, studies are also often confounded by factors such as sampling biases, dispersal boundaries and unclear definitions of energy availability. Here, we used spatially structured observations of 8460 colonies of photo-symbiotic reef-building corals and a null-model to test whether energy can explain observed coral species richness over depth. Species richness was left-skewed, hump-shaped and unrelated to energy availability. While local-scale processes were evident, their influence on species richness was insufficient to reconcile observations with model predictions. Therefore, energy availability, either in isolation or in combination with local deterministic processes, was unable to explain coral species richness across depth. Our results demonstrate that local-scale processes do not necessarily explain deviations in species richness from theoretical models, and that the use of idiosyncratic small-scale factors to explain large-scale ecological patterns requires the utmost caution.

Distinctive patterns and signals at major environmental events and collapse zone boundaries

Environmental Monitoring and Assessment ◽

10.1007/s10661-021-09463-7 ◽

2021 ◽

Vol 193 (10) ◽

Author(s):

Melinda Pálinkás ◽

Levente Hufnagel

Keyword(s):

Climate Change ◽

Cluster Analysis ◽

Species Richness ◽

Relative Abundance ◽

Large Scale ◽

Hierarchical Cluster ◽

Small Scale ◽

Total Abundance ◽

First Order ◽

Environmental Events

We studied the patterns of pre-collapse communities, the small-scale and the large-scale signals of collapses, and the environmental events before the collapses using four paleoecological and one modern data series. We applied and evaluated eight indicators in our analysis: the relative abundance of species, hierarchical cluster analysis, principal component analysis, total abundance, species richness, standard deviation (without a rolling window), first-order autoregression, and the relative abundance of the dominant species. We investigated the signals at the probable collapse triggering unusual environmental events and at the collapse zone boundaries, respectively. We also distinguished between pulse and step environmental events to see what signals the indicators give at these two different types of events. Our results show that first-order autoregression is not a good environmental event indicator, but it can forecast or indicate the collapse zones in climate change. The rest of the indicators are more sensitive to the pulse events than to the step events. Step events during climate change might have an essential role in initiating collapses. These events probably push the communities with low resilience beyond a critical threshold, so it is crucial to detect them. Before collapses, the total abundance and the species richness increase, the relative abundance of the species decreases. The hierarchical cluster analysis and the relative abundance of species together designate the collapse zone boundaries. We suggest that small-scale signals should be involved in analyses because they are often earlier than large-scale signals.
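
One of the indicators listed in this abstract, first-order autoregression, is commonly estimated as the lag-1 autocorrelation of a series. The sketch below shows that common estimator only; the abstract does not state which estimator the authors used, and the function name `lag1_autocorrelation` and the sample series are hypothetical.

```python
def lag1_autocorrelation(series):
    """Lag-1 autocorrelation, a common estimate of the AR(1) coefficient.

    Assumption: the whole series is used at once, with no rolling window.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(series) / n
    # Total squared deviation from the mean (denominator).
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        # A constant series carries no autocorrelation signal.
        return 0.0
    # Co-deviation of each value with its successor (numerator).
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return num / denom


# Made-up abundance-like series: a steady trend versus a rapidly flipping one.
trend = lag1_autocorrelation([1, 2, 3, 4, 5])
flip = lag1_autocorrelation([1, -1, 1, -1, 1, -1])
```

Values near 1 mean that each observation closely resembles the previous one, while negative values mean that the series alternates from step to step.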

Automatic detection of Long Method and God Class code smells through neural source code embeddings

10.36227/techrxiv.17206010 ◽

2021 ◽

Author(s):

Aleksandar Kovačević ◽

Jelena Slivka ◽

Dragan Vidaković ◽

Katarina-Glorija Grujić ◽

Nikola Luburić ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Negative Impact ◽

Source Code ◽

Systematic Evaluation ◽

Small Scale ◽

Code Smells ◽

Code Metrics ◽

Code Smell ◽

F Measure

Tree diversity regulates forest pest invasion

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1821039116 ◽

2019 ◽

Vol 116 (15) ◽

pp. 7382-7386 ◽

Cited By ~ 11

Author(s):

Qinfeng Guo ◽

Songlin Fei ◽

Kevin M. Potter ◽

Andrew M. Liebhold ◽

Jun Wen

Keyword(s):

Species Diversity ◽

Native Species ◽

Large Scale ◽

Tree Diversity ◽

Ecological Impacts ◽

Invasion Success ◽

Small Scale ◽

Forest Pest ◽

Natural Ecosystems

Nonnative pests often cause cascading ecological impacts, leading to detrimental socioeconomic consequences; however, how plant diversity may influence insect and disease invasions remains unclear. High species diversity in host communities may promote pest invasions by providing more niches (i.e., facilitation), but it can also diminish invasion success because low host dominance may make it more difficult for pests to establish (i.e., dilution). Most studies to date have focused on small-scale, experimental, or individual pest/disease species, while large-scale empirical studies, especially in natural ecosystems, are extremely rare. Using subcontinental-level data, we examined the role of tree diversity on pest invasion across the conterminous United States and found that the tree-pest diversity relationships are hump-shaped. Pest diversity increases with tree diversity at low tree diversity (because of facilitation or amplification) and is reduced at higher tree diversity (as a result of dilution). Thus, tree diversity likely regulates forest pest invasion through both facilitation and dilution that operate simultaneously, but their relative strengths vary with overall diversity. Our findings suggest the role of native species diversity in regulating nonnative pest invasions.

Structure of 0+ juvenile fish assemblages in the modified upper stretch of the River Elbe, Czech Republic

Czech Journal of Animal Science ◽

10.17221/7192-cjas ◽

2014 ◽

Vol 59 (No. 1) ◽

pp. 35-44 ◽

Cited By ~ 3

Author(s):

Z. Valová ◽

M. Janáč ◽

J. Švanyga ◽

P. Jurajda

Keyword(s):

Species Richness ◽

Fish Assemblage ◽

Fish Assemblages ◽

Juvenile Fish ◽

Species Abundance ◽

Shannon Index ◽

Catch Per Unit Effort ◽

River Elbe ◽

Nursery Habitats ◽

Species Richness And Abundance

In August 2007, the 0+ juvenile fish assemblage of the upper River Elbe was surveyed using electrofishing. Thirty-six localities were sampled along a 177 km long section between the towns of Verdek and Brandýs nad Labem (river km (RKM) 136–313). Four localities with natural riverbeds, 14 channelized stretches, nine beaches, and nine backwaters were sampled. Altogether, 4521 0+ juvenile fishes were caught, belonging to 26 species. A decrease in species richness and abundance was evident near Hradec Králové, while decreased species abundance was noted along the navigated stretch below Přelouč. The highest catch-per-unit-effort (CPUE), species richness, and Shannon index values were observed at beach habitats, the lowest in channelized habitats, and intermediate values in backwaters. Generally, rare beach habitats had significantly more rheophilic species than other habitats, while backwaters had significantly more eurytopic species and higher CPUE for limnophilic species. Backwaters and channel habitats, however, did not differ in any other 0+ fish assemblage parameter studied. The study demonstrated the importance of beaches for fish assemblages along navigable channels. Surprisingly, however, backwaters were not confirmed as important nursery habitats.
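
Two of the assemblage descriptors reported in this abstract, species richness and the Shannon index, can be computed from per-species catch counts as in the following minimal sketch. The species names and counts are made up, the function names are hypothetical, and the natural logarithm is an assumption, since the abstract does not state the logarithm base used.

```python
import math


def species_richness(counts):
    """Number of species with at least one individual caught."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over species proportions p_i.

    Assumption: natural logarithm.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical catches at two localities (not data from the study).
beach = {"chub": 10, "dace": 10, "gudgeon": 10, "barbel": 10}
channel = {"chub": 37, "dace": 1, "gudgeon": 1, "barbel": 1}

beach_h = shannon_index(beach)
channel_h = shannon_index(channel)
```

Both hypothetical localities have the same richness (four species), but the evenly distributed beach catch gives a higher Shannon index than the channel catch, which is dominated by a single species.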

Spatial predictability of juvenile fish species richness and abundance in a coral reef environment

Coral Reefs ◽

10.1007/s00338-007-0281-3 ◽

2007 ◽

Vol 26 (4) ◽

pp. 895-907 ◽

Cited By ~ 34

Author(s):

C. Mellin ◽

S. Andréfouët ◽

D. Ponton

Keyword(s):

Species Richness ◽

Coral Reef ◽

Fish Species ◽

Juvenile Fish ◽

Reef Environment ◽

Coral Reef Environment ◽

Species Richness And Abundance

A Simple and Efficient Pipeline for Construction, Merging, Expansion, and Simulation of Large-Scale, Single-Cell Mechanistic Models

10.1101/2020.11.09.373407 ◽

2020 ◽

Author(s):

Cemal Erdem ◽

Ethan M. Bensman ◽

Arnab Mutsuddy ◽

Michael M. Saint-Antoine ◽

Mehdi Bouhaddou ◽

...

Keyword(s):

Machine Learning ◽

Data Integration ◽

Single Cell ◽

Large Scale ◽

Mechanistic Modeling ◽

Small Scale ◽

Test Case ◽

Mechanistic Models ◽

Biomedical Data ◽

Egf Receptors

The current era of big biomedical data accumulation and availability brings data integration opportunities for leveraging its totality to make new discoveries and/or clinically predictive models. Black-box statistical and machine learning methods are powerful for such integration, but often cannot provide mechanistic reasoning, particularly on the single-cell level. While single-cell mechanistic models clearly enable such reasoning, they are predominantly “small-scale”, and struggle with the scalability and reusability required for meaningful data integration. Here, we present an open-source pipeline for scalable, single-cell mechanistic modeling from simple, annotated input files that can serve as a foundation for mechanistic data integration. As a test case, we convert one of the largest existing single-cell mechanistic models to this format, demonstrating robustness and reproducibility of the approach. We show that the model cell line context can be changed with simple replacement of input file parameter values. We next use this new model to test alternative mechanistic hypotheses for the experimental observations that interferon-gamma (IFNG) inhibits epidermal growth factor (EGF)-induced cell proliferation. Model-based analysis suggested, and experiments support that these observations are better explained by IFNG-induced SOCS1 expression sequestering activated EGF receptors, thereby downregulating AKT activity, as opposed to direct IFNG-induced upregulation of p21 expression. Overall, this new pipeline enables large-scale, single-cell, and mechanistically-transparent modeling as a data integration modality complementary to machine learning.
