scholarly journals STATISTICAL SOFTWARE R IN CORPUS-DRIVEN RESEARCH AND MACHINE LEARNING

2021 ◽  
Vol 86 (6) ◽  
pp. 1-18
Author(s):  
Viktoriia V. Zhukovska ◽  
Oleksandr O. Mosiiuk

The rapid development of computer software and network technologies has facilitated the intensive application of specialized statistical software not only in traditional information technology fields (e.g., statistics, engineering, artificial intelligence) but also in linguistics. The statistical software R is one of the most popular analytical tools for the statistical processing of large arrays of digitized language data, especially in quantitative corpus linguistic studies in Western Europe and North America. This article discusses the functionality of the R software package, focusing on its advantages in performing complex statistical analyses of linguistic data in corpus-driven studies and in creating linguistic classifiers in machine learning. With this in mind, a three-stage strategy for the computer-statistical analysis of linguistic corpus data is elaborated: 1) processing the data and preparing it for statistical procedures, 2) applying statistical hypothesis testing methods (MANOVA, ANOVA) and the Tukey post-hoc test, and 3) developing a linguistic classifier model and analyzing its effectiveness. The strategy is implemented on 11,000 tokens of English detached nonfinite constructions with an explicit subject extracted from the BNC-BYU corpus. The statistical analysis indicates significant differences in the realization of the factors of the parameter “Part of speech of the subject”. The analyzed linguistic data are used to build a machine learning model for classifying these constructions. Particular attention is devoted to the methodological perspectives of interdisciplinary research in linguistics and computer science. The potential application of the case study in training undergraduate, master's, and postgraduate students of Applied Linguistics is indicated. The article provides all the statistical data and the R code, with comprehensive descriptions and explanations.
The concluding part of the article summarizes the results obtained and highlights issues for further research connected with popularizing the R statistical software and raising specialists' awareness of this statistical analysis system.
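
Stage 2 of the strategy rests on comparing group means. As a hedged, minimal sketch (in Python rather than the article's R, where `aov()` and `TukeyHSD()` handle the ANOVA and the Tukey post-hoc test), the one-way ANOVA F statistic can be computed as below. The function name and the toy groups are illustrative assumptions, not the authors' code or data.

```python
from statistics import fmean


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over independent groups."""
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    all_values = [x for g in groups for x in g]
    n_total, k = len(all_values), len(groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    grand_mean = fmean(all_values)
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Toy groups (hypothetical), e.g. one measurement list per subject part of speech.
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

The p-value would then be read from the F distribution with `(df_between, df_within)` degrees of freedom (e.g. `pf()` in R); MANOVA generalizes the same between/within decomposition to several dependent variables at once.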

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9395
Author(s):  
Sara Hintze ◽  
Freija Maulbetsch ◽  
Lucy Asher ◽  
Christoph Winckler

Background: Animals kept in barren environments often show increased levels of inactivity, and initial studies indicate that inactive behaviour may reflect boredom or depression-like states. However, knowledge of what inactivity looks like in different species is still scarce, so methods to describe and analyse inactive behaviour precisely are needed. Methods: We developed an Inactivity Ethogram including detailed information on the postures of different body parts (Standing/Lying, Head, Ears, Eyes, Tail) for fattening cattle, a farm animal category often kept in barren environments. The Inactivity Ethogram was applied to Austrian Fleckvieh heifers kept in intensive, semi-intensive and pasture-based husbandry systems to record inactive behaviour in a range of different contexts. Three farms per husbandry system were each visited twice, once in the morning and once in the afternoon, to cover most of the daylight hours. During each visit, 16 focal animals were each observed continuously for 15 minutes (96 heifers per husbandry system, 288 in total). In addition, the focal animals’ groups were video recorded to determine inactivity at the group level later. Since our study was exploratory in nature, we refrained from statistical hypothesis testing and analysed both the individual- and group-level data descriptively. Simultaneous occurrences of postures of different body parts (Standing/Lying, Head, Ears and Eyes) were analysed using the machine learning algorithm cspade to provide insight into co-occurring postures of inactivity. Results: Inspection of graphs indicated that with increasing intensity of the husbandry system, more animals were inactive (group-level data) and the focal animals spent more time inactive (individual-level data). Frequently co-occurring postures were generally similar between husbandry systems, but with subtle differences.
The most frequently observed combination on farms with intensive and semi-intensive systems was lying with head up, ears backwards and eyes open, whereas on pasture it was standing with head up, ears forwards and eyes open. Conclusion: Our study is the first to explore inactive behaviour in cattle by applying a detailed description of postures from an Inactivity Ethogram and by using the machine learning algorithm cspade to identify frequently co-occurring posture combinations. Both the ethogram created in this study and the cspade algorithm may be valuable tools in future studies aiming to better understand different forms of inactivity and how they are associated with different affective states.
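
cspade (available in R through the arulesSequences package) mines frequent sequential patterns with support thresholds. The sketch below is a deliberately simpler stand-in, not cspade: it only counts how often each complete combination of simultaneous postures occurs and reports the most frequent one per husbandry system. The record layout, the function name, and the toy records are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical record layout: (husbandry system, Standing/Lying, Head, Ears, Eyes).
Observation = tuple[str, str, str, str, str]


def most_frequent_combination(
    observations: list[Observation],
) -> dict[str, tuple[tuple[str, ...], int]]:
    """Return, per husbandry system, the most frequent posture combination and its count.

    Only exact co-occurrence within a single observation is counted; unlike cspade,
    no sequences, sub-patterns, or support thresholds are considered.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for system, *postures in observations:
        counts[system][tuple(postures)] += 1
    # most_common(1) returns a one-element list holding the (combination, count) pair.
    return {system: c.most_common(1)[0] for system, c in counts.items()}


if __name__ == "__main__":
    toy = [  # hypothetical toy records, not study data
        ("intensive", "lying", "head up", "ears backwards", "eyes open"),
        ("intensive", "lying", "head up", "ears backwards", "eyes open"),
        ("intensive", "standing", "head down", "ears forwards", "eyes open"),
        ("pasture", "standing", "head up", "ears forwards", "eyes open"),
        ("pasture", "lying", "head up", "ears backwards", "eyes closed"),
        ("pasture", "standing", "head up", "ears forwards", "eyes open"),
    ]
    for system, (combo, n) in most_frequent_combination(toy).items():
        print(system, combo, n)
```

Counting whole combinations answers "which full posture set is most common"; cspade additionally finds frequent sub-patterns and orderings, which is why it suits exploratory analysis better.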


2021 ◽  
Vol 12 (1) ◽  
pp. 1-20
Author(s):  
Gao Niu ◽  
Richard S. Segall ◽  
Zichen Zhao ◽  
Zhijian Wu

This paper discusses the definitions of open source software, free software, and freeware, as well as the concept of big data. The authors then introduce R and Python as the two most popular open source statistical software (OSSS) packages. Additional OSSS, such as JASP, PSPP, GRETL, SOFA Statistics, Octave, KNIME, and Scilab, are also introduced, with function descriptions and modeling examples. The authors further discuss the capabilities of OSSS in artificial intelligence applications and modeling, as well as popular OSSS-based machine learning libraries and systems. The paper is intended as a reference to help readers select appropriate open source software for statistical analysis tasks. In addition, the working platform and selected numerical, descriptive, and analysis examples are provided for each package, giving readers a direct and in-depth understanding of each package and its functional highlights.


2021 ◽  
Author(s):  
Sawon Pratiher ◽  
Ananth Radhakrishnan ◽  
Karuna P. Sahoo ◽  
Sazedul Alam ◽  
Scott E. Kerick ◽  
...  

"This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible."

Physiological sensing has long been an indispensable fixture in virtual reality (VR) gaming studies. Moreover, VR-induced stressors are increasingly used to assess the impact of stress on an individual’s health and well-being. This study discusses the results of experimental research comprising multimodal physiological signal acquisition from 31 participants during a Go/No-Go VR-based shooting exercise in which participants had to shoot enemy targets and spare friendly ones. The study encompasses multiple sessions, including orientation, thresholding, and shooting. The shooting sessions consist of tasks under low- and high-difficulty-induced stress conditions with baseline segments in between. Machine learning (ML) using heart rate variability (HRV) features derived from the electrocardiogram (ECG), together with electroencephalogram (EEG) features, outperforms the prevalent methods on four different VR gaming difficulty-induced stress (GDIS) classification problems (CPs). Further, the significance of the HRV predictors and of different brain-region activations from EEG is assessed using statistical hypothesis testing (SHT). The ablation study shows the efficacy of multimodal physiological sensing for different gaming difficulty-induced stress classification problems (GDISCPs) in a VR shooting task.
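
The abstract does not list which HRV predictors were used. As a hedged illustration only, two standard time-domain HRV features, SDNN and RMSSD, plus mean heart rate can be computed from a series of RR intervals in milliseconds as follows; the function name and sample intervals are assumptions, and real pipelines would first remove ectopic beats and artifacts and compute features per segment.

```python
import math
from statistics import stdev


def hrv_time_domain(rr_ms: list[float]) -> dict[str, float]:
    """Compute basic time-domain HRV features from RR intervals in milliseconds.

    sdnn:    sample standard deviation of all RR intervals.
    rmssd:   root mean square of successive RR-interval differences.
    mean_hr: mean heart rate in beats per minute.
    """
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    sdnn = stdev(rr_ms)
    mean_hr = 60_000.0 / (sum(rr_ms) / len(rr_ms))
    return {"sdnn": sdnn, "rmssd": rmssd, "mean_hr": mean_hr}


if __name__ == "__main__":
    # Hypothetical RR intervals (ms) from a short baseline segment.
    print(hrv_time_domain([800, 810, 790, 800]))
```

Feature vectors like this, computed per baseline and stress segment, are the kind of input a stress classifier would consume; the actual feature set in the study may differ.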


Author(s):  
Erica Tavazzi ◽  
Camille L. Gerard ◽  
Olivier Michielin ◽  
Alexandre Wicky ◽  
Roberto Gatta ◽  
...  

Thanks to its ability to offer a time-oriented perspective on the clinical events that define a patient’s path of care, Process Mining (PM) is assuming an emerging role in clinical data analytics. PM’s ability to exploit time-series data and to build processes without any a priori knowledge suggests interesting synergies with the most common statistical analyses in healthcare, in particular survival analysis. In this work we demonstrate the contributions of our process-oriented approach in analyzing a real-world retrospective dataset of patients treated for advanced melanoma at the Lausanne University Hospital. Addressing the clinical questions raised by our oncologists, we integrated PM into almost all the steps of a common statistical analysis. We show: (1) how PM can be leveraged to improve the quality of the data (data cleaning/pre-processing), (2) how PM can provide efficient data visualizations that support and/or suggest clinical hypotheses, also allowing the consistency between real and expected processes to be checked (descriptive statistics), and (3) how PM can assist in querying or re-expressing the data in terms of pre-defined reference workflows for testing survival differences among sub-cohorts (statistical inference). We exploit a rich set of PM tools for querying the event logs, inspecting the processes using statistical hypothesis testing, and performing conformance checking analyses to identify patterns in patient clinical paths and study the effects of different treatment sequences in our cohort.
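
A basic building block of many process-discovery methods is the directly-follows relation extracted from an event log. The sketch below is a hedged, minimal illustration of that idea, not the authors' PM tooling: it groups events by case, orders them by timestamp, and counts how often activity b immediately follows activity a. The event layout, the function name, and the toy log are assumptions.

```python
from collections import Counter, defaultdict


def directly_follows(events: list[tuple[str, str, str]]) -> Counter:
    """Count directly-follows pairs (a, b) across all cases of an event log.

    Each event is (case_id, timestamp, activity). Timestamps are assumed to be
    ISO-8601 strings in one format, so lexicographic order is chronological.
    """
    traces: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for case_id, timestamp, activity in events:
        traces[case_id].append((timestamp, activity))
    dfg: Counter = Counter()
    for trace in traces.values():
        trace.sort()  # chronological order within a case (ties broken by activity name)
        activities = [activity for _, activity in trace]
        dfg.update(zip(activities, activities[1:]))
    return dfg


if __name__ == "__main__":
    log = [  # hypothetical toy event log, deliberately out of order
        ("p1", "2020-06-15", "targeted therapy"),
        ("p1", "2020-01-10", "diagnosis"),
        ("p1", "2020-02-01", "immunotherapy"),
        ("p2", "2020-03-05", "diagnosis"),
        ("p2", "2020-03-20", "immunotherapy"),
    ]
    for (a, b), n in directly_follows(log).items():
        print(f"{a} -> {b}: {n}")
```

Each case's ordered activity list is also the natural input for conformance checking against a reference workflow or for splitting patients into sub-cohorts by treatment sequence before a survival comparison.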


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Lukas Vlcek ◽  
Shize Yang ◽  
Yongji Gong ◽  
Pulickel Ajayan ◽  
Wu Zhou ◽  
...  

Exploration of structure-property relationships as a function of dopant concentration is commonly based on mean field theories for solid solutions. However, such theories, which work well for semiconductors, tend to fail in materials with strong correlations, either in electronic behavior or in chemical segregation. In these cases, the details of atomic arrangements are generally not explored and analyzed. Knowledge of the generative physics and chemistry of the material can obviate this problem, since it allows libraries of defect configurations to be generated as stochastic representations of atomic-level structures, or parameters of mesoscopic thermodynamic models to be derived. To obtain such information for improved predictions, we use data from atomically resolved microscopic images that visualize complex structural correlations within the system and translate them into statistical mechanical models of structure formation. Given the significant uncertainties about the microscopic aspects of the material’s processing history, along with the limited number of available images, we combine model optimization techniques with the principles of statistical hypothesis testing. We demonstrate the approach on data from a series of atomically resolved scanning transmission electron microscopy images of MoₓRe₁₋ₓS₂ at varying Mo/Re stoichiometries, for which we propose an effective interaction model that is then used to generate atomic configurations and make testable predictions at a range of concentrations and formation temperatures.
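
The abstract does not give the form of the effective interaction model. As a hedged sketch under explicit assumptions, suppose the metal sites form a periodic square lattice (a simplification; the real metal sublattice is triangular) and a single nearest-neighbour pair energy, here called `j_pair`, is paid for every unlike Mo–Re contact. Atomic configurations at fixed concentration x and temperature can then be sampled with Metropolis Monte Carlo using composition-conserving swap moves. All names are hypothetical, and fitting `j_pair` to images (the optimization and hypothesis-testing step) is not shown.

```python
import math
import random


def neighbours(i: int, j: int, n: int) -> list[tuple[int, int]]:
    """Four nearest neighbours on an n x n periodic square lattice (simplifying assumption)."""
    return [((i + 1) % n, j), ((i - 1) % n, j), (i, (j + 1) % n), (i, (j - 1) % n)]


def site_energy(grid: list[list[str]], i: int, j: int, j_pair: float) -> float:
    """Energy of site (i, j): j_pair for every unlike (Mo-Re) nearest-neighbour contact."""
    n = len(grid)
    return sum(j_pair for a, b in neighbours(i, j, n) if grid[a][b] != grid[i][j])


def sample_configuration(
    n: int, x_mo: float, j_pair: float, kT: float, steps: int, seed: int = 0
) -> list[list[str]]:
    """Metropolis Monte Carlo with composition-conserving swap moves.

    j_pair > 0 penalises Mo-Re contacts (favouring segregation); j_pair < 0 favours mixing.
    """
    if kT <= 0:
        raise ValueError("kT must be positive")
    rng = random.Random(seed)
    n_mo = round(x_mo * n * n)
    sites = ["Mo"] * n_mo + ["Re"] * (n * n - n_mo)
    rng.shuffle(sites)  # random initial configuration at the target concentration
    grid = [sites[r * n:(r + 1) * n] for r in range(n)]
    for _ in range(steps):
        i1, j1 = rng.randrange(n), rng.randrange(n)
        i2, j2 = rng.randrange(n), rng.randrange(n)
        if grid[i1][j1] == grid[i2][j2]:
            continue  # swapping identical species changes nothing
        before = site_energy(grid, i1, j1, j_pair) + site_energy(grid, i2, j2, j_pair)
        grid[i1][j1], grid[i2][j2] = grid[i2][j2], grid[i1][j1]
        after = site_energy(grid, i1, j1, j_pair) + site_energy(grid, i2, j2, j_pair)
        delta = after - before  # a shared bond between the two sites cancels out
        # Metropolis rule: accept with probability min(1, exp(-delta / kT)).
        if delta > 0 and rng.random() >= math.exp(-delta / kT):
            grid[i1][j1], grid[i2][j2] = grid[i2][j2], grid[i1][j1]  # reject: undo swap
    return grid


if __name__ == "__main__":
    g = sample_configuration(n=10, x_mo=0.3, j_pair=1.0, kT=0.5, steps=5000, seed=1)
    print("\n".join(" ".join("M" if s == "Mo" else "." for s in row) for row in g))
```

Swap (Kawasaki-type) moves keep the Mo/Re ratio fixed, which matches the setting of images taken at a known stoichiometry; statistics of the sampled configurations could then be compared against those measured in the images.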


Cancers ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 1153
Author(s):  
Elysia Racanelli ◽  
Abdulhadi Jfri ◽  
Amnah Gefri ◽  
Elizabeth O’Brien ◽  
Ivan Litvinov ◽  
...  

Background: Cutaneous squamous cell carcinoma (cSCC) is a rare complication of hidradenitis suppurativa (HS). Objectives: To conduct a systematic review and an individual patient data (IPD) meta-analysis to describe the clinical characteristics of HS patients developing cSCC and to determine predictors of poor outcome. Methods: Medline/PubMed, Embase, and Web of Science were searched from inception to December 2019 for studies reporting cSCC arising in patients with HS. Routine descriptive analysis and statistical hypothesis testing were performed, and Kaplan–Meier survival curves and Cox proportional hazards regression models were fitted. Results: A total of 34 case reports and series including 138 patients were included in the study. The majority of patients were male (81.6%), White (83.3%), and smokers (n = 22/27 reported), with a mean age of 53.5 years. Most patients had gluteal (87.8%) and Hurley stage 3 (88.6%) HS. The mean time from the diagnosis of HS to the development of cSCC was 24.7 years. Human papillomavirus (HPV) was identified in 12/38 patients tested. Almost 50% of individuals had nodal metastases and 31.3% had distant metastases. Half of the patients succumbed to their disease. Conclusions: cSCC is a rare but life-threatening complication seen in HS patients, mainly occurring in White male smokers with severe, long-standing gluteal HS. Regular clinical examination and biopsy of any suspicious lesions in high-risk patients should be considered. The use of HPV vaccination as a preventive and possibly curative method needs to be explored.
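
As a hedged illustration of the survival step, here is a minimal Kaplan–Meier product-limit estimator in plain Python. The function name and the follow-up values in the usage line are toy assumptions, not data from the review; in practice an established implementation such as R's survival package (which also fits Cox models) would be used.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier product-limit estimate of the survival function.

    times:  follow-up time for each patient.
    events: 1 if the event (e.g. death) was observed at that time, 0 if censored.
    Returns (t, S(t)) at each distinct time with at least one observed event.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: list[tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing time t; subjects censored at t still count as at risk.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    # Toy follow-up times (years) and event indicators, purely illustrative.
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))  # [(1, 0.75), (3, 0.375), (4, 0.0)]
```

Censored patients leave the risk set without forcing a drop in S(t), which is what lets case reports with incomplete follow-up still contribute to the curve.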

