original dataset
Recently Published Documents


TOTAL DOCUMENTS

641
(FIVE YEARS 431)

H-INDEX

23
(FIVE YEARS 6)

2022 ◽  
pp. 1-50
Author(s):  
Emre Amasyalı

Abstract A significant literature demonstrates that the presence of historic missionary societies, especially Protestant societies, during the colonial period is positively associated with higher educational attainment and better economic outcomes. However, we know less about the mechanisms underlying the long-run consequences of institutions, as it is commonly very hard to disentangle direct effects from indirect effects. One clear way to do so is to explore the long-term impact of missionary influence in places where the direct beneficiaries of missionary education are no longer present. The present article considers one such region, the Anatolian region of the Ottoman Empire. Due to the ethnic violence and population movements at the start of the twentieth century, the newly founded Turkish nation-state was largely religiously homogeneous. This provides a unique setting in which to empirically assess the long-run indirect effects of Christian missionary societies on local human capital. For this purpose, I present an original dataset that provides the locations of Protestant mission stations and schools, Ottoman state-run schools, and Armenian community schools within Ottoman Anatolia between 1820 and 1914. Contrary to the common association found in the literature, this study does not find missionary presence to be correlated with modern-day schooling. Rather, I find that regions with a heightened missionary presence and an active Christian educational market perform better on the gender parity index for pretertiary schooling during both the Ottoman and Turkish periods.


2022 ◽  
Vol 19 ◽  
pp. 474-480
Author(s):  
Nevila Baci ◽  
Kreshnik Vukatana ◽  
Marius Baci

Small and medium enterprises (SMEs) are businesses that account for a large percentage of the economy in many countries, but they often lack cyber security. The present study examines different supervised machine learning methods, with a focus on intrusion detection systems (IDSs), that will help improve SMEs' security. The algorithms, tested on a real dataset, are Naïve Bayes, Sequential Minimal Optimization (SMO), the C4.5 decision tree, and Random Forest. The experiments are run using the Waikato Environment for Knowledge Analysis (WEKA) 3.8.4 toolkit, and the metrics used to evaluate the results are accuracy, false-positive rate (FPR), and the total time to train and build a classification model. The results obtained from the original dataset with 130 features show high accuracy, but the computation time to build the classification model was notably high for C4.5 (1 hr. and 20 mins) and SMO (4 hrs. and 20 mins). To reduce this cost, the Information Gain (IG) feature selection method was applied, with impressive results: the time needed to train the model dropped to a few minutes while accuracy remained high (above 95%). In the end, challenges that SMEs face when choosing an IDS, such as lack of scalability and autonomic self-adaptation, can be addressed by applying a sound methodology with machine learning techniques.
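The pipeline described above can be sketched as follows. This is an illustrative Python/scikit-learn analogue of the WEKA workflow, not the authors' code: the placeholder data, the mutual-information estimate of information gain, and the number of selected features are all assumptions.

```python
# Sketch: information-gain-style feature selection followed by Random Forest training,
# reporting the three metrics the abstract mentions (accuracy, FPR, training time).
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# X: (n_samples, 130) feature matrix, y: binary labels (attack vs. normal) -- placeholders
X, y = np.random.rand(1000, 130), np.random.randint(0, 2, 1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Information gain approximated via mutual information; keep the top 20 features (assumed k).
selector = SelectKBest(mutual_info_classif, k=20).fit(X_train, y_train)
X_train_sel, X_test_sel = selector.transform(X_train), selector.transform(X_test)

start = time.time()
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train_sel, y_train)
train_time = time.time() - start

y_pred = clf.predict(X_test_sel)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"accuracy={accuracy_score(y_test, y_pred):.3f}  FPR={fp / (fp + tn):.3f}  train_time={train_time:.1f}s")
```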


Author(s):  
Sihang Cheng ◽  
Xiang Yu ◽  
Xinyue Chen ◽  
Zhengyu Jin ◽  
Huadan Xue ◽  
...  

Objective: To develop and evaluate a machine learning-based CT radiomics model for the prediction of hepatic encephalopathy (HE) after transjugular intrahepatic portosystemic shunt (TIPS). Methods: A total of 106 patients who underwent TIPS placement were consecutively enrolled in this retrospective study. Regions of interest (ROIs) were drawn on unenhanced, arterial phase, and portal venous phase CT images, and radiomics features were extracted from each. A radiomics model was established to predict the occurrence of HE after TIPS using the random forest algorithm and ten-fold cross-validation. Receiver operating characteristic (ROC) curve analysis was used to evaluate the radiomics model and the clinical model on the training, test, and original datasets, respectively. Results: The radiomics model showed favorable discriminatory ability in the training cohort with an area under the curve (AUC) of 0.899 (95% CI, 0.848 to 0.951), and this was confirmed in the test cohort with an AUC of 0.887 (95% CI, 0.760 to 1.00). When applied to the original dataset, the model had an AUC of 0.955 (95% CI, 0.896 to 1.00). A clinical model was also built, with an AUC of 0.649 (95% CI, 0.530 to 0.767) in the original dataset, and a DeLong test demonstrated its lower discriminatory performance compared with the radiomics model (p < 0.05). Conclusion: The machine learning-based CT radiomics model performed better than the traditional clinical parameter-based model in the prediction of post-TIPS HE. Advances in knowledge: A radiomics model for the prediction of post-TIPS HE was built from features extracted from routinely acquired preoperative CT images and selected with the random forest algorithm; it showed satisfactory performance and demonstrates the advantages of machine learning in this field.
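A minimal sketch of the modeling step described above, assuming the radiomics features have already been extracted (e.g., with a tool such as pyradiomics, which is not shown): a random forest evaluated with ten-fold cross-validation and ROC AUC. The feature matrix and labels are placeholders, not the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = np.random.rand(106, 120)          # radiomics features per patient (placeholder)
y = np.random.randint(0, 2, 106)      # 1 = hepatic encephalopathy after TIPS, 0 = no HE

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc_scores = cross_val_score(
    RandomForestClassifier(n_estimators=500, random_state=0),
    X, y, cv=cv, scoring="roc_auc",
)
print(f"mean cross-validated AUC: {auc_scores.mean():.3f} +/- {auc_scores.std():.3f}")
```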


2022 ◽  
pp. 215336872110732
Author(s):  
Courtney M. Echols

Research finds that historical anti-Black violence helps to explain the spatial distribution of contemporary conflict, inequality, and violence in the U.S. Building on this research, the current study examined the spatial relationship between chattel slavery in 1860, lynchings of Black individuals between 1882 and 1930, and anti-Black violence during the Civil Rights Movement era in which police or other legal authorities were implicated. I draw on an original dataset of over 300 events of police violence that occurred between 1954 and 1974 in the sample state of Louisiana, compiled from primary and secondary source documents gathered through archival research conducted in the state. Path analysis was then employed, using negative binomial generalized structural equation modeling, to assess the direct and indirect effects of these racially violent histories. The implications for social justice, public policy, and future research are also discussed.
Keywords: slavery, lynchings, anti-Black violence, civil rights movement, police


2022 ◽  
pp. 095892872110505
Author(s):  
Erdem Yörük ◽  
İbrahim Öker ◽  
Gabriela Ramalho Tafoya

What welfare state regimes are observed when the analysis is extended globally, empirically and theoretically? We introduce a novel perspective into ‘welfare state regime analyses’ – a perspective that brings developed and developing countries together and, as such, broadens the geographical, empirical and theoretical scope of the ‘welfare modelling business’. The expanding welfare regimes literature has suffered from several drawbacks: (i) it is radically slanted towards Organisation for Economic Co-operation and Development (OECD) countries, (ii) the literature on non-OECD countries does not use genuine welfare policy variables, and (iii) social assistance and healthcare programmes are not utilized as components of welfare state effort and generosity. To overcome these limitations, we employ advanced data reduction methods, exploit an original dataset that we assembled from several international and domestic sources covering 52 emerging markets and OECD countries, and present a welfare state regime structure as of the mid-2010s. Our analysis is based on genuine welfare policy variables that are theorized to capture welfare generosity and welfare effort across five major policy domains: old-age pensions, sickness cash benefits, unemployment insurance, social assistance and healthcare. The sampled OECD countries and emerging market economies form four distinct welfare state regime clusters: institutional, neoliberal, populist and residual. We unveil the composition and performance of welfare state components in each welfare state regime family and develop politics-based working hypotheses about the formation of these regimes. Institutional welfare state regimes perform well in social security, healthcare and social assistance, while populist regimes perform moderately in social assistance and healthcare and moderately to well in social security. The neoliberal regime performs moderately in social assistance and healthcare and poorly in social security, and the residual regime performs poorly in all components. We then hypothesize that the relative political strengths of formal and informal working classes are key factors that shaped these welfare state regime typologies.
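The abstract refers to "advanced data reduction methods" without naming them, so the sketch below is only one plausible workflow under stated assumptions (standardization, PCA, then hierarchical clustering into four clusters), not the authors' method. The country-by-indicator matrix is a placeholder.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# rows: 52 countries; columns: generosity/effort indicators for the five policy domains
# (old-age pensions, sickness cash benefits, unemployment insurance, social assistance, healthcare)
X = np.random.rand(52, 10)  # placeholder data

X_std = StandardScaler().fit_transform(X)                 # put indicators on a common scale
X_red = PCA(n_components=2).fit_transform(X_std)          # reduce to two components
labels = AgglomerativeClustering(n_clusters=4).fit_predict(X_red)  # four regime clusters
print(labels)  # cluster assignment per country (e.g., institutional / neoliberal / populist / residual)
```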


2022 ◽  
Author(s):  
Meelad Amouzgar ◽  
David R Glass ◽  
Reema Baskar ◽  
Inna Averbukh ◽  
Samuel C Kimmey ◽  
...  

Single-cell technologies generate large, high-dimensional datasets encompassing a diversity of omics. Dimensionality reduction enables visualization of data by representing cells in two-dimensional plots that capture the structure and heterogeneity of the original dataset. Visualizations contribute to human understanding of data and are useful for guiding both quantitative and qualitative analysis of cellular relationships. Existing algorithms are typically unsupervised, utilizing only measured features to generate manifolds and disregarding known biological labels such as cell type or experimental timepoint. Here, we repurpose the classification algorithm linear discriminant analysis (LDA) for supervised dimensionality reduction of single-cell data. LDA identifies linear combinations of predictors that optimally separate a priori classes, enabling users to tailor visualizations to separate specific aspects of cellular heterogeneity. We implement feature selection by hybrid subset selection (HSS) and demonstrate that this flexible, computationally efficient approach generates non-stochastic, interpretable axes amenable to diverse biological processes, such as differentiation over time and cell cycle. We benchmark HSS-LDA against several popular dimensionality reduction algorithms and illustrate its utility and versatility for exploration of single-cell mass cytometry, transcriptomics and chromatin accessibility data.
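The core idea can be shown in a few lines of Python: LDA fit with known labels yields a low-dimensional embedding whose axes separate those labels. The hybrid subset selection (HSS) step from the paper is not shown, and the data, marker count, and class names below are placeholders.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.random.rand(5000, 40)                                     # cells x measured markers (placeholder)
labels = np.random.choice(["HSC", "CLP", "B cell"], size=5000)   # a priori classes, e.g. cell type

lda = LinearDiscriminantAnalysis(n_components=2)
embedding = lda.fit_transform(X, labels)   # 2D embedding whose axes separate the given classes
# embedding[:, 0] and embedding[:, 1] can be plotted like a UMAP/t-SNE layout,
# but the axes are non-stochastic, interpretable linear combinations of the input features.
```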


2022 ◽  
Vol 19 (3) ◽  
pp. 2206-2218
Author(s):  
Chaofan Li ◽  
◽  
Kai Ma

Named entities are the main carriers of relevant medical knowledge in Electronic Medical Records (EMR). Because of the specific structure of the Chinese language, clinical electronic medical records suffer from problems such as word segmentation ambiguity and polysemy, so a Clinical Named Entity Recognition (CNER) model based on multi-head self-attention combined with a BiLSTM neural network and Conditional Random Fields (CRF) is proposed. First, a pre-trained language model combines character vectors and word vectors for the text sequences of the original dataset. The sequences are then fed in parallel into the multi-head self-attention module and the BiLSTM module, and the outputs of the two modules are concatenated to obtain multi-level information such as contextual features and feature association weights. Finally, entity labeling is performed by the CRF layer. Results from multiple comparison experiments show that the proposed architecture is reasonable and robust and effectively improves Chinese CNER. The model extracts multi-level, more comprehensive text features, compensates for the loss of long-distance dependencies, and offers better applicability and recognition performance.
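A compact PyTorch sketch of the parallel encoder described above: the embedded sequence passes through a multi-head self-attention branch and a BiLSTM branch, the two outputs are concatenated, and per-token tag scores are emitted. The CRF decoding layer used in the paper is only indicated in a comment, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttnBiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=128, num_heads=4, num_tags=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)            # char/word vectors
        self.attn = nn.MultiheadAttention(emb_dim, num_heads, batch_first=True)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        # concatenated width = attention output (emb_dim) + BiLSTM output (2 * hidden_dim)
        self.emissions = nn.Linear(emb_dim + 2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        x = self.embed(token_ids)                                  # (batch, seq, emb)
        attn_out, _ = self.attn(x, x, x)                           # self-attention branch
        lstm_out, _ = self.bilstm(x)                               # BiLSTM branch
        combined = torch.cat([attn_out, lstm_out], dim=-1)         # splice the two branches
        return self.emissions(combined)                            # per-token tag scores
        # In the full model these scores would be decoded by a CRF layer.

scores = AttnBiLSTMEncoder(vocab_size=5000)(torch.randint(0, 5000, (2, 20)))
print(scores.shape)  # torch.Size([2, 20, 9])
```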


Author(s):  
Rohit Sahoo ◽  
◽  
Vedang Naik ◽  
Saurabh Singh ◽  
Shaveta Malik ◽  
...  

Heart disease instances are rising at an alarming rate, and it is critical to predict such ailments in advance. This is a challenging diagnosis that must be made accurately and swiftly. Lack of relevant data is often the limiting factor in many areas of research. Data augmentation, which can be accomplished in a variety of ways, is a strategy for improving the training of discriminative models. Deep generative models, which have recently advanced, now provide new approaches to enrich existing datasets. Generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) are frequently used to generate high-quality, realistic synthetic data essential for machine learning algorithms, as they play a critical role in various classification problems. In our case, we were provided with 304 rows of heart disease data to create a robust model for predicting the presence of an ailment in a patient. However, identification of heart disease would not be reliable given the small amount of available training data. To tackle this problem, we used a GAN, a CGAN, and a VAE to generate data, thus augmenting the original dataset. This additional data helps increase the accuracy of the models created using the new dataset. We applied classification-based machine learning models such as Logistic Regression, Decision Trees, KNN, and Random Forest, and compared the accuracy of these models when trained on the original dataset and on the augmented datasets produced by the data generation techniques mentioned above. Our research suggests that using data generation techniques significantly boosts the accuracy of the machine learning models trained on the resulting data.
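A sketch of the comparison described above: split the original 304-row dataset, augment only the training split with synthetic rows (stand-ins here for GAN/CGAN/VAE output), train the same classifiers on both versions, and compare accuracy on the same held-out test set. The column count and the synthetic rows are assumptions, not the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = np.random.rand(304, 13), np.random.randint(0, 2, 304)                 # original data (placeholder)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

X_synth, y_synth = np.random.rand(1000, 13), np.random.randint(0, 2, 1000)   # GAN/CGAN/VAE samples (placeholder)
X_aug, y_aug = np.vstack([X_tr, X_synth]), np.concatenate([y_tr, y_synth])

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "RandomForest": RandomForestClassifier(random_state=0),
}
for train_name, (Xt, yt) in {"original": (X_tr, y_tr), "augmented": (X_aug, y_aug)}.items():
    for model_name, model in models.items():
        acc = accuracy_score(y_te, model.fit(Xt, yt).predict(X_te))
        print(f"train={train_name:9s} model={model_name:18s} accuracy={acc:.3f}")
```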


Author(s):  
Ramesh Adhikari ◽  
Suresh Pokharel

Data augmentation is widely used in image processing and pattern recognition problems in order to increase the richness and diversity of available data. It is commonly used to improve the classification accuracy of images when the available datasets are limited. Deep learning approaches have demonstrated an immense breakthrough in medical diagnostics over the last decade. A significant amount of data is needed for the effective training of deep neural networks. The appropriate use of data augmentation techniques prevents the model from overfitting and thus increases the generalization capability of the network when it is later tested on unseen data. However, it remains a huge challenge to obtain such large datasets for rare diseases in the medical field. This study presents a synthetic data augmentation technique using Generative Adversarial Networks to use existing data more effectively and to evaluate the resulting generalization capability of neural networks. In this research, a convolutional neural network (CNN) model is used to classify X-ray images of the human chest as normal or pneumonia; then, synthetic X-ray images are generated from the available dataset using a deep convolutional generative adversarial network (DCGAN) model. Finally, the CNN model is trained again with the original dataset and the augmented data generated by the DCGAN model. The classification performance of the CNN model improves by 3.2% when the augmented data are used along with the originally available dataset.
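A minimal DCGAN generator sketch in PyTorch for producing synthetic grayscale chest-X-ray-like images (64x64 here for brevity). The discriminator, the adversarial training loop, and the downstream CNN classifier are omitted; layer sizes and image resolution are assumptions, not the configuration used in the study.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 256, 4, 1, 0, bias=False),  # 1x1 -> 4x4
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),         # 4x4 -> 8x8
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),          # 8x8 -> 16x16
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 32, 4, 2, 1, bias=False),           # 16x16 -> 32x32
            nn.BatchNorm2d(32), nn.ReLU(True),
            nn.ConvTranspose2d(32, 1, 4, 2, 1, bias=False),            # 32x32 -> 64x64
            nn.Tanh(),                                                 # pixel values in [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

fake_images = Generator()(torch.randn(8, 100))   # 8 synthetic samples from random noise
print(fake_images.shape)                         # torch.Size([8, 1, 64, 64])
```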


Author(s):  
Juan-Luis García-Mendoza ◽  
Luis Villaseñor-Pineda ◽  
Felipe Orihuela-Espina ◽  
Lázaro Bustio-Martínez

Distant Supervision is an approach that allows automatic labeling of instances and has been used in Relation Extraction. Still, the main challenge of this task is handling instances with noisy labels (e.g., when two entities in a sentence are automatically labeled with an invalid relation). The approaches reported in the literature address this problem by employing noise-tolerant classifiers. However, introducing a noise reduction stage before the classification step increases the macro precision values. This paper proposes an Adversarial Autoencoders-based approach for obtaining a new representation that allows noise reduction in Distant Supervision. The representation obtained using Adversarial Autoencoders minimizes the intra-cluster distance compared with pre-trained embeddings and classic Autoencoders. Experiments demonstrated that, with the same classifier, the noise-reduced datasets achieve macro precision values similar to those obtained over the original dataset while using fewer instances. For example, in one of the noise-reduced datasets, macro precision improved by approximately 2.32% using only 77% of the original instances. This suggests the validity of using Adversarial Autoencoders to obtain well-suited representations for noise reduction. The proposed approach also maintains the macro precision values of the original dataset while reducing the total number of instances needed for classification.
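A compact adversarial autoencoder (AAE) sketch in PyTorch showing the general idea: an encoder/decoder pair is trained for reconstruction while a discriminator pushes the latent codes toward a prior (a standard Gaussian here). This is only an illustration of the technique the abstract names; dimensions, the prior, and the training schedule are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

emb_dim, latent_dim = 300, 64            # input embedding size and latent code size (assumed)
encoder = nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))
discriminator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

recon_loss, adv_loss = nn.MSELoss(), nn.BCEWithLogitsLoss()
opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

x = torch.randn(32, emb_dim)             # a batch of instance embeddings (placeholder)

# 1) reconstruction phase: encoder + decoder minimize reconstruction error
loss_recon = recon_loss(decoder(encoder(x)), x)
opt_ae.zero_grad(); loss_recon.backward(); opt_ae.step()

# 2) regularization phase: discriminator separates prior samples from encoded codes
z_real, z_fake = torch.randn(32, latent_dim), encoder(x).detach()
loss_d = adv_loss(discriminator(z_real), torch.ones(32, 1)) + \
         adv_loss(discriminator(z_fake), torch.zeros(32, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# 3) generator phase: encoder tries to fool the discriminator so its codes match the prior
loss_g = adv_loss(discriminator(encoder(x)), torch.ones(32, 1))
opt_ae.zero_grad(); loss_g.backward(); opt_ae.step()
# After training, encoder(x) yields the new representation used for clustering / noise reduction.
```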

