Linguistic Proficiency: A Quantitative Approach to Immigrant and Heritage Speakers of Danish

Abstract. This paper presents a corpus-based quantitative study of the linguistic proficiency of approximately 300 immigrant and heritage speakers of Danish in North America and Argentina. It asks whether linguistic proficiency is connected to ‘immigrant generation’ (i.e. the difference between speakers who migrated as adults with fully acquired language competence and foreign-born heritage speakers), to the sociocultural setting, or to both. The large database provides a rare opportunity to compare developments within the same minority language in different places, representing different sociocultural settings for the immigrant or heritage speakers and, accordingly, different language ecologies. The study relies on the Corpus of American Danish (1.6 million tokens, including both words and non-word utterances). Based on this data set, the paper applies Factor Analysis to explore the distribution of 13 linguistic and non-linguistic variables representing linguistic proficiency (i.e. Danish words, L2 words, word-internal codeswitching, type-token ratio, empty and filled pauses, self-interruption, lengthening, speech rate, word length, runlength, and the ratio of main clauses to subclauses). On an empirically solid basis, the paper concludes that (a) the sociolinguistic setting is the crucial factor in the development of linguistic proficiency and (b) linguistic proficiency is a non-universal cognitive phenomenon.
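
The type-token ratio listed among the variables is, in its simplest form, the number of distinct word forms (types) divided by the number of word tokens. The sketch below is a minimal illustration assuming a naive letters-only tokenizer; the abstract does not say how the Corpus of American Danish tokenizes speech or whether the ratio is normalized for sample length (raw TTR generally falls as a sample grows, so speaker comparisons need samples of similar size).

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by word tokens.

    Assumption: a naive tokenizer that lowercases the text and keeps runs of
    letters (including æ, ø, å); pauses and other non-word material are ignored.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical utterance: 7 tokens, 6 types ("en" occurs twice).
print(type_token_ratio("Jeg har en hest og en ko"))
```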

Do Women Get Fewer Votes? No.

Canadian Journal of Political Science

10.1017/s0008423918000495

2018

Vol 52 (1)

pp. 201-210

Cited By ~ 8

Author(s): Semra Sevi, Vincent Arel-Bundock, André Blais

Keyword(s): Political Parties, Gender Gap, Time Trends, Large Data, Study Data, Data Set, Men And Women, Women Candidates, Percentage Points, The Difference

Abstract. We study data on the gender of more than 21,000 unique candidates in all Canadian federal elections since 1921, when the first women ran for seats in Parliament. This large data set allows us to compute precise estimates of the difference in the electoral fortunes of men and women candidates. When accounting for party effects and time trends, we find that the difference between the vote shares of men and women is substantively negligible (±0.5 percentage point). This gender gap was larger in the 1920s (±2.5 percentage points), but it is now statistically indistinguishable from zero. Our results have important normative implications: political parties should recruit and promote more women candidates because they remain underrepresented in Canadian politics and because they do not suffer from a substantial electoral penalty.
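
One common way to estimate a gender gap while accounting for party and time effects is a fixed-effects regression. The sketch below assumes a hypothetical grouping of candidates by party and election and computes the within-group slope of vote share on a woman indicator; by the Frisch-Waugh-Lovell theorem this equals the OLS coefficient from a regression with one dummy per group. It illustrates the idea only and is not the authors' specification.

```python
from collections import defaultdict


def within_gap(rows):
    """Fixed-effects (within) estimate of the women-minus-men vote-share gap.

    rows: iterable of (group, is_woman, vote_share); `group` is a hypothetical
    party-by-election identifier and is_woman is 0 or 1.
    """
    rows = list(rows)
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [sum woman, sum share, n]
    for group, woman, share in rows:
        t = totals[group]
        t[0] += woman
        t[1] += share
        t[2] += 1
    num = den = 0.0
    for group, woman, share in rows:
        sum_w, sum_y, n = totals[group]
        dw = woman - sum_w / n  # woman indicator, demeaned within group
        dy = share - sum_y / n  # vote share, demeaned within group
        num += dw * dy
        den += dw * dw
    if den == 0.0:
        raise ValueError("no within-group variation in candidate gender")
    return num / den


# Made-up data: two party-election groups, vote shares in percent.
rows = [("A", 1, 30.0), ("A", 0, 32.0), ("B", 1, 40.0), ("B", 0, 41.0), ("B", 0, 43.0)]
print(within_gap(rows))
```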

Reassessing the Generational Disparity in Immigrant Offending: A Within-family Comparison of Involvement in Crime

Journal of Research in Crime and Delinquency

10.1177/0022427819850600

2019

Vol 56 (6)

pp. 851-887

Cited By ~ 2

Author(s): Bianca E. Bersani, Adam W. Pittman

Keyword(s): First Generation, Second Generation, Well Being, Foreign Born, Data Set, The Difference, Successive Generations, Offending Patterns, High Degree, Parent Child

Objective: This study reassesses the generational disparity in immigrant offending. Patterns and predictors of offending are compared using traditional peer-based models and an alternative within-family (parent–child dyad) model. Method: The National Longitudinal Survey of Youth (1979; NLSY79) and NLSY-Child and Young Adult (NLSY_CYA) data are merged to create an intergenerational data set to compare generational disparities in immigrant offending across peers and within families. Differences in self-reported offending (prevalence and variety) by immigrant generation are assessed using a combination of descriptive analyses (χ² and analysis of variance) and regression models. Results: While NLSY_CYA children generally are at a greater risk of offending compared with the NLSY79 mothers, the difference in offending is greatest between first-generation mom and second-generation child dyads. Disparities in offending are driven in large part by exceedingly low levels of offending among first-generation immigrants. Conclusion: Although the factors driving an increase in offending between parent–child generations are not unique to immigrants, they are amplified in immigrant families. Whereas the second generation is remarkably similar to their U.S.-born counterparts in terms of their involvement in crime, suggesting a high degree of swift integration, the greater involvement in crime among the children of immigrants compared to their foreign-born mothers suggests a decline in well-being across successive generations.
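
The χ² analyses mentioned under Method test whether offending prevalence is independent of immigrant generation. A minimal sketch of the Pearson statistic for an r × c table of observed counts follows; the counts in the usage line are hypothetical, and the p-value would come from the χ² distribution with `dof` degrees of freedom (for instance `scipy.stats.chi2.sf(stat, dof)`).

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Rows might be generations, columns offended / did not offend.
    Assumes every row and column total is positive.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_tot) - 1)
    return stat, dof


# Hypothetical counts: [offended, did not offend] for two generations.
print(chi_square_independence([[10, 20], [20, 10]]))
```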

Elevation gaps in fluvial sandbar deposition and their implications for paleodepth estimation

Geology

10.1130/g47521.1

2020

Vol 48 (7)

pp. 718-722

Author(s): Jason S. Alexander, Brandon J. McElroy, Snehalata Huzurbazar, Marissa L. Murr

Keyword(s): Large Data, Accurate Estimation, Systematic Bias, Flow Depth, Mean Flow, Data Set, Bed Elevation, The Difference, Base Elevation, Thickness Measurements

Abstract. Accurate estimation of paleo–streamflow depth from outcrop is important for estimation of channel slopes, water discharges, sediment fluxes, and basin sizes of ancient river systems. Bar-scale inclined strata deposited from slipface avalanching on fluvial bar margins are assumed to be indicators of paleodepth insofar as their thickness approaches but does not exceed formative flow depths. We employed a unique, large data set from a prolonged bank-filling flood in the sandy, braided Missouri River (USA) to examine scaling between slipface height and measures of river depth during the flood. The analyses demonstrated that the most frequent slipface height observations underestimate study-reach mean flow depth at peak stage by a factor of 3, but maximum values are approximately equal to mean flow depth. At least 70% of the error is accounted for by the difference between slipface base elevation and mean bed elevation, while the difference between crest elevation and water surface accounts for ∼30%. Our analysis provides a scaling for bar-scale inclined strata formed by avalanching and suggests risk of systematic bias in paleodepth estimation if mean thickness measurements of these deposits are equated to mean bankfull depth.
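
The error attribution above follows from a simple identity. Writing (our notation, not the paper's) $\eta_w$ for water-surface elevation, $\bar\eta_b$ for mean bed elevation, and $\eta_c$ and $\eta_s$ for slipface crest and base elevations, mean flow depth is $D=\eta_w-\bar\eta_b$ and slipface height is $h=\eta_c-\eta_s$, so the shortfall splits into two gaps:

$$
D - h = (\eta_s - \bar\eta_b) + (\eta_w - \eta_c)
$$

The first term is the base-to-mean-bed gap (at least 70% of the error in the study) and the second is the crest-to-water-surface gap (about 30%).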

Ensemble daily simulations for elucidating cloud–aerosol interactions under a large spread of realistic environmental conditions

Atmospheric Chemistry and Physics

10.5194/acp-20-6291-2020

2020

Vol 20 (11)

pp. 6291-6303

Author(s): Guy Dagan, Philip Stier

Keyword(s): Initial Conditions, Cloud Droplet, Large Data, Data Set, Cloud Properties, Large Spread, The Difference, Ice Content, Aerosol Effects, Longwave Flux

Abstract. Aerosol effects on cloud properties and the atmospheric energy and radiation budgets are studied through ensemble simulations over two month-long periods during the NARVAL campaigns (Next-generation Aircraft Remote-Sensing for Validation Studies, December 2013 and August 2016). For each day, two simulations are conducted with low and high cloud droplet number concentrations (CDNCs), representing low and high aerosol concentrations, respectively. This large data set, which is based on a large spread of co-varying realistic initial conditions, enables robust identification of the effect of CDNC changes on cloud properties. We show that increases in CDNC drive a reduction in the top-of-atmosphere (TOA) net shortwave flux (more reflection) and a decrease in the lower-tropospheric stability for all cases examined, while the TOA longwave flux and the liquid and ice water path changes are generally positive. However, changes in cloud fraction or precipitation, which could appear significant for a given day, are not as robustly affected, and, at least for the summer month, are not statistically distinguishable from zero. These results highlight the need for using a large sample of initial conditions for cloud–aerosol studies for identifying the significance of the response. In addition, we demonstrate the dependence of the aerosol effects on the season, as it is shown that the TOA net radiative effect is doubled during the winter month as compared to the summer month. By separating the simulations into different dominant cloud regimes, we show that the difference between the different months emerges due to the compensation of the longwave effect induced by an increase in ice content as compared to the shortwave effect of the liquid clouds. The CDNC effect on the longwave flux is stronger in the summer as the clouds are deeper and the atmosphere is more unstable.
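
Because each day is simulated twice from the same initial conditions, the CDNC effect can be judged from per-day differences. The sketch below computes a paired t-statistic under that assumption, using hypothetical daily values of some output such as TOA net shortwave flux. The abstract does not state which significance test the authors used, and positive autocorrelation between consecutive days (not handled here) would inflate this t-value.

```python
import math
from statistics import mean, stdev


def paired_effect(low, high):
    """Mean per-day difference (high minus low CDNC) and its paired t-statistic."""
    if len(low) != len(high) or len(low) < 2:
        raise ValueError("need two equal-length series covering at least 2 days")
    # Differencing within a day removes the shared day-to-day weather signal.
    diffs = [hi - lo for lo, hi in zip(low, high)]
    m = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    t = m / se if se > 0 else math.nan
    return m, t


# Hypothetical daily values for four days.
print(paired_effect([1, 2, 3, 4], [2, 2, 5, 5]))
```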

Ensemble daily simulations for elucidating cloud–aerosol interactions under a large spread of realistic environmental conditions

10.5194/acp-2019-949

2019

Cited By ~ 1

Author(s): Guy Dagan, Philip Stier

Keyword(s): Initial Conditions, Cloud Droplet, Large Data, Data Set, Cloud Properties, Large Spread, The Difference, Ice Content, Aerosol Effects, Ice Water

Abstract. Aerosol effects on cloud properties and the atmospheric energy and radiation budgets are studied through ensemble simulations over two month-long periods during the NARVAL campaigns (December 2013 and August 2016). For each day, two simulations are conducted with low and high cloud droplet number concentrations (CDNC), representing low and high aerosol concentrations, respectively. This large data set, which is based on a large spread of co-varying realistic initial conditions, enables robust identification of the effect of CDNC changes on cloud properties. We show that increases in CDNC drive a reduction in the top-of-atmosphere (TOA) net shortwave flux (more reflection) and a decrease in the lower-tropospheric stability for all cases examined, while the TOA longwave flux and the liquid and ice water path changes are generally positive. However, changes in cloud fraction or precipitation, which could appear significant for a given day, are not as robustly affected, and, at least for the summer month, are not statistically distinguishable from zero. These results highlight the need for using a large sample of initial conditions for cloud–aerosol studies for identifying the significance of the response. In addition, we demonstrate the dependence of the aerosol effects on the season, as it is shown that the TOA net radiative effect is doubled during the winter month as compared to the summer month. By separating the simulations into different dominant cloud regimes, we show that the difference between the different months emerges due to the compensation of the longwave effect induced by an increase in ice content as compared to the shortwave effect of the liquid clouds. The CDNC effect on the longwave flux is stronger in the summer as the clouds are deeper and the atmosphere is more unstable.

Difference Fourier Analysis of Glucose Embedded and Frozen Hydrated Purple Membrane

Proceedings, annual meeting, Electron Microscopy Society of America

10.1017/s0424820100053164

1982

Vol 40

pp. 74-75

Author(s): Jules S. Jaffe, Robert M. Glaeser

Keyword(s): Purple Membrane, Data Set, High Resolution Data, X Ray, X Ray Crystallography, Fourier Techniques, Versus Protein, The Difference, Difference Fourier, Ideal Method

Although difference Fourier techniques are standard in X-ray crystallography, electron crystallographers have only very recently been able to take advantage of this method. We have combined a high-resolution data set for frozen, glucose-embedded purple membrane (PM) with a data set collected from PM prepared in the frozen hydrated state in order to visualize any differences in structure due to the different methods of preparation. The increased contrast of protein against ice, compared with protein against glucose, may prove to be an advantage of the frozen hydrated technique for visualizing those parts of bacteriorhodopsin that are embedded in glucose. In addition, surface groups of the protein may be disordered in glucose and ordered in the frozen state. The sensitivity of the difference Fourier technique to small changes in structure makes it an ideal method for testing this hypothesis.
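
In its standard form, a difference Fourier map is synthesized from the differences between two data sets' structure-factor amplitudes, placed on a common scale and combined with a single shared set of phases (for example those of the better-determined reference data). Assuming (our notation) amplitudes $|F_1|$ and $|F_2|$ for the frozen hydrated and glucose-embedded data, shared phases $\varphi$, and unit-cell volume $V$ (area, for a two-dimensional projection map):

$$
\Delta\rho(\mathbf{r}) = \frac{1}{V}\sum_{\mathbf{h}} \bigl(|F_1(\mathbf{h})| - |F_2(\mathbf{h})|\bigr)\, e^{i\varphi(\mathbf{h})}\, e^{-2\pi i\,\mathbf{h}\cdot\mathbf{r}}
$$

Because the same phases multiply both amplitude terms, density common to the two preparations largely cancels and small structural differences stand out as positive or negative peaks.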

Some statistical and CI models to predict chaotic high-frequency financial data

Journal of Intelligent & Fuzzy Systems

10.3233/jifs-189107

2020

Vol 39 (5)

pp. 6419-6430

Author(s): Dusan Marcek

Keyword(s): Time Series Data, Moving Average, Methodological Approach, Back Propagation, Large Data, Series Data, Data Set, Training Time, Optimal Population, Forecast Time

To forecast time series data, two methodological frameworks are considered: statistical modelling and computational intelligence modelling. The statistical approach is based on the theory of invertible ARIMA (Auto-Regressive Integrated Moving Average) models estimated by the Maximum Likelihood (ML) method. As a competing tool to the statistical forecasting models, we use the popular classic perceptron-type neural network (NN). To train the NN, the Back-Propagation (BP) algorithm and heuristics such as genetic and micro-genetic algorithms (GA and MGA) are applied to the large data set. The selected learning methods are compared and evaluated. The experiments indicate that the optimal population size is likely to be 20, which gives the lowest training time of all NNs trained by the evolutionary algorithms, while the prediction accuracy is lower but still acceptable to managers.
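
The "integrated" part of ARIMA means the model is fitted to differenced data and forecasts are converted back to levels. A minimal sketch of the simplest case, ARIMA(1,1,0) without a constant, follows; the order is our assumption for illustration. Note the swap: it uses conditional least squares, which approximates the Maximum Likelihood estimation named above for autoregressive models but is not identical to it.

```python
def fit_ar1_on_differences(series):
    """Conditional least-squares fit of ARIMA(1,1,0) without a constant.

    Model: d_t = phi * d_{t-1} + e_t, where d_t = y_t - y_{t-1}.
    Returns phi and the one-step-ahead forecast of the next level.
    """
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    d = [b - a for a, b in zip(series, series[1:])]  # first differences
    num = sum(prev * cur for prev, cur in zip(d[:-1], d[1:]))
    den = sum(prev * prev for prev in d[:-1])
    phi = num / den if den else 0.0
    forecast = series[-1] + phi * d[-1]  # undo the differencing
    return phi, forecast


phi, forecast = fit_ar1_on_differences([0, 8, 12, 14, 15])
print(phi, forecast)
```

In the usage line the differences (8, 4, 2, 1) halve at each step, so the fit gives phi = 0.5 and a forecast of 15.5.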

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening

10.2174/1386207322666181220130232

2019

Vol 21 (9)

pp. 662-669

Cited By ~ 1

Author(s): Junnan Zhao, Lu Zhu, Weineng Zhou, Lingfeng Yin, Yuchen Wang, ...

Keyword(s): Machine Learning, Prediction Models, Regression Tree, Large Data, Thrombin Inhibitors, Coagulation Cascade, Gradient Boosting, Support Vector, Data Set, Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade and is closely linked to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study predicts Ki values of thrombin inhibitors from a large data set using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors were collected for each compound, and an efficient descriptor selection method was used to find the appropriate descriptors. Four methods, multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model performed best, with R² = 0.84 and MSE = 0.55 for the training set and R² = 0.83 and MSE = 0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were used to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be used for rapid estimation of the inhibitory constant, which should help in designing novel thrombin inhibitors.
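
Of the four methods named, K Nearest Neighbors is the simplest to show. The sketch below predicts a target such as Ki as the average target of the k closest training compounds in descriptor space. The descriptor vectors and targets in the usage line are made up, and the study's actual KNN settings (k, distance metric, descriptor scaling) are not given in the abstract.

```python
import math


def knn_predict(train_X, train_y, x, k=3):
    """K-nearest-neighbours regression with Euclidean distance.

    Assumes descriptors are already scaled to comparable ranges; otherwise
    descriptors with large numeric ranges dominate the distance.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training compounds")
    # Pair each training compound's distance to x with its target, nearest first.
    neighbours = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))
    return sum(y for _, y in neighbours[:k]) / k


# Hypothetical two-descriptor training set.
X = [[0, 0], [1, 0], [0, 1], [5, 5]]
y = [1.0, 2.0, 3.0, 10.0]
print(knn_predict(X, y, [0.1, 0.1], k=3))
```

`math.dist` requires Python 3.8 or later.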

Correlation between the structure and skin permeability of compounds

Scientific Reports

10.1038/s41598-021-89587-5

2021

Vol 11 (1)

Author(s): Ruolan Zeng, Jiyong Deng, Limin Dang, Xinliang Yu

Keyword(s): Large Data, QSAR Model, Coefficient Of Determination, Support Vector, Skin Permeability, Data Set, Test Set, SVM Algorithm, SVM Model, Toxicity Relationship

Abstract. A three-descriptor quantitative structure–activity/toxicity relationship (QSAR/QSTR) model was developed for the skin permeability of a sufficiently large data set of 274 compounds, using a support vector machine (SVM) together with a genetic algorithm. The optimal SVM model has a coefficient of determination R² of 0.946 and a root mean square (rms) error of 0.253 for the training set of 139 compounds, and an R² of 0.872 and an rms error of 0.302 for the test set of 135 compounds. Compared with other models reported in the literature, our SVM model shows better statistical performance while dealing with more samples in the test set. Therefore, a nonlinear QSAR model for skin permeability was successfully developed using an SVM algorithm.
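
The two statistics reported for the training and test sets can be computed as follows. This sketch uses R² = 1 − SS_res/SS_tot; some QSAR studies instead report the squared Pearson correlation between observed and predicted values, which can differ on a test set, and the abstract does not say which definition was used.

```python
import math


def r2_and_rms(y_true, y_pred):
    """Coefficient of determination (1 - SS_res / SS_tot) and root-mean-square error."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Hypothetical observed and predicted permeability values.
print(r2_and_rms([1, 2, 3, 4], [1, 2, 3, 5]))
```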

Galaxy spin direction distribution in HST and SDSS show similar large-scale asymmetry

Publications of the Astronomical Society of Australia

10.1017/pasa.2020.46

2020

Vol 37

Author(s): Lior Shamir

Keyword(s): Large Scale, Spiral Galaxies, Hubble Space Telescope, Gravitational Interaction, Large Data, Sloan Digital Sky Survey, Data Sets, Dipole Axis, Data Set, The Asymmetry

Abstract. Several recent observations using large data sets of galaxies showed a non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets exhibit a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to a cosine dependence shows a dipole axis with statistical significance of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\circ},\delta=47^{\circ})$ and is well within the $1\sigma$ error range of the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^{\circ},\delta=61^{\circ})$.
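
The cosine fit can be illustrated as follows: for a candidate axis, each galaxy's spin direction (coded +1 or −1) is regressed on the cosine of its angular distance from the axis, and scanning candidate axes over the sky locates the best-fitting dipole. This is a simplified sketch under our own assumptions (the spin coding, a least-squares fit through the origin, a squared-error reduction used as the score, a 10-degree grid); it is not the paper's exact procedure and does not compute the σ significance quoted above.

```python
import math


def cos_angular_distance(ra1, dec1, ra2, dec2):
    """Cosine of the great-circle angle between two sky positions given in degrees."""
    a1, d1, a2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    return (math.sin(d1) * math.sin(d2)
            + math.cos(d1) * math.cos(d2) * math.cos(a1 - a2))


def dipole_fit(galaxies, axis_ra, axis_dec):
    """Least-squares fit of spin = A * cos(phi) through the origin.

    galaxies: list of (ra, dec, spin), spin = +1 or -1 (hypothetical coding of
    clockwise vs counterclockwise). Returns the amplitude A and the reduction
    in squared error achieved by the fit, used to compare candidate axes.
    """
    num = den = 0.0
    for ra, dec, spin in galaxies:
        c = cos_angular_distance(ra, dec, axis_ra, axis_dec)
        num += spin * c
        den += c * c
    if den == 0.0:
        return 0.0, 0.0
    return num / den, num * num / den


def best_axis(galaxies, step=10):
    """Grid search over candidate axes; returns (score, amplitude, ra, dec)."""
    best = None
    for ra in range(0, 360, step):
        for dec in range(-90, 91, step):
            amp, score = dipole_fit(galaxies, ra, dec)
            if best is None or score > best[0]:
                best = (score, amp, ra, dec)
    return best
```

An axis and its antipode give amplitudes of opposite sign with the same score, so the scan identifies an axis only up to that reflection.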
