Google Summer of Code Gender Diversity: An analysis of the last 4 editions

Author(s):  
Jhemeson Silva Mota ◽  
Marcio Vinicius Okimoto ◽  
Edna Dias Canedo ◽  
Jhonatan Silva Mota

This work presents comprehensive research on the participation of men and women in Information and Communications Technology (ICT), using data extracted from the last four editions of Google Summer of Code (GSoC). The goal is to find association rules between gender characteristics and coding using the Apriori algorithm. The algorithm generated a total of 61 association rules: 22 found only in the data set for women, 24 found only in the data set for men, and 15 applicable to both. One of the main findings is that the representation of women in GSoC has been decreasing in the last few years. Despite this, the representation of women in GSoC is above average compared with other studies in the literature, in which women are underrepresented. The most used technologies are “Python”, “Java”, “C++”, “C” and “JavaScript”. The main technologies used by men and women are similar but, in general, men are more likely to be linked to programming languages. The most common project topics are “Event Management”, “Web”, “Web Development”, “Data Science” and “Cloud”. This may reflect how diverse the project topics in the database are, but it is not necessarily related to gender.
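As a rough illustration of how Apriori produces rules of the kind counted above, the following minimal sketch mines frequent item sets level by level and then derives rules from them. It is not the authors' code: the function names `apriori` and `rules`, the toy `projects` data, and both thresholds are assumptions made up for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the item set.
        return sum(itemset <= t for t in transactions) / n

    # Level 1 candidates: every single item seen in the data.
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        # Join step: merge frequent k-item sets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its k-subsets are frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples above the threshold."""
    found = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = s / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found


# Hypothetical toy data: each transaction is the tag set of one project.
projects = [
    {"Python", "Web"},
    {"Python", "Data Science"},
    {"Java", "Web"},
    {"Python", "Web", "Cloud"},
]
frequent = apriori(projects, min_support=0.5)
found = rules(frequent, min_confidence=0.6)
```

Running the same miner separately on two subsets of the data and comparing the resulting rule sets is one way to obtain counts of rules unique to each subset and common to both.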

Data mining is now indispensable in the medical industry because of its many applications in predicting diseases at an early stage. Data mining methods make it easy to extract useful patterns and quick to recognize task-based outcomes. Classification models are useful for building classes over medical data sets for accurate future analysis. In addition, association rules are a promising technique for finding hidden patterns in a medical data set, and have been applied successfully to market basket data, census data and financial data. The Apriori algorithm, considered a classic algorithm, is useful for mining frequent item sets in a database containing a large number of transactions, and it also yields the relevant association rules. Association rules capture the relationships among items present in a data set. When the data set contains continuous attributes, existing algorithms may not work; in that case, discretization can be applied so that association rules can still relate the various patterns in the data set. In this paper, Discretized Apriori is used to predict by-diseases in people with diabetic syndrome, and the extracted rules are analyzed. In the discretization step, numerical data is discretized and fed to the Apriori algorithm to obtain better association rules for predicting the diseases.
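The discretization step can be sketched as follows. This is only an illustration of the general idea, not the paper's procedure: the attribute names, the cut points in `schema`, and the helpers `discretize` and `to_transaction` are all hypothetical, and the cut points are arbitrary values chosen for the example, not clinical thresholds.

```python
def discretize(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds the value.

    `labels` must have exactly one more entry than `edges`; values at or
    above the last edge fall into the final label.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def to_transaction(record, schema):
    """Turn one numeric record into a set of 'attribute=bin' items."""
    return frozenset(
        f"{name}={discretize(record[name], edges, labels)}"
        for name, (edges, labels) in schema.items()
    )


# Hypothetical schema: illustrative cut points only, not clinical thresholds.
schema = {
    "glucose": ([100, 126], ["low", "mid", "high"]),
    "bmi": ([18.5, 25], ["under", "normal", "over"]),
}

patient = {"glucose": 130, "bmi": 22}
items = to_transaction(patient, schema)
```

Each record becomes a transaction of categorical items, which is the input form that an Apriori-style miner expects.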



Energies ◽  
2021 ◽  
Vol 14 (21) ◽  
pp. 6889
Author(s):  
Yuxin Huang ◽  
Jingdao Fan ◽  
Zhenguo Yan ◽  
Shugang Li ◽  
Yanping Wang

In gas prediction and early warning, outliers in the data series are often discarded, so key information may be missed during analysis. To address this, this paper proposes an early warning model based on multifactor coupling relationship analysis of coal face gas. The model combines a k-means algorithm with optimized initial cluster centers and an Apriori algorithm with weight optimization. The k-means algorithm is optimized by using the cluster centers of the preorder data subset as the initial cluster centers for all data. The optimized algorithm is used to filter out the outliers in the collected data set and obtain a data set of outliers. The Apriori algorithm is then optimized so that it can identify more important information that appears less frequently in the events. It is used to mine and analyze the association rules among the abnormal values and to obtain interesting association rule events among the gas outliers in different dimensions. Finally, four warning levels of gas risk are set according to different confidence intervals, and true and reliable warning results are obtained. Mining association rules between abnormal data in different dimensions verifies the validity and effectiveness of the proposed gas early warning model. Classifying gas risks into early warning levels has important practical significance for improving the safety of coal mines.


Author(s):  
Eduardo P. S. Castro ◽  
Thiago D. Maia ◽  
Marluce R. Pereira ◽  
Ahmed A. A. Esmin ◽  
Denilson A. Pereira

Several Apriori algorithm implementations for mining association rules have been proposed in the literature using the Hadoop-MapReduce framework and, more recently, Spark. However, none of these works has assessed performance in detail, for example by comparing an implementation with others across data sets with various characteristics. In this work, we review the main algorithms proposed for Hadoop-MapReduce and compare their implementations in a single environment under several different situations. Moreover, the implementations of these algorithms were adapted to Spark and compared under the same circumstances. Based on the results of the experiments, we present a framework for recommending the Apriori implementation most appropriate for solving a given problem, according to the data set characteristics and the minimum required support. The results show that Spark implementations outperform Hadoop-MapReduce implementations in runtime in most experiments. However, no single implementation is the best in all the evaluated situations.


2018 ◽  
Vol 7 (2.8) ◽  
pp. 303
Author(s):  
S Anjali Devi ◽  
M Vishnu Priya ◽  
P Akhila ◽  
N Vasundhara

Students’ academic success can be evaluated based on their performance in the exams conducted by their institutions. In this paper, we propose a scheme in which a student's final placement is predicted from the marks scored in previous semesters. To predict the placement, we need some data to analyze. For this purpose, students' basic details and previous academic information are supplied to the system, which uses them to predict the placement of the student. This is done by generating association rules using the Apriori algorithm. The system has two kinds of users: the admin and the user, who is the student. Both use their login to access the system. The admin adds the academic details of the students, such as their SSC, HSC and graduation marks (up to the current semester, backlogs, etc.). This system can be used in schools, colleges and other educational institutions. This evaluation system is more accurate than other conventional methods. We use a university data set to predict the placement of the student.


Mathematics ◽  
2021 ◽  
Vol 9 (14) ◽  
pp. 1620
Author(s):  
José Benito Hernández C. ◽  
Andrés García-Medina ◽  
Miguel Andrés Porro V.

We studied the effects of the financial turbulence of 2020 on the cryptocurrency market, taking into account both prices and volumes from December 2019 to July 2020. Time series were transformed into transaction matrices, and the Apriori algorithm was applied to find association rules between different currencies, identifying whether the rules are composed of the prices or the volumes of the currencies. We divided the data set into two subsets and found that, before the decline in cryptocurrency prices, the association rules were generally formed by prices, whereas afterwards transaction volumes dominated the association rules.
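The abstract does not say how the time series were encoded as transactions. One plausible encoding, shown here purely as an assumption, marks an item as present in a period whenever the corresponding series rose relative to the previous period. The function `to_transactions`, the `_up` suffix, and the toy values are all hypothetical.

```python
def to_transactions(series):
    """Convert {name: [values per period]} into one item set per period.

    Assumed encoding: the item '<name>_up' is present in period t when the
    series increased from period t-1 to period t. The first period has no
    predecessor, so it produces no transaction.
    """
    length = min(len(values) for values in series.values())
    transactions = []
    for t in range(1, length):
        items = {
            name + "_up"
            for name, values in series.items()
            if values[t] > values[t - 1]
        }
        transactions.append(frozenset(items))
    return transactions


# Hypothetical toy series; the numbers are invented for illustration.
series = {
    "BTC_price": [10, 11, 9],
    "ETH_price": [5, 6, 7],
    "BTC_volume": [100, 90, 120],
}
transactions = to_transactions(series)
```

Because price items and volume items keep distinct names, any rule mined from these transactions shows directly whether it is composed of prices, volumes, or both.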


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal is to obtain valuable insights from the available data. A large part of Indian cinema is Bollywood, which is a multi-million dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average or Flop. For this, machine learning techniques (classification and prediction) are applied. The first step in building a classifier or prediction model is the learning stage, in which a training data set is used to train the model by applying some technique or algorithm; rules are then generated that help to build a model and predict future trends in different types of organizations. Methods: Classification and prediction techniques such as Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost and KNN are applied to find efficient and effective results. All these functionalities can be applied with GUI-based workflows organized into categories such as Data, Visualize, Model and Evaluate. Result: The first step in building a classifier or prediction model is the learning stage, in which a training data set is used to train the model by applying some technique or algorithm; rules are then generated that help to build a model and predict future trends in different types of organizations. Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Using advertisement propaganda, production houses can plan the best time to release a movie according to the predicted success rate in order to gain higher benefits.
Discussion: Data mining is the process of discovering patterns in large data sets; from these patterns, relationships are discovered that help to solve business problems and to predict forthcoming trends. This prediction can help production houses with advertisement propaganda and with planning their costs, and by securing these factors they can make the movie more profitable.


BMJ Open ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. e040778
Author(s):  
Vineet Kumar Kamal ◽  
Ravindra Mohan Pandey ◽  
Deepak Agrawal

Objective: To develop and validate a simple risk scores chart to estimate the probability of poor outcomes in patients with severe head injury (HI). Design: Retrospective. Setting: Level-1, government-funded trauma centre, India. Participants: Patients with severe HI admitted to the neurosurgery intensive care unit during 19 May 2010–31 December 2011 (n=946) for the model development and, further, data from the same centre with the same inclusion criteria from 1 January 2012 to 31 July 2012 (n=284) for the external validation of the model. Outcome(s): In-hospital mortality and unfavourable outcome at 6 months. Results: A total of 39.5% and 70.7% had in-hospital mortality and unfavourable outcome, respectively, in the development data set. The multivariable logistic regression analysis of routinely collected admission characteristics revealed that, for in-hospital mortality, age (51–60, >60 years), motor score (1, 2, 4), pupillary reactivity (none), presence of hypotension, basal cistern effaced and traumatic subarachnoid haemorrhage/intraventricular haematoma, and, for unfavourable outcome, age (41–50, 51–60, >60 years), motor score (1–4), pupillary reactivity (none, one), unequal limb movement and presence of hypotension were the independent predictors, as the 95% confidence interval (CI) of their odds ratio (OR) did not contain one. The discriminative ability (area under the receiver operating characteristic curve (95% CI)) of the score chart for in-hospital mortality and 6 months outcome was excellent in the development data set (0.890 (0.867 to 0.912) and 0.894 (0.869 to 0.918), respectively), the internal validation data set using the bootstrap resampling method (0.889 (0.867 to 0.909) and 0.893 (0.867 to 0.915), respectively) and the external validation data set (0.871 (0.825 to 0.916) and 0.887 (0.842 to 0.932), respectively). Calibration showed good agreement between observed outcome rates and predicted risks in the development and external validation data sets (p>0.05). Conclusion: For clinical decision making, these score charts can be used to predict outcomes in new patients with severe HI in India and similar settings.
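For readers unfamiliar with the discrimination measure reported above, the following minimal sketch computes the area under the ROC curve as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, and attaches a percentile bootstrap interval. It is an illustration under stated assumptions, not the study's procedure: the function names, the toy scores and labels, the number of resamples and the percentile method are all choices made for this example.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples with only one class: the AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


# Hypothetical predicted risks and observed outcomes (1 = poor outcome).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
point = auc(scores, labels)
lo, hi = bootstrap_ci(scores, labels)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two outcome groups.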


2021 ◽  
pp. 106591292110093
Author(s):  
James M. Strickland ◽  
Katelyn E. Stauffer

Despite a growing body of literature examining the consequences of women’s inclusion among lobbyists, our understanding of the factors that lead to women’s initial emergence in the profession is limited. In this study, we propose that gender diversity among legislative targets incentivizes organized interests to hire women lobbyists, and thus helps to explain when and how women emerge as lobbyists. Using a comprehensive data set of registered lobbyist–client pairings from all American states in 1989 and 2011, we find that legislative diversity influences not only the number of lobby contracts held by women but also the number of former women legislators who become revolving-door lobbyists. This second finding further supports the argument that interests capitalize on the personal characteristics of lobbyists, specifically by hiring women to work in more diverse legislatures. Our findings have implications for women and politics, lobbying, and voice and political equality in the United States.

