Google Summer of Code Gender Diversity: An analysis of the last 4 editions

This work presents comprehensive research on the participation of men and women in the area of Information and Communications Technology (ICT), using data extracted from the last four editions of Google Summer of Code (GSoC). The goal of this work is to find association rules between gender characteristics and coding using the Apriori algorithm. The algorithm generated a total of 61 association rules: 22 found only in the women's data set, 24 found only in the men's data set, and 15 applicable to both sets. One of the main findings is that the representation of women in GSoC has been decreasing over the last few years. Despite this, the representation of women in GSoC is above the average reported in other studies in the literature, in which women are underrepresented. The most used technologies are "Python", "Java", "C++", "C" and "JavaScript". The main technologies used by men and women are similar but, in general, men are more likely to be linked to programming languages. The most common project topics are "Event Management", "Web", "Web Development", "Data Science" and "Cloud". This may reflect how diverse the project topics in the database are, but it is not necessarily related to gender.

Improved Classification Techniques to Predict the Co-disease in Diabetic Mellitus Patients using Discretization and Apriori Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1434.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 730-733

Keyword(s):

Data Mining ◽

Association Rules ◽

Census Data ◽

Early Stage ◽

Research Work ◽

Numerical Data ◽

Medical Data ◽

Data Sets ◽

Apriori Algorithm ◽

Data Set

Data mining has become indispensable in the medical industry because of its many applications in predicting diseases at an early stage. Data mining methods make it easy to extract useful patterns and quick to recognize task-based outcomes. Classification models are useful for building classes from medical data sets for accurate future analysis. In addition, association rules are a promising technique for finding hidden patterns in a medical data set and have been applied successfully to market basket data, census data and financial data. The Apriori algorithm, considered a classic algorithm, is useful for mining frequent item sets in a database containing a large number of transactions, and it also yields the relevant association rules. Association rules capture the relationships between items present in a data set. When the data set contains continuous attributes, existing algorithms may not work; in that case, discretization can be applied before mining association rules in order to find the relations between patterns in the data set. In this paper, a discretized Apriori approach is used to predict the co-disease in people found to have diabetic syndrome, and the extracted rules are analyzed. In the discretization step, numerical data is discretized and fed to the Apriori algorithm to obtain better association rules for predicting diseases.
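The discretize-then-mine pipeline described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the patient records, the cut points (125 for glucose, 29.9 for BMI), the item labels and the helper names `discretize`, `apriori` and `derive_rules` are all assumptions made up for illustration.

```python
from itertools import combinations

def discretize(value, cut_points, labels):
    """Return the label of the first bin whose upper cut point is >= value."""
    for cut, label in zip(cut_points, labels):
        if value <= cut:
            return label
    return labels[-1]  # value lies above every cut point

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while candidates:
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def derive_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                a = frozenset(antecedent)
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    found.append((a, itemset - a, support, confidence))
    return found

# Invented records, for illustration only (not data from the paper).
records = [
    {"glucose": 180, "bmi": 32, "hypertension": True},
    {"glucose": 170, "bmi": 31, "hypertension": True},
    {"glucose": 95, "bmi": 22, "hypertension": False},
    {"glucose": 160, "bmi": 24, "hypertension": True},
    {"glucose": 100, "bmi": 33, "hypertension": False},
]

# Discretization step: every numeric attribute becomes a categorical item.
transactions = [
    {
        discretize(r["glucose"], [125], ["glucose=normal", "glucose=high"]),
        discretize(r["bmi"], [29.9], ["bmi=normal", "bmi=high"]),
        "hypertension=yes" if r["hypertension"] else "hypertension=no",
    }
    for r in records
]

frequent = apriori(transactions, min_support=0.5)
found_rules = derive_rules(frequent, min_confidence=0.8)
for antecedent, consequent, support, confidence in found_rules:
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

The rule confidence is the support of the whole itemset divided by the support of the antecedent, which is why `derive_rules` can look up every antecedent in `frequent`: any subset of a frequent itemset is itself frequent.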

Improved Classification Techniques to Predict the Co-disease in Diabetic Mellitus Patients using Discretization and Apriori Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1434.0881119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 730-733

Keyword(s):

Data Mining ◽

Association Rules ◽

Census Data ◽

Early Stage ◽

Research Work ◽

Numerical Data ◽

Medical Data ◽

Data Sets ◽

Apriori Algorithm ◽

Data Set

Research on Early Warning for Gas Risks at a Working Face Based on Association Rule Mining

Energies ◽

10.3390/en14216889 ◽

2021 ◽

Vol 14 (21) ◽

pp. 6889

Author(s):

Yuxin Huang ◽

Jingdao Fan ◽

Zhenguo Yan ◽

Shugang Li ◽

Yanping Wang

Keyword(s):

Association Rules ◽

Early Warning ◽

Association Rule ◽

Cluster Center ◽

Apriori Algorithm ◽

Data Set ◽

Early Warning Model ◽

Initial Cluster ◽

Different Dimensions ◽

Warning Model

In gas prediction and early warning, outliers in the data series are often discarded, so key information may be missed during analysis. To address this, this paper proposes an early warning model based on multifactor coupling relationship analysis of coal face gas. The model combines a k-means algorithm with optimized initial cluster centers and an Apriori algorithm with optimized weights. The initial cluster centers for the full data are taken from the cluster centers of the preorder data subset, which optimizes the k-means algorithm. The optimized algorithm is used to filter the outliers out of the collected data, yielding a data set of outliers. The Apriori algorithm is then optimized so that it can identify important information that appears less frequently in the events. It is used to mine the association rules among abnormal values and to obtain interesting association rule events among the gas outliers in different dimensions. Finally, four warning levels of gas risk are set according to different confidence intervals, and true and reliable warning results are obtained. Mining association rules between abnormal data in different dimensions verifies the validity and effectiveness of the proposed gas early warning model. Classifying early warnings of gas risks has important practical significance for improving coal mine safety.
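The idea of seeding k-means with the cluster centers of an earlier data subset can be sketched as follows. This is a one-dimensional toy under stated assumptions, not the paper's model: the readings are invented, "preorder data subset" is read here as the earlier part of the series, and the distance threshold used to flag outliers is a hypothetical stand-in for whatever criterion the authors use.

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain 1-D k-means (Lloyd's algorithm) starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def flag_outliers(values, centers, threshold):
    """Return values farther than `threshold` from every cluster center."""
    return [v for v in values if min(abs(v - c) for c in centers) > threshold]

# Invented gas-concentration readings, for illustration only.
earlier = [0.2, 0.3, 0.25, 0.8, 0.9, 0.85]
full = earlier + [0.22, 0.88, 2.5]

# Step 1: cluster the earlier subset from arbitrary starting centers.
warm_centers = kmeans_1d(earlier, centers=[0.0, 1.0])
# Step 2: reuse those centers as the initial centers for the full data.
final_centers = kmeans_1d(full, centers=warm_centers)
# Step 3: collect the points that sit far from every center.
outliers_found = flag_outliers(full, final_centers, threshold=0.5)
print(final_centers, outliers_found)
```

Starting the second run from `warm_centers` rather than from arbitrary values is the only part taken from the abstract; the flagged values would then feed the association rule mining stage.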

Review and comparison of Apriori algorithm implementations on Hadoop-MapReduce and Spark

The Knowledge Engineering Review ◽

10.1017/s0269888918000127 ◽

2018 ◽

Vol 33 ◽

Cited By ~ 4

Author(s):

Eduardo P. S. Castro ◽

Thiago D. Maia ◽

Marluce R. Pereira ◽

Ahmed A. A. Esmin ◽

Denilson A. Pereira

Keyword(s):

Association Rules ◽

Data Sets ◽

Apriori Algorithm ◽

Mapreduce Framework ◽

Data Set ◽

Hadoop Mapreduce ◽

Detailed Assessment ◽

Mining Association Rules

Several Apriori algorithm implementations for mining association rules have been proposed in the literature using the Hadoop-MapReduce framework and, more recently, Spark. However, none of these works has made a detailed assessment of their performance, for example by comparing them with other implementations across data sets with various characteristics. In this work, we present a review of the main algorithms proposed for Hadoop-MapReduce and compare their implementations in a single environment under several different situations. Moreover, the implementations of these algorithms were adapted to Spark and compared under the same circumstances. Based on the results of the experiments, we present a framework for recommending the Apriori implementation most appropriate for solving a given problem, according to the data set characteristics and the minimum required support. The results show that the Spark implementations outperform the Hadoop-MapReduce implementations in runtime in most experiments. However, no single implementation is the best in all the evaluated situations.

Analysis and prediction of student placement for improving the education standards

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.8.10429 ◽

2018 ◽

Vol 7 (2.8) ◽

pp. 303

Author(s):

S Anjali Devi ◽

M Vishnu Priya ◽

P Akhila ◽

N Vasundhara

Keyword(s):

Academic Success ◽

Association Rules ◽

Evaluation System ◽

Educational Institutions ◽

Apriori Algorithm ◽

Education Standards ◽

Data Set ◽

Student Placement ◽

Conventional Methods ◽

Academic Information

Students' academic success can be evaluated based on their performance in the exams conducted by their institutions. In this paper, we propose a scheme in which a student's final placement is predicted from the marks scored in previous semesters. To predict the placement we need data to analyze, so students' basic details and previous academic information are supplied to the system. The prediction is made by generating association rules using the Apriori algorithm. The system has two roles, admin and user, where the user is the student; both log in to access the system. The admin adds the students' academic details, such as SSC, HSC and graduation marks (up to the current semester), backlogs, etc. This system can be used in schools, colleges and other educational institutions. This evaluation system is more accurate than other conventional methods. We use a university data set to predict student placement.

Study of the Behavior of Cryptocurrencies in Turbulent Times Using Association Rules

Mathematics ◽

10.3390/math9141620 ◽

2021 ◽

Vol 9 (14) ◽

pp. 1620

Author(s):

José Benito Hernández C. ◽

Andrés García-Medina ◽

Miguel Andrés Porro V.

Keyword(s):

Time Series ◽

Association Rules ◽

Apriori Algorithm ◽

Data Set ◽

Turbulent Times

We studied the effects of the financial turbulence of 2020 on the cryptocurrency market, taking into account both prices and volumes from December 2019 to July 2020. Time series were transformed into transaction matrices, and the Apriori algorithm was applied to find the association rules between different currencies, identifying whether the rules are composed of the prices or the volumes of the currencies. We divided the data set into two subsets and found that, before the decline in cryptocurrency prices, the association rules were generally formed by prices, whereas afterwards the transaction volumes dominated the association rules.
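One way to turn time series into transactions, so that support and confidence can be computed for a rule, is sketched below. The encoding is an assumption (each time step becomes a transaction whose items record whether a series rose or fell), and the series names and values are invented; the abstract does not specify how the authors built their transaction matrices.

```python
def to_transactions(series):
    """Turn {name: [v0, v1, ...]} into one set of up/down items per time step."""
    length = len(next(iter(series.values())))
    transactions = []
    for t in range(1, length):
        items = set()
        for name, values in series.items():
            if values[t] > values[t - 1]:
                items.add(f"{name}_up")
            elif values[t] < values[t - 1]:
                items.add(f"{name}_down")
        transactions.append(items)
    return transactions

def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together over support of antecedent."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

# Invented series, for illustration only (not market data).
series = {
    "BTC_price": [100, 90, 80, 85, 70],
    "ETH_price": [10, 9, 8, 8.5, 7],
    "BTC_volume": [5, 7, 6, 9, 12],
}

transactions = to_transactions(series)
rule_support = support(transactions, {"BTC_price_down", "ETH_price_down"})
rule_confidence = confidence(transactions, {"BTC_price_down"}, {"ETH_price_down"})
print(rule_support, rule_confidence)
```

Because price items and volume items carry distinct names, the items in a mined rule show directly whether it is formed by prices or by volumes.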

Enhanced K_way Method In APRIORI Algorithm for Mining the Association Rules Through Embedding SQL Commands

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i10.5256 ◽

2019 ◽

Vol 7 (10) ◽

pp. 52-56

Author(s):

Basel A. Dabwan ◽

Mukti E. Jadhav

Keyword(s):

Association Rules ◽

Apriori Algorithm

Comparative Analysis of Machine Learning Techniques Using Predictive Modeling

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200904164539 ◽

2020 ◽

Vol 13 ◽

Author(s):

Ritu Khandelwal ◽

Hemlata Goyal ◽

Rajveer Singh Shekhawat

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Data Science ◽

Training Data ◽

Machine Learning Techniques ◽

Future Trends ◽

Data Set ◽

Learning Stage ◽

Learning Techniques ◽

Different Types

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal focuses on obtaining valuable insights from available data. Bollywood, a multi-million dollar industry, makes up a large part of Indian cinema. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average or Flop. For this, machine learning techniques (classification and prediction) are applied. The first step in building a classifier or prediction model is the learning stage, in which a training data set is used to train the model by applying some technique or algorithm; rules are then generated that help to build a model and predict future trends in different types of organizations. Methods: Classification and prediction techniques such as Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost and KNN are applied to find efficient and effective results. All of these can be applied with GUI-based workflows organized into categories such as Data, Visualize, Model and Evaluate. Result: The first step in building a classifier or prediction model is the learning stage, in which a training data set is used to train the model by applying some technique or algorithm; rules are then generated that help to build a model and predict future trends in different types of organizations. Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Using advertisement propaganda, production houses can plan the best time to release a movie according to the predicted success rate to gain higher benefits. Discussion: Data mining is the process of discovering patterns in large data sets; from these patterns, relationships are discovered that help to solve business problems and predict forthcoming trends. This prediction can help production houses with advertisement propaganda and with planning their costs, and by securing these factors they can make a movie more profitable.

Development and temporal external validation of a simple risk score tool for prediction of outcomes after severe head injury based on admission characteristics from level-1 trauma centre of India using retrospectively collected data

BMJ Open ◽

10.1136/bmjopen-2020-040778 ◽

2021 ◽

Vol 11 (1) ◽

pp. e040778

Author(s):

Vineet Kumar Kamal ◽

Ravindra Mohan Pandey ◽

Deepak Agrawal

Keyword(s):

Hospital Mortality ◽

External Validation ◽

Trauma Centre ◽

Unfavourable Outcome ◽

Motor Score ◽

Validation Data ◽

Data Set ◽

Development Data ◽

Level 1 ◽

Pupillary Reactivity

Objective: To develop and validate a simple risk score chart to estimate the probability of poor outcomes in patients with severe head injury (HI). Design: Retrospective. Setting: Level-1, government-funded trauma centre, India. Participants: Patients with severe HI admitted to the neurosurgery intensive care unit during 19 May 2010–31 December 2011 (n=946) for the model development and, further, data from the same centre with the same inclusion criteria from 1 January 2012 to 31 July 2012 (n=284) for the external validation of the model. Outcome(s): In-hospital mortality and unfavourable outcome at 6 months. Results: A total of 39.5% and 70.7% had in-hospital mortality and unfavourable outcome, respectively, in the development data set. The multivariable logistic regression analysis of routinely collected admission characteristics revealed that for in-hospital mortality, age (51–60, >60 years), motor score (1, 2, 4), pupillary reactivity (none), presence of hypotension, basal cistern effaced and traumatic subarachnoid haemorrhage/intraventricular haematoma, and for unfavourable outcome, age (41–50, 51–60, >60 years), motor score (1–4), pupillary reactivity (none, one), unequal limb movement and presence of hypotension were the independent predictors, as the 95% confidence interval (CI) of their odds ratio (OR) did not contain one. The discriminative ability (area under the receiver operating characteristic curve (95% CI)) of the score chart for in-hospital mortality and 6 months outcome was excellent in the development data set (0.890 (0.867 to 0.912) and 0.894 (0.869 to 0.918), respectively), the internal validation data set using the bootstrap resampling method (0.889 (0.867 to 0.909) and 0.893 (0.867 to 0.915), respectively) and the external validation data set (0.871 (0.825 to 0.916) and 0.887 (0.842 to 0.932), respectively). Calibration showed good agreement between observed outcome rates and predicted risks in the development and external validation data sets (p>0.05). Conclusion: For clinical decision making, we can use these score charts to predict outcomes in new patients with severe HI in India and similar settings.

Legislative Diversity and the Rise of Women Lobbyists

Political Research Quarterly ◽

10.1177/10659129211009305 ◽

2021 ◽

pp. 106591292110093

Author(s):

James M. Strickland ◽

Katelyn E. Stauffer

Keyword(s):

Gender Diversity ◽

Personal Characteristics ◽

The United States ◽

Revolving Door ◽

Women And Politics ◽

Political Equality ◽

Data Set ◽

Women Legislators ◽

Organized Interests ◽

American States

Despite a growing body of literature examining the consequences of women’s inclusion among lobbyists, our understanding of the factors that lead to women’s initial emergence in the profession is limited. In this study, we propose that gender diversity among legislative targets incentivizes organized interests to hire women lobbyists, and thus helps to explain when and how women emerge as lobbyists. Using a comprehensive data set of registered lobbyist–client pairings from all American states in 1989 and 2011, we find that legislative diversity influences not only the number of lobby contracts held by women but also the number of former women legislators who become revolving-door lobbyists. This second finding further supports the argument that interests capitalize on the personal characteristics of lobbyists, specifically by hiring women to work in more diverse legislatures. Our findings have implications for women and politics, lobbying, and voice and political equality in the United States.
