Evaluating a programming topic using GitHub data: what we can learn about machine learning

2021
Vol ahead-of-print (ahead-of-print)
Author(s): Paolo Dello Vicario, Valentina Tortolini

Purpose – The purpose of this paper is to define a methodology for analyzing links between programming topics and libraries, starting from GitHub data.
Design/methodology/approach – The paper analyzes machine learning repositories on GitHub, finding communities of repositories and studying the anatomy of collaboration around a popular topic such as machine learning.
Findings – The analysis highlights the importance of programming languages and technologies such as Python and Jupyter Notebook. It also shows the rise of deep learning and of specific libraries such as Google's TensorFlow.
Originality/value – No existing survey or analysis examines how developers influence each other around specific topics; other researchers focused on collaborative structure and social impact rather than topic impact. The methodology is useful for analyzing not only machine learning but also other programming topics.

2015
Vol 22 (5)
pp. 573-590
Author(s): Mojtaba Maghrebi, Claude Sammut, S. Travis Waller

Purpose – The purpose of this paper is to study the implementation of machine learning (ML) techniques to automatically measure the feasibility of performing ready mixed concrete (RMC) dispatching jobs.
Design/methodology/approach – Six ML techniques were selected and tested on data extracted from a simulation model the authors developed, with each case answered by a human expert.
Findings – The results show that most of the selected algorithms performed similarly, achieving an accuracy of around 80 per cent for the examined cases.
Practical implications – This approach can be applied in practice to match experts' decisions.
Originality/value – The paper studies the feasibility of handling complex concrete delivery problems with ML techniques. Most of the concrete mixing process is currently done by machines; however, RMC dispatching still relies on human resources to complete many tasks. The authors aim to reconstruct experts' decisions, as this is the only practical solution.


2006
Vol 40 (3)
pp. 286-295
Author(s): Andrew Buxton

Purpose – To review the variety of software solutions available for putting CDS/ISIS databases on the internet, and to help anyone deciding which route to take.
Design/methodology/approach – Briefly describes the characteristics, history, origin and availability of each package, and identifies the skills required to implement it and the kind of application it suits. Covers the CDS/ISIS Unix version, JavaISIS, IsisWWW, WWWISIS versions 3 and 5, Genisis, IAH, WWW‐ISIS and OpenIsis.
Findings – There is no obvious single “best” solution. Several packages are free but may require more investment in acquiring the skills to install and configure them. The choice will depend on the user's experience with the CDS/ISIS formatting language, HTML, programming languages, operating systems, open source software and so on.
Originality/value – Detailed documentation is available for most of these packages, but there has been little previous guidance to help potential users distinguish and choose between them.


2014
Vol 30 (4)
pp. 15-17

Purpose – This paper aims to review the latest management developments across the globe and pinpoint practical implications from cutting-edge research and case studies.
Design/methodology/approach – This briefing is prepared by an independent writer who adds their own impartial comments and places the articles in context.
Findings – Increasing reliance on the web as a principal source of information is altering our brains and the way we obtain and retain knowledge. We rely less on our memories to hold knowledge, instead using technology, and search engines like Google in particular, to deposit and retrieve information.
Practical implications – The paper provides strategic insights and practical thinking that have influenced some of the world's leading organizations.
Social implications – The paper provides strategic insights and practical thinking that can have a broader social impact.
Originality/value – The briefing saves busy executives and researchers hours of reading time by selecting only the very best, most pertinent information and presenting it in a condensed and easy-to-digest format.


2021
Vol ahead-of-print (ahead-of-print)
Author(s): Deepa S.N.

Purpose – Models developed in previous studies were limited by the occurrence of global minima. This study therefore develops a new intelligent ubiquitous computational model that learns with the gradient descent learning rule and operates with auto-encoders and decoders to attain better energy optimization. The ubiquitous machine learning model trains more effectively than regular supervised or unsupervised deep learning models, resulting in better learning and optimization for the problem domain considered, cloud-based internet of things (IoT). The study aims to improve network quality and the data accuracy rate during network transmission using the developed ubiquitous deep learning model.
Design/methodology/approach – A novel intelligent ubiquitous machine learning model is designed to maintain the optimal energy level of cloud IoT in sensor network domains. The model learns with the gradient descent learning rule and uses auto-encoders and decoders. A new unified deterministic sine-cosine algorithm is developed to optimize the weight parameters of the model.
Findings – The ubiquitous model is used to compute and optimize network energy in the sensor network model considered. During progressive simulation, residual energy, network overhead, end-to-end delay, network lifetime and the number of live nodes are evaluated. The results show that the ubiquitous deep learning model achieves better metrics owing to its appropriate cluster selection and minimized route selection mechanism.
Research limitations/implications – The ubiquitous computing model combines a new optimization algorithm, the unified deterministic sine-cosine algorithm, with a deep learning technique and is applied to maintain the optimal energy level of cloud IoT in sensor networks. The deterministic Lévy flight concept is used to develop the new optimization technique, which determines the weight values of the deep learning model. The model is built from auto-encoders and decoders whose layer weights are set to optimal values by the optimization algorithm. It is used to determine the network energy consumption rate and thereby optimize the energy level, increasing the lifetime of the sensor network model considered. For all the network metrics considered, the ubiquitous computing model proved more effective and versatile than approaches from earlier studies.
Practical implications – The developed model can be applied to any type of cloud-assisted IoT, including wireless sensor networks, ad hoc networks, radio access technology networks and heterogeneous networks. In practice, it computes the optimal energy level of cloud IoT for any network model considered, which helps maintain a better network lifetime and reduce end-to-end delay.
Social implications – The proposed approach reduces energy consumption and increases the network lifetime of cloud IoT-based sensor network models. It helps people at large obtain a better transmission rate with lower energy consumption and reduced transmission delay.
Originality/value – The network optimization of cloud-assisted IoT sensor network models is modelled and analysed using machine learning models as a kind of ubiquitous computing system. Ubiquitous computing models with machine learning techniques produce intelligent systems and enable users to make better and faster decisions. In the communication domain, predictive and optimization models built with machine learning open new ways of solving problems. Given the importance of learning techniques, the ubiquitous computing model is designed around a deep learning strategy, and its learning mechanism adapts itself to attain better network optimization.


2020
Vol 34 (1)
pp. 30-47
Author(s): Mohamed Zaki, Janet R. McColl-Kennedy

Purpose – The purpose of this paper is to offer a step-by-step text mining analysis roadmap (TMAR) for service researchers. The paper provides guidance on how to choose between alternative tools, using illustrative examples from a range of business contexts.
Design/methodology/approach – The authors provide a six-stage TMAR for using text mining methods in practice. At each stage, they pose a guiding question, articulate the aim, identify a range of methods and demonstrate how machine learning and linguistic techniques can be applied, with illustrative examples drawn from an array of business data types, services and contexts.
Findings – At each of the six stages, the paper shows how text mining techniques yield an in-depth understanding of the phenomenon and actionable insights for research and practice.
Originality/value – There is little research to guide scholars and practitioners on how to gain insights from the extensive “big data” that arises from different data sources. This paper is the first to address this important gap, highlighting the advantages of using text mining to gain useful insights for theory testing and practice in different service contexts.


2020
Vol 27 (8)
pp. 1891-1912
Author(s): Hengqin Wu, Geoffrey Shen, Xue Lin, Minglei Li, Boyu Zhang, ...

Purpose – This study proposes an approach to a fundamental problem in using query-based methods (i.e. search engines and patent retrieval tools) to screen patents on information and communication technology in construction (ICTC): because ICTC incorporates various techniques, it cannot be simply represented by man-made queries. To address this, the study develops a binary classifier that uses deep learning and NLP techniques to automatically identify whether a patent is relevant to ICTC, and thus accurately screens a corpus of ICTC patents.
Design/methodology/approach – The study employs NLP techniques to convert the textual data of patents into numerical vectors. A supervised deep learning model is then developed to learn the relations between the input vectors and the outputs.
Findings – The validation results indicate that (1) the proposed approach screens ICTC patents better than traditional machine learning methods; and (2) besides the United States Patent and Trademark Office (USPTO), which provides structured and well-written patents, the approach can also accurately screen patents from the Derwent Innovations Index (DIX), in which patents are written in different genres.
Practical implications – This study contributes a specific collection of ICTC patents, which the patent offices do not provide.
Social implications – The proposed approach offers an alternative way to gather a corpus of patents for domains like ICTC that neither exist as a searchable classification in patent offices nor are accurately represented by man-made queries.
Originality/value – A deep learning model with two layers of neurons is developed to learn the non-linear relations between the input features and outputs, providing better performance than traditional machine learning models. The study uses advanced NLP techniques, lemmatization and part-of-speech (POS) tagging, to process the textual data of ICTC patents.


Purpose – Reviews the latest management developments across the globe and pinpoints practical implications from cutting‐edge research and case studies.
Design/methodology/approach – This briefing is prepared by an independent writer who adds their own impartial comments and places the articles in context.
Findings – Anyone who has been an employee for many years, and has worked for several different companies during that time, will know that it is extremely difficult to make sweeping statements about bosses. In a few cases they might have got there through ambition as much as talent, although a judicious combination of the two is more likely, and they are not always people with whom you would want to spend an evening in a bar. But once in situ, leaders' approaches to their role differ too much to make generalizations possible. This provides fertile ground for experts examining leadership development.
Practical implications – Provides strategic insights and practical thinking that have influenced some of the world's leading organizations.
Social implications – Provides strategic insights and practical thinking that can have a broader social impact.
Originality/value – The briefing saves busy executives and researchers hours of reading time by selecting only the very best, most pertinent information and presenting it in a condensed and easy‐to‐digest format.


Kybernetes
2017
Vol 46 (4)
pp. 693-705
Author(s): Yasser F. Hassan

Purpose – This paper aims to use machine learning and soft computing to propose a new rough set method with a deep learning architecture for many real-world applications.
Design/methodology/approach – The objective is to propose a model of deep rough set theory that uses more than one decision table and approximates these tables with a classification system; that is, the paper proposes a novel deep learning framework based on multi-decision tables.
Findings – The paper attempts to coordinate the local properties of individual decision tables to provide an appropriate global decision from the system.
Research limitations/implications – Rough set learning assumes the existence of a single decision table, whereas real-world decision problems involve several decisions with several different decision tables. The proposed model can handle multi-decision tables.
Practical implications – The proposed classification model is applied to social networks with preferred features, which are freely distributed as social entities, and achieves an accuracy of around 91 per cent.
Social implications – Deep learning with rough set theory simulates the way the brain thinks and can address the problem of different information about the same problem existing in different decision systems.
Originality/value – This paper uses machine learning and soft computing to propose a new rough set method with a deep learning architecture for many real-world applications.
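The abstract builds on rough set approximation of a decision table. As background only, the classical single-table construction (Pawlak's lower and upper approximations) can be sketched in Python as below. This is the standard textbook technique, not the author's deep multi-decision-table model, and the toy table, attribute names and function names are hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group objects that share identical values on the given condition attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) rough set approximations of the target set."""
    lower, upper = set(), set()
    for eq_class in indiscernibility_classes(table, attributes):
        if eq_class <= target:  # class lies entirely inside the target: certainly in
            lower |= eq_class
        if eq_class & target:   # class overlaps the target: possibly in
            upper |= eq_class
    return lower, upper


# Hypothetical toy decision table: condition attributes "a", "b"; decision "d".
table = {
    "x1": {"a": 1, "b": 0, "d": "yes"},
    "x2": {"a": 1, "b": 0, "d": "no"},
    "x3": {"a": 0, "b": 1, "d": "yes"},
    "x4": {"a": 0, "b": 0, "d": "no"},
}
target = {obj for obj, row in table.items() if row["d"] == "yes"}
lower, upper = approximations(table, ["a", "b"], target)
print(sorted(lower), sorted(upper))  # ['x3'] ['x1', 'x2', 'x3']
```

Here x1 and x2 share the same attribute values but have different decisions, so they fall in the boundary region between the two approximations.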


2021
Vol 55 (4)
pp. 586-608
Author(s): Gabriela Montenegro Montenegro de Barros, Valdecy Pereira, Marcos Costa Roboredo

Purpose – This paper presents an algorithm that can elicit (infer) all or any combination of the ELECTRE (elimination and choice expressing reality) Tri-B parameters. For example, a decision maker can keep fixed values for the indifference, preference and veto thresholds, and the algorithm can find the criteria weights, reference profiles and the lambda cutting level. The approach is inspired by a machine learning ensemble technique, the random forest, and the authors therefore named it the ELECTRE tree algorithm.
Design/methodology/approach – First, the authors generate a set of ELECTRE Tri-B models, where each model solves a random sample of criteria and alternatives. Each sample is drawn with replacement and contains at least two criteria and between 10% and 25% of the alternatives. Each model has its parameters optimized by a genetic algorithm (GA) that can use an ordered cluster or an assignment example as a reference for the optimization. After the optimization phase, two procedures can be performed: the first merges all models to obtain the elicited parameters; in the second, each alternative is classified (voted on) by each separate model, and the majority vote decides the final class.
Findings – The authors note that the voting procedure generates nonlinear decision boundaries, which can be suitable for analyzing problems of the same nature. In contrast, the merged model generates linear decision boundaries.
Originality/value – The elicitation of ELECTRE Tri-B parameters is performed by an ensemble technique composed of a set of multicriteria models that work together to generate robust solutions.
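The ensemble side of the ELECTRE tree approach (random samples per model plus a majority vote) can be illustrated with a minimal Python sketch. The ELECTRE Tri-B models and their GA optimization are not reproduced: each fitted model is stood in for by a hypothetical callable returning a class label, and the function names and the way criteria are drawn are assumptions for illustration.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def draw_sample(n_criteria: int, n_alternatives: int,
                rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw index sets for one ensemble member (requires n_criteria >= 2).

    Alternatives are drawn with replacement and make up between 10% and 25% of
    the total, as the abstract states. Criteria are drawn without replacement
    here (an assumption) so that at least two distinct criteria are used.
    """
    k_crit = rng.randint(2, n_criteria)
    criteria = rng.sample(range(n_criteria), k_crit)
    k_alt = max(1, round(rng.uniform(0.10, 0.25) * n_alternatives))
    alternatives = [rng.randrange(n_alternatives) for _ in range(k_alt)]
    return criteria, alternatives


def majority_vote(models: Sequence[Callable[[object], str]], alternative) -> str:
    """Let every fitted model classify the alternative; the most common class wins.

    Ties go to the class that received its first vote earliest.
    """
    votes = Counter(model(alternative) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for three optimized ELECTRE Tri-B models.
models = [lambda alt: "C1", lambda alt: "C2", lambda alt: "C1"]
print(majority_vote(models, alternative=None))  # C1
```

In a full implementation, each callable would be an ELECTRE Tri-B model whose parameters were optimized on its own sample; the vote itself does not depend on how the models were fitted.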


2018
Vol 26 (5)
pp. 613-636
Author(s): Gunikhan Sonowal, KS Kuppusamy

Purpose – This paper aims to propose a model entitled MMSPhiD (multidimensional similarity metrics model for screen reader user to phishing detection) that combines multiple approaches to detect phishing URLs.
Design/methodology/approach – The model consists of three major components: a machine learning-based approach, a typosquatting-based approach and a phoneme-based approach. Its major objectives are detecting phishing URLs, typosquatting domains and phoneme-based domains, and suggesting the legitimate domain targeted by attackers.
Findings – The experimental results show that the MMSPhiD model detects phishing with 99.03 per cent accuracy. In addition, the paper analyzes 20 leading domains from Alexa and identifies 1,861 registered typosquatting domains and 543 phoneme-based domains.
Research limitations/implications – The proposed model combines machine learning with a list-based approach; building and maintaining the list is a limitation.
Practical implications – The experimental results demonstrate that the model achieves higher performance because it incorporates multidimensional filters.
Social implications – The paper incorporates the accessibility needs of persons with visual impairments and provides an accessible anti-phishing approach.
Originality/value – This paper assists persons with visual impairments in detecting phoneme-based phishing domains.
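The typosquatting component can be illustrated with a simple Python sketch that generates common single-edit typo variants of a target domain (character omission, repetition and adjacent transposition) and checks observed domains against them. This is a generic technique assumed for illustration, not the MMSPhiD implementation, and the function names are hypothetical.

```python
def typo_variants(domain: str) -> set:
    """Generate single-edit typo variants of the first label of a domain."""
    name, dot, rest = domain.partition(".")
    variants = set()
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])          # omission:      gogle
        variants.add(name[:i] + name[i] + name[i:])    # repetition:    gooogle
        if i < len(name) - 1:                          # transposition: googel
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    variants.discard(name)  # swapping two identical letters reproduces the original
    variants.discard("")    # omitting the only letter of a one-letter label
    return {v + dot + rest for v in variants}


def flag_typosquats(observed, targets) -> dict:
    """Map each observed domain that is a typo variant of a target to that target."""
    observed = set(observed)
    hits = {}
    for target in targets:
        for variant in typo_variants(target):
            if variant in observed:
                hits[variant] = target
    return hits


print(flag_typosquats({"gogle.com", "example.org"}, ["google.com"]))
# {'gogle.com': 'google.com'}
```

The returned mapping also yields the legitimate domain a suspicious one appears to imitate, which corresponds to the "suggesting the legitimate domain" objective; phoneme-based (sound-alike) matching would need a separate similarity measure.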

