Impact of programming languages on machine learning bugs

Author(s):  
Sebastian Sztwiertnia ◽  
Maximilian Grübel ◽  
Amine Chouchane ◽  
Daniel Sokolowski ◽  
Krishna Narasimhan ◽  
...  


Author(s):  
Amandeep Kaur ◽  
Sushma Jain ◽  
Shivani Goel ◽  
Gaurav Dhiman

Context: Code smells are symptoms that something may be wrong in a software system and can cause complications in maintaining software quality. Many code smells have been described in the literature, and their identification is far from trivial; several techniques have therefore been proposed to automate code smell detection in order to improve software quality. Objective: This paper presents an up-to-date review of simple and hybrid machine learning-based code smell detection techniques and tools. Methods: We collected all the relevant research published in this field up to 2020, extracted the data from those articles, and classified them into two major categories. In addition, we compared the selected studies on several aspects: code smells, machine learning techniques, datasets, programming languages used by the datasets, dataset size, evaluation approach, and statistical testing. Results: The majority of the empirical studies propose machine learning-based code smell detection tools. Support vector machine and decision tree algorithms are those most frequently used by researchers. A major proportion of the research is conducted on open-source software (OSS) such as Xerces, GanttProject, and ArgoUML, and researchers have paid the most attention to the Feature Envy and Long Method code smells. Conclusion: We identified several open research areas, such as the need for code smell detection techniques using hybrid approaches and for validation employing industrial datasets.
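To make the surveyed approach concrete, here is a minimal sketch of metric-based code smell detection: a depth-1 decision tree (a "decision stump") trained on method length to flag the Long Method smell. The metric values and labels below are illustrative, not drawn from any of the surveyed datasets; real studies train richer models (SVMs, full decision trees) on many such metrics.

```python
# Decision stump for Long Method detection: learn the lines-of-code (LOC)
# threshold that best separates smelly from clean methods.

def train_stump(samples):
    """Return the LOC threshold with the highest training accuracy."""
    best_thresh, best_acc = None, -1.0
    for loc, _ in samples:
        thresh = loc
        correct = sum(1 for x, y in samples if (x >= thresh) == y)
        acc = correct / len(samples)
        if acc > best_acc:
            best_thresh, best_acc = thresh, acc
    return best_thresh, best_acc

# (method LOC, is_long_method) -- hypothetical training data
train = [(12, False), (18, False), (25, False), (60, True), (85, True), (140, True)]
threshold, accuracy = train_stump(train)
print(threshold, accuracy)
```

A full decision tree recursively applies this threshold search over multiple metrics (LOC, cyclomatic complexity, coupling); the stump shows the core split criterion in isolation.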


2021 ◽  
Author(s):  
Roman Nuterman ◽  
Dion Häfner ◽  
Markus Jochum

Until recently, our pure Python, primitive equation ocean model Veros has been about 1.5x slower than a corresponding Fortran implementation. But thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree CPU, are within reach. Leveraging Google's JAX library, we find that our Python model code can reach a 2-5 times higher energy efficiency on GPU compared to a traditional Fortran model.

Therefore, we propose a new generation of geophysical models: one that combines high-level abstractions and user friendliness on the one hand, and that leverages modern developments in high-performance computing and machine learning research on the other.

We discuss what there is to gain from building models in high-level programming languages, what we have achieved in Veros, and where we see the modelling community heading in the future.
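A sketch of the array-style code that makes such a NumPy-to-JAX port possible: one explicit diffusion step written with whole-array operations, no Python loops over grid points. The grid size and diffusivity here are illustrative only, not taken from Veros.

```python
# One forward-Euler step of 2-D diffusion on the interior grid points,
# written with vectorized slicing so the same logic maps to GPU arrays.
import numpy as np  # in a JAX port: import jax.numpy as jnp

def diffuse(t, kappa=0.1):
    """Apply one diffusion step to the interior of field t."""
    out = t.copy()
    out[1:-1, 1:-1] = t[1:-1, 1:-1] + kappa * (
        t[2:, 1:-1] + t[:-2, 1:-1] + t[1:-1, 2:] + t[1:-1, :-2]
        - 4.0 * t[1:-1, 1:-1]
    )
    return out

field = np.zeros((5, 5))
field[2, 2] = 1.0          # point anomaly in the centre
field = diffuse(field)     # anomaly spreads to its four neighbours
print(field[2, 2], field[1, 2])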


Author(s):  
Vijender Kumar Solanki ◽  
Nguyen Ha Huy Cuong ◽  
Zonghyu (Joan) Lu

Machine learning is an emerging research domain that has produced a number of trends; among them, opinion mining is a technology for analyzing a domain of interest, such as customer reviews of a product or other trending information. Opinion mining and sentiment analysis have become buzzwords in information technology and are widely used by the corporate sector to lift business to the next level. Two decades ago these terms were rarely encountered, but since then they have given new life to the information technology domain as well as to business. An important question is why sentiment analysis or opinion mining should be used at all. Information technology has produced many new programming languages and innovations, and data mining has brought these trends to users. This chapter covers three major concepts in machine learning: decision trees, Bayesian networks, and support vector machines. It describes their basic inputs and how they support stakeholders who adopt these technologies.
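As a toy illustration of the Bayesian approach applied to opinion mining, the following sketch trains a multinomial naive Bayes classifier on a handful of hypothetical product reviews. The training data and Laplace smoothing are illustrative; the chapter's Bayesian networks are more general than this independence-assuming special case.

```python
# Naive Bayes sentiment classifier over word counts, with Laplace smoothing.
from collections import Counter
import math

train = [
    ("great product works well", "pos"),
    ("love it excellent quality", "pos"),
    ("terrible waste of money", "neg"),
    ("poor quality broke fast", "neg"),
]

counts = {"pos": Counter(), "neg": Counter()}
docs = Counter()
for text, label in train:
    docs[label] += 1
    counts[label].update(text.split())

vocab = set(w for c in counts.values() for w in c)

def predict(text):
    """Return the label with the highest log-posterior."""
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        score = math.log(docs[label] / sum(docs.values()))  # class prior
        for w in text.split():
            score += math.log((c[w] + 1) / (total + len(vocab)))  # smoothed likelihood
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("excellent product"))   # leans positive
print(predict("waste of money"))      # leans negative
```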


Author(s):  
Anitha Elavarasi S. ◽  
Jayanthi J.

Machine learning enables a system to learn automatically, without human intervention, and to improve its performance with the help of previous experience: it can access data and use it to learn by itself. Even though many algorithms have been developed to solve machine learning problems, it is difficult to handle all kinds of input data in order to arrive at accurate decisions. Domain knowledge from statistics, probability, logic, mathematical optimization, reinforcement learning, and control theory plays a major role in developing machine learning algorithms. The key considerations in selecting a suitable programming language for implementing a machine learning algorithm include performance, concurrency, application development, and learning curve. This chapter deals with a few of the top programming languages used for developing machine learning applications: Python, R, and Java. The three languages most preferred by data scientists are (1) Python, used by more than 57%; (2) R, used by more than 31%; and (3) Java, used by 17% of data scientists.


Author(s):  
Abhinav Verma

We study the problem of generating interpretable and verifiable policies for Reinforcement Learning (RL). Unlike the popular Deep Reinforcement Learning (DRL) paradigm, in which the policy is represented by a neural network, the aim of this work is to find policies that can be represented in high-level programming languages. Such programmatic policies have several benefits, including being more easily interpreted than neural networks and being amenable to verification by scalable symbolic methods. The generation methods for programmatic policies also provide a mechanism for systematically using domain knowledge to guide the policy search. The interpretability and verifiability of these policies make it possible to deploy RL-based solutions in safety-critical environments. This thesis draws on, and extends, work from both the machine learning and formal methods communities.
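To illustrate what a programmatic policy looks like in contrast to a neural network, here is a minimal sketch: a pole-balancing controller expressed as a few lines of readable code. The state layout and gains are hypothetical, not taken from the thesis.

```python
# An interpretable programmatic policy for a pole-balancing task.

def policy(angle, angular_velocity):
    """Push the cart toward the direction the pole is falling.

    Because the policy is a small arithmetic program, properties such as
    "the action is always +1 or -1" can be checked symbolically rather
    than by probing a black-box network.
    """
    score = 0.7 * angle + 0.3 * angular_velocity  # hand-readable linear rule
    return 1 if score > 0 else -1                  # discrete push action

print(policy(0.05, 0.01))   # pole tilting right -> push right
print(policy(-0.05, 0.0))   # pole tilting left  -> push left
```

A DRL policy computing the same decision would bury the two coefficients inside thousands of opaque weights; the programmatic form makes both interpretation and formal verification tractable.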


2019 ◽  
Author(s):  
Manoj Kumar ◽  
Cameron Thomas Ellis ◽  
Qihong Lu ◽  
Hejia Zhang ◽  
Mihai Capota ◽  
...  

Advanced brain imaging analysis methods, including multivariate pattern analysis (MVPA), functional connectivity, and functional alignment, have become powerful tools in cognitive neuroscience over the past decade. These tools are implemented in custom code and separate packages, often requiring different software and language proficiencies. Although usable by expert researchers, novice users face a steep learning curve. These difficulties stem from the use of new programming languages (e.g., Python), learning how to apply machine-learning methods to high-dimensional fMRI data, and minimal documentation and training materials. Furthermore, most standard fMRI analysis packages (e.g., AFNI, FSL, SPM) focus on preprocessing and univariate analyses, leaving a gap in how to integrate with advanced tools. To address these needs, we developed BrainIAK (brainiak.org), an open-source Python software package that seamlessly integrates several cutting-edge, computationally efficient techniques with other Python packages (e.g., Nilearn, Scikit-learn) for file handling, visualization, and machine learning. To disseminate these powerful tools, we developed user-friendly tutorials (in Jupyter format; https://brainiak.org/tutorials/) for learning BrainIAK and advanced fMRI analysis in Python more generally. These materials cover techniques including: MVPA (pattern classification and representational similarity analysis); parallelized searchlight analysis; background connectivity; full correlation matrix analysis; inter-subject correlation; inter-subject functional connectivity; shared response modeling; event segmentation using hidden Markov models; and real-time fMRI. For long-running jobs or large memory needs we provide detailed guidance on high-performance computing clusters. These notebooks were successfully tested at multiple sites, including as problem sets for courses at Yale and Princeton universities and at various workshops and hackathons. 
These materials are freely shared, with the hope that they become part of a pool of open-source software and educational materials for large-scale, reproducible fMRI analysis and accelerated discovery.
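As a hedged sketch of one analysis from the tutorials' list, the following computes inter-subject correlation (ISC) in plain NumPy: each subject's time course is correlated with the average of all other subjects. BrainIAK's own ISC implementation is more general (voxel-wise, with statistical testing); the data here are synthetic.

```python
# Leave-one-out inter-subject correlation on synthetic time courses.
import numpy as np

rng = np.random.default_rng(0)
shared = rng.standard_normal(100)   # shared stimulus-driven signal
# 5 "subjects": shared signal plus subject-specific noise
subjects = np.stack([shared + 0.5 * rng.standard_normal(100) for _ in range(5)])

def isc(data):
    """Correlate each subject's row with the mean of all other rows."""
    n = data.shape[0]
    return np.array([
        np.corrcoef(data[i], data[np.arange(n) != i].mean(axis=0))[0, 1]
        for i in range(n)
    ])

values = isc(subjects)
print(values.round(2))   # high values: subjects share the stimulus-driven signal
```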


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Paolo Dello Vicario ◽  
Valentina Tortolini

Purpose The purpose of this paper is to define a methodology for analyzing links between programming topics and libraries, starting from GitHub data. Design/methodology/approach We analyzed machine learning repositories on GitHub, finding communities of repositories and studying the anatomy of collaboration around a popular topic such as machine learning. Findings The analysis indicates the significant importance of programming languages and technologies such as Python and Jupyter Notebook. It also shows the rise of deep learning and of specific libraries such as TensorFlow from Google. Originality/value No existing survey or analysis examines how developers influence each other on specific topics; other researchers have focused on collaborative structure and social impact rather than topic impact. The methodology is useful for analyzing not just machine learning but other programming topics as well.
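A sketch of the starting point for this kind of analysis: counting how often topics co-occur on the same repository, the raw material for building a topic graph and detecting communities. The repositories and topic labels below are hypothetical, not from the paper's dataset.

```python
# Build topic co-occurrence counts (edge weights for a topic graph).
from itertools import combinations
from collections import Counter

repos = {
    "repo-a": {"machine-learning", "python", "tensorflow"},
    "repo-b": {"machine-learning", "python", "jupyter-notebook"},
    "repo-c": {"machine-learning", "deep-learning", "tensorflow"},
}

cooccur = Counter()
for topics in repos.values():
    for pair in combinations(sorted(topics), 2):
        cooccur[pair] += 1   # weight of the edge between two topics

print(cooccur.most_common(2))   # strongest topic links
```

From these weighted edges, a community-detection algorithm (e.g., modularity-based clustering) groups topics and repositories into the communities the paper studies.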

