use of data
Recently Published Documents





2022 ◽  
Vol 14 (1) ◽  
pp. 1-9
Saravanan Thirumuruganathan ◽  
Mayuresh Kunjir ◽  
Mourad Ouzzani ◽  
Sanjay Chawla

The data and Artificial Intelligence revolution has had a massive impact on enterprises, governments, and society alike. It is fueled by two key factors. First, data have become increasingly abundant and are often available openly. Enterprises have more data than they can process. Governments are spearheading open data initiatives by setting up data portals such as and releasing large amounts of data to the public. Second, AI engineering development is becoming increasingly democratized. Open source frameworks have enabled even an individual developer to engineer sophisticated AI systems. But with such ease of use comes the potential for irresponsible use of data. Ensuring that AI systems adhere to a set of ethical principles is one of the major problems of our age. We believe that data and model transparency has a key role to play in mitigating the deleterious effects of AI systems. In this article, we describe a framework to synthesize ideas from various domains such as data transparency, data quality, data governance among others to tackle this problem. Specifically, we advocate an approach based on automated annotations (of both data and the AI model), which has a number of appealing properties. The annotations could be used by enterprises to get visibility of potential issues, prepare data transparency reports, create and ensure policy compliance, and evaluate the readiness of data for diverse downstream AI applications. We propose a model architecture and enumerate its key components that could achieve these requirements. Finally, we describe a number of interesting challenges and opportunities.

2022 ◽  
Kingsley Austin

Abstract: Credit card fraud is a serious problem for e-commerce retailers with UK merchants reporting losses of $574.2M in 2020. As a result, effective fraud detection systems must be in place to ensure that payments are processed securely in an online environment. From the literature, the detection of credit card fraud is challenging due to dataset imbalance (genuine versus fraudulent transactions), real-time processing requirements, and the dynamic behavior of fraudsters and customers. It is proposed in this paper that the use of machine learning could be an effective solution for combating credit card fraud. According to research, machine learning techniques can play a role in overcoming the identified challenges while ensuring a high detection rate of fraudulent transactions, both directly and indirectly. Even though both supervised and unsupervised machine learning algorithms have been suggested, the flaws in both methods point to the necessity for hybrid approaches.
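
The dataset-imbalance challenge named in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the transaction counts are made up, and the helper names (`confusion_counts`, `accuracy`, `recall`) are hypothetical. Under those assumptions, it shows why plain accuracy can look excellent on imbalanced data while no fraud is detected at all.

```python
# Illustrative only: labels are synthetic, 1 = fraudulent, 0 = genuine.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 marks fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Share of all transactions that were labelled correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def recall(y_true, y_pred):
    """Share of fraudulent transactions that were actually flagged."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical data: 995 genuine and 5 fraudulent transactions.
y_true = [0] * 995 + [1] * 5
# A trivial "model" that labels every transaction as genuine.
always_genuine = [0] * 1000

print(accuracy(y_true, always_genuine))  # 0.995
print(recall(y_true, always_genuine))    # 0.0
```

The trivial classifier is right 99.5% of the time yet catches no fraud, which is why detection rate (recall) on the minority class is the more informative figure for this kind of problem.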

2022 ◽  
pp. 009539972110699
Tracey Bark

Bureaucracies often provide information to legislatures in an effort to influence the agenda. This paper assesses whether data affects this influence, arguing quantitative support can increase the likelihood of legislative discussion and passage of bills related to a given topic. I also assess the impact of centralization on an agency’s ability to provide information and shape legislative agendas. I find including data in bureaucratic reports can significantly increase an agency’s influence on the legislature, but this effect is only present in a centralized setting. These results suggest centralized agencies are better equipped to marshal quantitative support for arguments to legislatures.

Miguel G. Folgado ◽  
Veronica Sanz

Abstract: In this paper we illustrate the use of Data Science techniques to analyse complex human communication. In particular, we consider tweets from leaders of political parties as a dynamical proxy to political programmes and ideas. We also study the temporal evolution of their contents as a reaction to specific events. We analyse levels of positive and negative sentiment in the tweets using new tools adapted to social media. We also train a Fully-Connected Neural Network (FCNN) to recognise the political affiliation of a tweet. The FCNN is able to predict the origin of the tweet with a precision in the range of 71–75%, and the political leaning (left or right) with a precision of around 90%. This study is meant to be viewed as an example of how to use Twitter data and different types of Data Science tools for a political analysis.

2022 ◽  
Vol 21 (4) ◽  
pp. 346-363
Hubert Anysz

The use of data mining and machine learning tools is becoming increasingly common. Their usefulness is mainly noticeable in the case of large datasets, when the information sought or new relationships are extracted from information noise. The development of these tools means that datasets with far fewer records are also being explored, usually associated with specific phenomena. This specificity most often makes it impossible to increase the number of cases, which would otherwise facilitate the search for dependences in the phenomena under study. The paper discusses the features of applying the selected tools to a small set of data. It attempts to present methods of data preparation and methods for calculating the performance of tools, taking into account the specifics of databases with a small number of records. The techniques selected by the author are proposed, which helped to break the deadlock in calculations, i.e., getting results much worse than expected. The need to apply methods to improve the accuracy of forecasts and the accuracy of classification was caused by the small amount of analysed data. This paper is not a review of popular methods of machine learning and data mining; nevertheless, the collected and presented material will help the reader to shorten the path to obtaining satisfactory results when using the described computational methods.

2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Yeamin Jacky ◽  
Noor Adwa Sulaiman

Purpose: This study examines the perceptions of interested stakeholders on the factors affecting the use of data analytics (DA) in financial statement audits. Response letters submitted by stakeholders of the auditing services to the International Auditing and Assurance Standards Board's (IAASB) Data Analytics Working Group (DAWG) served as sources for analysis.
Design/methodology/approach: The modified information technology audit model was used as a framework to perform a direct content analysis of all the 50 response letters submitted to the DAWG.
Findings: The analysis showed that a range of attributes, such as the usefulness of DA in auditing, authoritative guidance (auditing standards), data reliability and quality, auditors' skills, clients' factors and costs, were the factors perceived by stakeholders to be affecting the use of DA in external auditing.
Research limitations/implications: This study is subject to the limitations inherent to all content analysis studies. Nonetheless, the findings offer additional insights about potential factors affecting the adoption of DA in audit practices.
Originality/value: The data noted in the published statements highlighted the perceptions of a range of stakeholders with regard to the factors affecting the use of DA in auditing.

2022 ◽  
Vol 9 (1) ◽  
Kornelia Batko ◽  
Andrzej Ślęzak

Abstract: The introduction of Big Data Analytics (BDA) in healthcare will make it possible to use new technologies both in the treatment of patients and in health management. The paper aims at analyzing the possibilities of using Big Data Analytics in healthcare. The research is based on a critical analysis of the literature, as well as the presentation of selected results of direct research on the use of Big Data Analytics in medical facilities. The direct research was carried out based on a research questionnaire and conducted on a sample of 217 medical facilities in Poland. Literature studies have shown that the use of Big Data Analytics can bring many benefits to medical facilities, while direct research has shown that medical facilities in Poland are moving towards data-based healthcare because they use structured and unstructured data and reach for analytics in the administrative, business and clinical areas. The research positively confirmed that medical facilities are working on both structured data and unstructured data. The following kinds and sources of data can be distinguished: from databases, transaction data, unstructured content of emails and documents, data from devices and sensors. However, the use of data from social media is lower. In their activity, they reach for analytics not only in the administrative and business areas but also in the clinical area. It clearly shows that the decisions made in medical facilities are highly data-driven. The results of the study confirm what has been analyzed in the literature: that medical facilities are moving towards data-based healthcare, together with its benefits.

2022 ◽  
Vol 7 (2) ◽  
pp. 203-208
Made Sutha Yadnya ◽  
Ni Luh Sinar Ayu Ratna Dewi ◽  
Sudi Maryanto Al Sasongko ◽  
Rosmaliati Rosmaliati ◽  
Abdulah Zainuddin

During the covid-19 pandemic, lectures at the Department of Electrical Engineering, Mataram University changed from a face-to-face process to delivery via the Internet. There will be a very sharp increase in demand. The use of data, initially provided by the University of Mataram through a free hotspot network, turned into a burden on lecturers and students. This research was conducted by sampling general compulsory subjects, compulsory electrical courses, and compulsory expertise subjects. The students were distributed across those domiciled in the City of Mataram, elsewhere on Lombok Island, within NTB, and outside NTB. The results obtained are as follows: students who remained in Mataram City are 17% (10.5 GB), Lombok Island 48% (8.1 GB), outside Lombok Island 27% (4.8 GB), and outside NTB 8% (15 GB). Keywords: covid-19; lectures; online

2022 ◽  
Giulia Agostinetto ◽  
Antonia Bruno ◽  
Anna Sandionigi ◽  
Alberto Brusati ◽  
Caterina Manzari ◽  

As human activities on our planet persist, causing widespread and irreversible environmental degradation, the need to biomonitor ecosystems has never been more pressing. These circumstances have required a renewal in monitoring techniques, encouraged by the necessity to develop more rapid and accurate tools which will support timely observations of ecosystem structure and function. The World Exposition (hereafter 'EXPO2015') hosted in Milan from May to October 2015 was a global event that could be categorized as a mega-event, which can be defined as an acute environmental stressor, possibly generating biodiversity alteration and disturbance. During the six months of EXPO2015, exhibitors from more than 135 countries and 22 million visitors occupied an area of 1.1 million square meters. Faced with such a massive event, we explore the potential of DNA metabarcoding using three molecular markers to improve the understanding of anthropogenic impacts in the area, considering both air and water monitoring. Furthermore, we explore the effectiveness of the taxonomy assignment phase considering different taxonomic levels of analysis and the use of data mining approaches to predict sample origin. Although the degree of taxa identification still remains an open question, our results showed that DNA metabarcoding is a powerful genomic-based tool to monitor biodiversity at the microscale, allowing us to capture exact fingerprints of specific event sites and to explore in a comprehensive manner the eukaryotic community alteration. With this work, we aim to disentangle and overcome the crucial issues related to the generalization of DNA metabarcoding in order to support future applications.
