Combining Data Analytics with XBRL: The ViewDrive Case

Issues in Accounting Education, 2021
DOI: 10.2308/issues-2020-048

Author(s): Amanuel Fekade Tadesse, Nishani Vincent

Keyword(s): Active Learning; Real World; Data Analytics; Large Datasets; Securities And Exchange Commission; Financial Report; Business Reporting; Accounting Curriculum; Combining Data; Real World Datasets

This advisory case is designed to develop data analytics skills using multiple large real-world datasets based on eXtensible Business Reporting Language (XBRL). The case can also be used to introduce students to XBRL concepts such as extension taxonomies. Students are asked to recommend XBRL preparation software for a hypothetical company (ViewDrive) that is adopting XBRL to satisfy the financial report filing requirements imposed by the Securities and Exchange Commission (SEC). Students perform data cleansing (extract, transform, load) procedures to prepare large datasets for data analytics. Students are encouraged to think critically, specify assumptions before performing data analytics (using analytic software such as Tableau), and generate visualizations that support their written recommendations. The case is easy to implement, promotes active learning, and has received favorable student and instructor feedback. It can be used to introduce technology and data analytics topics into the accounting curriculum to help satisfy AACSB’s objectives.
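A minimal sketch of the extract-transform-load step the case describes, assuming a hypothetical flat extract with one row per reported XBRL fact (columns `filer`, `software`, `element`). The transform step normalizes inconsistent vendor spellings and flags extension elements by namespace prefix; the `STANDARD_PREFIXES` set is an illustrative assumption, not a complete list of SEC-accepted taxonomies. The resulting per-vendor extension rate is the kind of cleaned measure that could then be loaded into a tool such as Tableau for visualization.

```python
import csv
import io
from collections import defaultdict

# Assumed (incomplete) set of standard taxonomy prefixes; any other prefix
# is treated as a company-specific extension element.
STANDARD_PREFIXES = {"us-gaap", "dei", "srt"}

# Hypothetical raw extract: one row per XBRL fact (illustrative data only).
RAW_CSV = """filer,software,element
Filer A, Vendor X ,us-gaap:Revenues
Filer A,vendor x,abc:CustomSegmentRevenue
Filer B,Vendor Y,us-gaap:NetIncomeLoss
Filer B,VENDOR Y,dei:EntityRegistrantName
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize vendor names and flag extension elements."""
    cleaned = []
    for row in rows:
        software = row["software"].strip().title()  # " vendor x " -> "Vendor X"
        prefix = row["element"].split(":", 1)[0].strip().lower()
        cleaned.append({
            "filer": row["filer"].strip(),
            "software": software,
            "is_extension": prefix not in STANDARD_PREFIXES,
        })
    return cleaned


def extension_rate_by_software(rows):
    """Load/aggregate: share of facts that use extension elements, per vendor."""
    totals = defaultdict(int)
    extensions = defaultdict(int)
    for row in rows:
        totals[row["software"]] += 1
        extensions[row["software"]] += row["is_extension"]  # True counts as 1
    return {sw: extensions[sw] / totals[sw] for sw in totals}


rates = extension_rate_by_software(transform(extract(RAW_CSV)))
print(rates)  # {'Vendor X': 0.5, 'Vendor Y': 0.0}
```

Normalizing the vendor name before aggregating matters: without it, " Vendor X " and "vendor x" would be counted as two different products.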


Act or Be Acted Upon: Revolutionizing Accounting Curriculums with Data Analytics

Accounting Horizons, 2020
DOI: 10.2308/horizons-19-020

Author(s): Vernon J. Richardson, Marcia Weidenmier Watson

Keyword(s): Real World; Data Analytics; Core Competencies; Real World Data; Use Of Data; Accounting Curriculum; Framework Approach; Business Acumen; Using Data; Do So

Technology is revolutionizing accounting. To survive, accountants must focus on areas where they can complement technology and carve out a competitive advantage where the expertise of accountants is uniquely needed. To do so, we highlight the new core competencies emphasizing the use of data analytics. We propose a new, revolutionized curriculum focusing on (1) providing students with a step-by-step framework/approach for analyzing the data that includes the use of statistics; (2) using data analytics across the accounting curriculum to build data analytics skills; and (3) incorporating the use of real-world data for its analysis. This new curriculum combines business acumen to provide context as well as technological adeptness to analyze the data, and prepares the CPA professional for the future. We conclude by arguing that the accounting profession faces a choice: either master technology or be mastered by technology. The choice is ours. Act or be acted upon.


On the Use of Real-World Datasets for Reaction Yield Prediction

2021
DOI: 10.33774/chemrxiv-2021-2x06r-v3

Author(s): Mandana Saebi, Bozhao Nan, John Herr, Jessica Wahlers, Zhichun Guo, ...

Keyword(s): Real World; Chemical Yield; Large Datasets; Reaction Yield; Yield Prediction; Large Pharmaceutical Company; High Throughput Experimentation; Attributed Graph; Real World Datasets; Better Than

The lack of publicly available, large, and unbiased datasets is a key bottleneck for the application of machine learning (ML) methods in synthetic chemistry. Data from electronic laboratory notebooks (ELNs) could provide less biased, large datasets, but no such datasets have been made publicly available. The first real-world dataset from the ELNs of a large pharmaceutical company is disclosed, and its relationship to high-throughput experimentation (HTE) datasets is described. For chemical yield prediction, a key task in chemical synthesis, an attributed graph neural network (AGNN) performs as well as or better than the best previous models on two HTE datasets for the Suzuki and Buchwald-Hartwig reactions. However, training the AGNN on the ELN dataset does not yield a predictive model. The implications of using ELN data for training ML-based models are discussed in the context of yield prediction.


Near-Optimal Fingerprinting with Constraints

Proceedings on Privacy Enhancing Technologies, 2016, Vol 2016 (4), pp. 470-487
DOI: 10.1515/popets-2016-0051
Cited By ~ 6

Author(s): Gábor György Gulyás, Gergely Acs, Claude Castelluccia

Keyword(s): Real World; Large Datasets; Smartphone Applications; Privacy Threats; Optimal Fingerprinting; Real World Datasets; Privacy Risks; The Web

Several recent studies have demonstrated that people's behaviour is highly unique. This has serious privacy implications: most individuals become increasingly re-identifiable in large datasets, or can be tracked while browsing the web, using only a couple of their attributes, called their fingerprints. The success of these attacks often depends on explicit constraints on the number of attributes that can be learned about individuals, i.e., the size of their fingerprints. These constraints can be budgetary as well as technical constraints imposed by the data holder. For instance, Apple restricts the number of applications that can be called by another application on iOS in order to mitigate the potential privacy threat of leaking the list of applications installed on a device. In this work, we address the problem of identifying the attributes (e.g., smartphone applications) that can serve as a fingerprint of users, given constraints on the size of the fingerprint. We give the best fingerprinting algorithms in general and evaluate their effectiveness on several real-world datasets. Our results show that current privacy guards, which limit the number of attributes that can be queried about individuals, are insufficient to mitigate the potential privacy risks in many practical cases.
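To make the constrained-fingerprint problem concrete, here is a simple greedy heuristic: each user is represented as a set of attributes (for example, installed apps), and with a budget of `k` attributes the sketch repeatedly adds the attribute that maximizes the number of users whose yes/no answers on the chosen attributes are unique. This is an illustration under those assumptions, not the authors' algorithm, and no optimality guarantee is claimed for it; the toy data and function names are hypothetical.

```python
from collections import Counter


def unique_users(users, attrs):
    """Count users whose yes/no pattern over `attrs` is shared by nobody else."""
    signatures = [tuple(a in user for a in attrs) for user in users]
    counts = Counter(signatures)
    return sum(1 for sig in signatures if counts[sig] == 1)


def greedy_fingerprint(users, candidates, k):
    """Greedily pick up to k attributes that maximize the number of unique users.

    Heuristic only: ties go to the earliest candidate, and no optimality
    guarantee is claimed.
    """
    chosen = []
    for _ in range(k):
        remaining = [a for a in candidates if a not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda a: unique_users(users, chosen + [a]))
        chosen.append(best)
    return chosen


# Hypothetical toy data: each user is the set of apps installed on a device.
users = [{"maps", "chat"}, {"maps"}, {"chat", "bank"}, {"bank"}]
fingerprint = greedy_fingerprint(users, ["maps", "chat", "bank"], k=2)
print(fingerprint, unique_users(users, fingerprint))  # ['maps', 'chat'] 4
```

In the toy data no single app identifies anyone, yet two apps make all four users unique, which echoes the abstract's point that a small fingerprint budget can still leave users re-identifiable.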


Optimism in Active Learning

Computational Intelligence and Neuroscience, 2015, Vol 2015, pp. 1-17
DOI: 10.1155/2015/973696
Cited By ~ 3

Author(s): Timothé Collet, Olivier Pietquin

Keyword(s): Machine Learning; Active Learning; Real World; State Of The Art; Classification Error; Exploration And Exploitation; Training Set; Learning Problem; The Face; Real World Datasets

Active learning is the problem of interactively constructing the training set used in classification in order to reduce its size. Ideally, it would successively add the instance-label pair that decreases the classification error the most. However, the effect of adding a pair is not known in advance, although it can be estimated from the pairs already in the training set. The online minimization of the classification error therefore involves a trade-off between exploration and exploitation. This is a common problem in machine learning, for which multi-armed bandit methods using the approach of Optimism in the Face of Uncertainty have proven very efficient in recent years. This paper introduces three algorithms for the active learning problem in classification using Optimism in the Face of Uncertainty. Experiments conducted on built-in problems and real-world datasets demonstrate that they compare favorably with state-of-the-art methods.
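The Optimism in the Face of Uncertainty principle is easiest to see in the classic UCB1 bandit rule, sketched below: each arm is scored by its empirical mean reward plus a confidence bonus that shrinks as the arm is pulled more often, and the arm with the highest optimistic score is played. This is the generic bandit building block, not one of the paper's three active-learning algorithms; the Bernoulli arms and their success probabilities are hypothetical.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds; return how often each arm was played."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once so each estimate is defined
        else:
            # Optimism: empirical mean plus an exploration bonus that shrinks
            # as an arm accumulates pulls.
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Hypothetical Bernoulli arms with success probabilities 0.2, 0.5 and 0.8.
rng = random.Random(0)
probs = [0.2, 0.5, 0.8]
counts = ucb1(lambda arm: 1.0 if rng.random() < probs[arm] else 0.0, 3, 2000)
print(counts)  # most pulls should go to the last (best) arm
```

In active learning, the "arms" would correspond to candidate queries and the reward to an estimated error reduction, which is where the exploration/exploitation trade-off in the abstract comes from.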


Time-Efficient Ensemble Learning with Sample Exchange for Edge Computing

ACM Transactions on Internet Technology, 2021, Vol 21 (3), pp. 1-17
DOI: 10.1145/3409265

Author(s): Wu Chen, Yong Yu, Keke Gai, Jiamou Liu, Kim-Kwang Raymond Choo

Keyword(s): Ensemble Learning; Real World; Interaction Mechanism; Training Model; Edge Computing; Learning Techniques; Multi Agent; Real World Datasets; Entire Dataset; Exchange Data

In existing ensemble learning algorithms (e.g., random forest), each base learner’s model needs the entire dataset for sampling and training. However, this may not be practical in many real-world applications, and it incurs additional computational costs. To achieve better efficiency, we propose a decentralized framework: Multi-Agent Ensemble. The framework leverages edge computing to facilitate ensemble learning techniques by balancing access restrictions (small sub-datasets) against accuracy enhancement. Specifically, network edge nodes (learners) are used to perform classification and prediction in our framework. Data is distributed to multiple base learners, which exchange data via an interaction mechanism to improve prediction. The proposed approach relies on a training model rather than conventional centralized learning. Findings from experimental evaluations using 20 real-world datasets suggest that Multi-Agent Ensemble outperforms other ensemble approaches in terms of accuracy even though the base learners require fewer samples (i.e., at a significantly lower computational cost).
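A toy sketch of the general idea: base learners see only small sub-datasets, swap a few samples, and vote. Everything here is a hypothetical stand-in. Nearest-centroid classifiers play the base learners, round-robin sharding plays the data distribution, and a "borrow a few samples from the next agent in a ring" rule plays the interaction mechanism, which the abstract does not specify.

```python
import random
from collections import Counter


def fit_centroids(samples):
    """Nearest-centroid base learner: map each label to its mean feature vector."""
    groups = {}
    for x, y in samples:
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}


def predict(model, x):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((c - v) ** 2 for c, v in zip(model[y], x)))


def train_agents(data, n_agents, n_exchange, seed=0):
    """Give each agent a small shard, then let it borrow samples from a neighbour.

    The ring-shaped exchange is a hypothetical interaction mechanism.
    """
    rng = random.Random(seed)
    shards = [data[i::n_agents] for i in range(n_agents)]  # round-robin split
    enriched = [
        shards[i] + rng.sample(shards[(i + 1) % n_agents], n_exchange)
        for i in range(n_agents)
    ]
    return [fit_centroids(shard) for shard in enriched]


def vote(models, x):
    """Majority vote over the base learners' predictions."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]


# Hypothetical two-class data, interleaved so that every shard sees both classes.
data = []
for i in range(20):
    jitter = (i % 5) * 0.1
    data.append(((jitter, -jitter), 0))
    data.append(((5 + jitter, 5 - jitter), 1))

agents = train_agents(data, n_agents=3, n_exchange=2)
print(vote(agents, (0.0, 0.0)), vote(agents, (5.0, 5.0)))  # 0 1
```

Each agent trains on roughly a third of the data plus two borrowed samples, which illustrates the accuracy-versus-access trade-off the framework is balancing.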


OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data, 2020, Vol 6 (1), pp. 1
DOI: 10.3390/data6010001

Author(s): Ahmed Elmogy, Hamada Rizk, Amany M. Sarhan

Keyword(s): Data Mining; Image Processing; Intrusion Detection; Real Time; Outlier Detection; Real World; Medical Data; Experimental Results; Real Time Applications; Real World Datasets

In data mining, outlier detection is a major challenge, as it plays an important role in many applications such as medical data analysis, image processing, fraud detection, and intrusion detection. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are inherently time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first on-the-fly clustering-based outlier detection framework, OFCOD (On the Fly Clustering Based Outlier Detection), is presented. OFCOD enables analysts to find outliers promptly and on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.
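A common baseline for clustering-based outlier detection scores each point by its distance to the nearest cluster centroid. The sketch below clusters once with plain k-means (naive first-k initialization) and caches the scores, so repeated requests with different thresholds are answered by filtering rather than by re-clustering. This caching is only an assumed reading of "on the fly" for illustration; it is not OFCOD's actual design, and the class and data are hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with naive initialization (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class OutlierIndex:
    """Cluster once, cache distance-to-nearest-centroid scores, answer many requests."""

    def __init__(self, points, k):
        self.points = points
        centroids = kmeans(points, k)
        self.scores = [min(math.dist(p, c) for c in centroids) for p in points]

    def outliers(self, threshold):
        """Points farther than `threshold` from every centroid."""
        return [p for p, s in zip(self.points, self.scores) if s > threshold]


# Hypothetical data: two tight groups plus one far-away point.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (1, 1),
          (10, 11), (11, 10), (11, 11), (30, 30)]
index = OutlierIndex(points, k=2)
print(index.outliers(10.0))  # [(30, 30)]
```

The outlier at (30, 30) pulls the second centroid toward itself, so a lower threshold such as 5.0 also flags several normal points; choosing k and the threshold is the per-request parameter tuning the abstract refers to.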


NeuSE: A Neural Snapshot Ensemble Method for Collaborative Filtering

ACM Transactions on Knowledge Discovery from Data, 2021, Vol 15 (6), pp. 1-20
DOI: 10.1145/3450526

Author(s): Dongsheng Li, Haodong Liu, Chao Chen, Yingying Zhao, Stephen M. Chu, ...

Keyword(s): Collaborative Filtering; Optimization Problems; Empirical Studies; Large Datasets; Model Learning; Global Models; Convex Optimization Problems; Memory Network; Real World Datasets; Performance Tradeoff

In collaborative filtering (CF) algorithms, the optimal models are usually learned by globally minimizing the empirical risk averaged over all observed data. However, the global models are often obtained via a performance trade-off among users/items, i.e., not all users/items are perfectly fitted by the global models because of the hard non-convex optimization problems in CF algorithms. Ensemble learning can address this issue by learning multiple diverse models, but it usually suffers from efficiency issues on large datasets or with complex algorithms. In this article, we keep the intermediate models obtained during global model learning as snapshot models, and then adaptively combine the snapshot models for individual user-item pairs using a memory network-based method. Empirical studies on three real-world datasets show that the proposed method significantly improves accuracy (by up to 15.9% relative) when applied to a variety of existing collaborative filtering methods.
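The snapshot idea can be sketched with plain matrix factorization trained by SGD: copies of the user and item factors are saved at regular epochs, and their predictions are combined. As a simplification, the sketch averages the snapshots uniformly, whereas the abstract describes an adaptive, per-user-item combination learned by a memory network; the ratings, hyperparameters, and function names are hypothetical.

```python
import random


def train_mf_with_snapshots(ratings, n_users, n_items, k=2, epochs=150,
                            lr=0.02, reg=0.02, snapshot_every=50, seed=0):
    """SGD matrix factorization that saves (P, Q) copies every few epochs."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    snapshots = []
    for epoch in range(1, epochs + 1):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
        if epoch % snapshot_every == 0:
            # Copy the factors so later training does not alter the snapshot.
            snapshots.append(([row[:] for row in P], [row[:] for row in Q]))
    return snapshots


def predict(snapshot, u, i):
    """Dot product of user u's and item i's factors in one snapshot."""
    P, Q = snapshot
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def ensemble_predict(snapshots, u, i):
    """Uniform average of the snapshot predictions (a simplification)."""
    return sum(predict(s, u, i) for s in snapshots) / len(snapshots)


# Hypothetical (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
snaps = train_mf_with_snapshots(ratings, n_users=3, n_items=3)
print(len(snaps), round(ensemble_predict(snaps, 0, 2), 3))
```

The snapshots come for free from a single training run, which is why this family of methods avoids the cost of training several independent ensemble members.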


Overlapping Community Detection Based on Attribute Augmented Graph

Entropy, 2021, Vol 23 (6), pp. 680
DOI: 10.3390/e23060680

Author(s): Hanyang Lin, Yongzhao Zhan, Zizheng Zhao, Yuzhong Chen, Chen Dong

Keyword(s): Community Detection; Real World; Detection Algorithm; Overlapping Community Detection; Overlapping Communities; Adjustment Strategy; Topology Information; Overlapping Community; Real World Datasets; Community Detection Algorithm

Real-world social networks contain a wealth of information. In addition to topology information, the vertices or edges of a social network often have attributes, and many vertices belong to several communities simultaneously (overlapping vertices). Fully exploiting this additional attribute information to detect overlapping communities is challenging. In this paper, we first propose an overlapping community detection algorithm based on an augmented attribute graph. An improved attribute weight adjustment strategy is embedded in the algorithm to help detect overlapping communities more accurately. Second, we enhance the algorithm to determine the number of communities automatically through a node-density-based fuzzy k-medoids process. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed algorithms can effectively detect overlapping communities with fewer parameters than the baseline methods.
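The step that turns hard k-medoids clustering into overlapping communities can be sketched with fuzzy memberships. Given the distances from node v to the chosen medoids, v gets membership u[v][j] = 1 / sum_k (d[v][j] / d[v][k])^(2/(m-1)) in community j (the standard fuzzy c-means formula), and it joins every community whose membership reaches a threshold. The distances, medoids, fuzzifier m = 2, and threshold below are assumptions; the paper's node-density-based choice of medoids and its attribute weighting are not reproduced.

```python
def fuzzy_memberships(dist, medoids, m=2.0):
    """Fuzzy memberships of every node in every medoid's community.

    dist[v][w] is a distance between nodes v and w; m > 1 is the fuzzifier.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for v in range(len(dist)):
        d = [dist[v][c] for c in medoids]
        zeros = d.count(0.0)
        if zeros:  # v coincides with a medoid: give it (shared) full membership
            memberships.append([1.0 / zeros if x == 0.0 else 0.0 for x in d])
            continue
        memberships.append(
            [1.0 / sum((d[j] / d[k]) ** exponent for k in range(len(d)))
             for j in range(len(d))]
        )
    return memberships


def overlapping_communities(memberships, threshold=0.3):
    """Assign each node to every community whose membership reaches the threshold."""
    return [[j for j, u in enumerate(row) if u >= threshold] for row in memberships]


# Hypothetical example: a path graph 0-1-2-3-4, so distance = |v - w| hops.
n = 5
dist = [[float(abs(v - w)) for w in range(n)] for v in range(n)]
u = fuzzy_memberships(dist, medoids=[0, 4])
print(overlapping_communities(u))  # [[0], [0], [0, 1], [1], [1]]
```

The middle node is equidistant from both medoids, receives membership 0.5 in each, and therefore belongs to both communities, which is exactly the overlap a hard assignment would hide.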


Review Summary Generation in Online Systems: Frameworks for Supervised and Unsupervised Scenarios

ACM Transactions on the Web, 2021, Vol 15 (3), pp. 1-33
DOI: 10.1145/3448015

Author(s): Wenjun Jiang, Jing Chen, Xiaofei Ding, Jie Wu, Jiawei He, ...

Keyword(s): Decision Making; Real World; Text Summarization; Experimental Results; Product Review; Comprehensive Review; Online Systems; Real World Datasets; Different Characteristics

In online systems, including e-commerce platforms, many users rely on the reviews or comments written by previous consumers when making decisions, but they have limited time to read many reviews. A review summary that contains all the important features of user-generated reviews is therefore desirable. In this article, we study “how to generate a comprehensive review summary from a large number of user-generated reviews.” This can be implemented by text summarization, which has two main types of approaches: extractive and abstractive. Both can handle supervised and unsupervised scenarios, but the former may generate redundant and incoherent summaries, while the latter can avoid redundancy but usually handles only short sequences. Moreover, both approaches may neglect sentiment information. To address these issues, we propose comprehensive Review Summary Generation frameworks for the supervised and unsupervised scenarios. We design two different preprocessing models, re-ranking and selecting, to identify the important sentences while preserving users’ sentiment in the original reviews. These sentences can then be used to generate review summaries with text summarization methods. Experimental results on seven real-world datasets (Idebate, Rotten Tomatoes, Amazon, Yelp, and three unlabelled product review datasets from Amazon) demonstrate that our work performs well in review summary generation. Moreover, the re-ranking and selecting models show different characteristics.
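The redundancy problem of extractive summarization can be illustrated with an MMR-style (Maximal Marginal Relevance) greedy selector: each candidate sentence is scored by its relevance minus its word overlap with the sentences already chosen. This is a generic extractive technique, not the paper's re-ranking or selecting model, and the relevance score (mean document frequency of a sentence's distinct words) and the sample reviews are assumptions made for the example.

```python
import re
from collections import Counter


def tokens(text):
    """Lower-cased set of words in a sentence."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word-set overlap between two sentences, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_summary(sentences, n, lam=0.5):
    """Greedily pick n sentences, trading relevance against redundancy."""
    token_sets = {s: tokens(s) for s in sentences}
    # Document frequency: in how many sentences each word appears.
    df = Counter(w for toks in token_sets.values() for w in toks)

    def relevance(s):
        toks = token_sets[s]
        return sum(df[w] for w in toks) / len(toks) if toks else 0.0

    max_rel = max(relevance(s) for s in sentences) or 1.0
    selected, candidates = [], list(sentences)
    while candidates and len(selected) < n:
        def score(s):
            redundancy = max((jaccard(token_sets[s], token_sets[t]) for t in selected),
                             default=0.0)
            return lam * relevance(s) / max_rel - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


reviews = [
    "The battery life is great.",
    "Great battery life, really great.",
    "The screen is too dim outdoors.",
    "Shipping was fast.",
]
print(mmr_summary(reviews, n=2))
# ['The battery life is great.', 'Shipping was fast.']
```

With `lam=0.5`, the near-duplicate "Great battery life, really great." is skipped in favour of a sentence that adds new information; note that this simple scorer ignores sentiment, which is one of the gaps the abstract's frameworks aim to close.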


Multityped Community Discovery in Time-Evolving Heterogeneous Information Networks Based on Tensor Decomposition

Complexity, 2018, Vol 2018, pp. 1-16
DOI: 10.1155/2018/9653404
Cited By ~ 1

Author(s): Jibing Wu, Lianfei Yu, Qun Zhang, Peiteng Shi, Lihua Liu, ...

Keyword(s): Real World; Tensor Decomposition; Information Networks; Community Discovery; Star Network; Heterogeneous Information; Heterogeneous Information Networks; General Network; Real World Datasets; Discovery Method

Heterogeneous information networks, which consist of multiple types of objects connected by various semantically rich links, are omnipresent in real-world applications. Community discovery is an effective method for extracting the hidden structures in networks. Heterogeneous information networks are usually time-evolving: their objects and links are dynamic and change gradually. In such time-evolving heterogeneous information networks, community discovery is a challenging topic and considerably more difficult than in traditional static homogeneous information networks. Whereas communities in traditional approaches contain only one type of object and link, communities in heterogeneous information networks contain multiple types of dynamic objects and links. Some recent studies have focused on dynamic heterogeneous information networks and achieved satisfactory results. However, they assume that heterogeneous information networks follow simple schemas, such as the bityped network and star network schemas. In this paper, we propose a multityped community discovery method for time-evolving heterogeneous information networks with general network schemas. A tensor decomposition framework, which integrates tensor CP factorization with a temporal evolution regularization term, is designed to model the multityped communities and address their evolution. Experimental results on both synthetic and real-world datasets demonstrate the efficiency of our framework.
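A minimal NumPy sketch of CP factorization with a temporal regularizer, assuming a three-way tensor whose last mode is time and a simple smoothness penalty lam * sum_t ||C[t+1] - C[t]||^2 on the time factor C. This penalty is a stand-in for the paper's temporal evolution term, whose exact form the abstract does not give, and the function name and data are hypothetical. Each factor is updated by an exact least-squares solve (the time factor through a small Sylvester-type linear system), so the recorded objective should not increase from one sweep to the next.

```python
import numpy as np


def cp_als_temporal(X, rank, lam=0.1, iters=50, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor X (last mode = time).

    Minimizes ||X - [[A, B, C]]||^2 + lam * sum_t ||C[t+1] - C[t]||^2 by
    alternating exact least-squares updates of A, B and C.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    D = np.diff(np.eye(K), axis=0)  # (K-1) x K first-difference operator
    L = D.T @ D                     # ||D @ C||^2 penalizes jumps between time steps
    history = []
    for _ in range(iters):
        # A and B: standard CP-ALS normal equations A @ G = M (G is symmetric).
        G = (B.T @ B) * (C.T @ C)
        A = np.linalg.solve(G, np.einsum("ijk,jr,kr->ri", X, B, C)).T
        G = (A.T @ A) * (C.T @ C)
        B = np.linalg.solve(G, np.einsum("ijk,ir,kr->rj", X, A, C)).T
        # C: solve lam * L @ C + C @ G = M via column-major vectorization,
        # (lam * kron(I, L) + kron(G^T, I)) vec(C) = vec(M).
        G = (A.T @ A) * (B.T @ B)
        M = np.einsum("ijk,ir,jr->kr", X, A, B)
        system = lam * np.kron(np.eye(rank), L) + np.kron(G.T, np.eye(K))
        C = np.linalg.solve(system, M.flatten(order="F")).reshape((K, rank), order="F")
        residual = X - np.einsum("ir,jr,kr->ijk", A, B, C)
        history.append(float(np.sum(residual ** 2)
                             + lam * np.sum(np.diff(C, axis=0) ** 2)))
    return A, B, C, history


rng = np.random.default_rng(1)
X = rng.random((4, 5, 6))  # hypothetical (object type 1 x object type 2 x time) tensor
A, B, C, history = cp_als_temporal(X, rank=2, lam=0.1, iters=30)
print(C.shape, history[0] >= history[-1])
```

CP factors are identified only up to scaling between modes, so a practical version would normalize factor columns or otherwise control how the penalty shifts scale onto A and B; that bookkeeping is omitted here.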
