Large Datasets
Recently Published Documents





2021 ◽  
Vol 23 (09) ◽  
pp. 981-993
T. Balamurugan ◽  
E. Gnanamanoharan ◽  

Brain tumor segmentation is a challenging task in medical diagnosis. Its primary aim is to produce precise characterizations of brain tumor areas using adequately placed masks. Deep learning techniques have shown great promise in recent years for solving various computer vision problems such as object detection, image classification, and semantic segmentation, and numerous deep learning-based approaches have achieved excellent performance in brain tumor segmentation. This article comprehensively reviews recently developed deep learning-based brain tumor segmentation technology in light of the state of the art and its performance. A genetic algorithm based on fuzzy C-means (FCM-GA) was used in this study to segment tumor regions from brain images. The input image is scaled to 256×256 during the preprocessing stage, and FCM-GA is then applied to the preprocessed MRI image; it is a versatile machine learning (ML) technique for locating objects in large datasets. The segmented image is then subjected to hybrid feature extraction (HFE) to improve the feature subset. To obtain the best feature values, Kernel Nearest Neighbor with a genetic algorithm (KNN-GA) is used for feature selection. The selected features are fed into a ResNet classifier, which divides the MRI image into meningioma, glioma, and pituitary gland regions. Real-time datasets are used to validate the performance of the proposed hybrid method, which improves average classification accuracy by 7.99% over existing Convolutional Neural Network (CNN) and Support Vector Machine (SVM) classifiers.
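The fuzzy C-means step at the core of the FCM-GA pipeline can be sketched in a few lines. This is a minimal illustration on 1-D pixel intensities, not the authors' implementation: the genetic-algorithm refinement, HFE, KNN-GA, and ResNet stages are omitted, and the toy image and parameters are assumptions.

```python
import numpy as np

def fuzzy_c_means(x, n_clusters=2, m=2.0, n_iter=50, seed=0):
    """Minimal fuzzy C-means on a 1-D array of pixel intensities."""
    rng = np.random.default_rng(seed)
    u = rng.random((n_clusters, x.size))
    u /= u.sum(axis=0)                        # memberships sum to 1 per pixel
    for _ in range(n_iter):
        um = u ** m
        centers = (um @ x) / um.sum(axis=1)   # fuzzily weighted cluster means
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12  # pixel-center distances
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=0)             # standard FCM membership update
    return centers, u

# Toy "image": two intensity populations standing in for tumor vs. background
img = np.concatenate([np.full(100, 0.2), np.full(100, 0.8)])
img = img + np.random.default_rng(1).normal(0, 0.02, img.size)
centers, u = fuzzy_c_means(img, n_clusters=2)
labels = u.argmax(axis=0)                     # hard segmentation mask
```

In FCM-GA-style methods, a genetic algorithm typically replaces or refines the random initialization of the memberships/centers to avoid poor local minima; the alternating update loop itself stays the same.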

2021 ◽  
Pierre Parutto ◽  
Jennifer Heck ◽  
Meng Lu ◽  
Clemens F Kaminski ◽  
Martin Heine ◽  

Super-resolution imaging can generate thousands of single-particle trajectories. These data can potentially reconstruct subcellular organization and dynamics, as well as measure disease-linked changes. However, computational methods that can derive quantitative information from such massive datasets are currently lacking. Here we present data analysis and algorithms that are broadly applicable for revealing local binding and trafficking interactions and the organization of dynamic subcellular sites. We applied this analysis to the endoplasmic reticulum (ER) and the neuronal membrane. The method is based on spatio-temporal time-window segmentation that explores data at multiple levels and detects the architecture and boundaries of high-density regions in areas hundreds of nanometers across. By statistical analysis of a large number of data points, the method allows measurement of nano-region stability. By connecting high-density regions, we reconstructed the network topology of the ER, molecular flow redistribution, and the local space explored by trajectories. Segmenting trajectories at appropriate scales extracts confined trajectories, allowing quantification of dynamic interactions between lysosomes and the ER. A final step of the method reveals the motion of trajectories relative to the ensemble, allowing reconstruction of dynamics in the normal ER and in the atlastin-null mutant. Our approach lets users track previously inaccessible large-scale dynamics at high resolution from massive datasets. The algorithm is available as an ImageJ plugin that users can apply to large datasets of overlapping trajectories.
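The detection of high-density regions from massive localization datasets can be illustrated with a simple grid-binning sketch. This is not the authors' algorithm or their ImageJ plugin, just a minimal analogue on synthetic 2-D localizations; the bin count, threshold, and coordinates are arbitrary assumptions.

```python
import numpy as np

def high_density_mask(points, bins=20, min_count=30):
    """Bin 2-D localizations into a grid and flag bins whose count exceeds
    min_count. Connected flagged bins approximate high-density nano-regions."""
    xy = np.asarray(points, dtype=float)
    h, x_edges, y_edges = np.histogram2d(xy[:, 0], xy[:, 1], bins=bins)
    return h >= min_count, x_edges, y_edges

rng = np.random.default_rng(0)
dense = rng.normal(loc=[500.0, 500.0], scale=20.0, size=(400, 2))  # a nano-domain
sparse = rng.uniform(0.0, 1000.0, size=(200, 2))                   # diffuse background
mask, xe, ye = high_density_mask(np.vstack([dense, sparse]))
```

A real pipeline would additionally segment in time (sliding windows over the acquisition) and trace region boundaries rather than keeping a rectangular grid, but the thresholded density map is the common first step.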

Biology ◽  
2021 ◽  
Vol 10 (9) ◽  
pp. 932
Nuno M. Rodrigues ◽  
João E. Batista ◽  
Pedro Mariano ◽  
Vanessa Fonseca ◽  
Bernardo Duarte ◽  

Over recent decades, the world has experienced the adverse consequences of the uncontrolled development of multiple human activities. In recent years, a growing share of total chemical production has consisted of environmentally harmful compounds with significant environmental impacts. These emerging contaminants (ECs) include a wide range of man-made chemicals in worldwide use, such as pesticides, cosmetics, personal and household care products, and pharmaceuticals. Several of these ECs have raised concerns regarding their ecotoxicological effects and how to assess them efficiently. This is of particular interest if marine diatoms are considered as potential target species, due to their widespread distribution, their status as the most abundant phytoplankton group in the oceans, and their key ecological roles. Bio-optical ecotoxicity methods appear to be reliable, fast, high-throughput screening (HTS) techniques, providing large datasets with biological relevance on the mode of action of these ECs in phototrophic organisms such as diatoms. However, from the large datasets produced, only a small fraction of the data is normally extracted for physiological evaluation, leaving out a large amount of information on EC exposure. In the present paper, we use all the available information and evaluate the application of several machine learning and deep learning algorithms to predict the exposure of model organisms to different ECs at different doses, using a model marine diatom (Phaeodactylum tricornutum) as a test organism. The results show that 2D convolutional neural networks are the best method for predicting the type of EC to which the cultures were exposed, achieving a median accuracy of 97.65%, while Rocket is the best at predicting the concentration to which the cultures were subjected, achieving a median accuracy of 100%.
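The Rocket method mentioned above transforms time series with random convolutional kernels before fitting a linear classifier. The sketch below is a minimal ROCKET-style transform (random kernels, with max and proportion-of-positive-values features per kernel) on synthetic series; it is not the actual implementation or the diatom data, and all sizes and parameters are assumptions.

```python
import numpy as np

def rocket_features(X, n_kernels=100, seed=0):
    """Minimal ROCKET-style transform: convolve each 1-D series with random
    kernels, keeping two features per kernel (max response and PPV)."""
    rng = np.random.default_rng(seed)
    feats = []
    for _ in range(n_kernels):
        length = rng.choice([7, 9, 11])          # random kernel length
        w = rng.normal(size=length)              # random kernel weights
        b = rng.uniform(-1.0, 1.0)               # random bias
        conv = np.stack([np.convolve(x, w, mode="valid") + b for x in X])
        feats.append(conv.max(axis=1))           # max response per series
        feats.append((conv > 0).mean(axis=1))    # proportion of positive values
    return np.column_stack(feats)

X = np.random.default_rng(1).normal(size=(8, 120))  # 8 toy series, length 120
F = rocket_features(X)                               # feature matrix (8, 200)
```

In the full method these features feed a ridge classifier; the random kernels need no training, which is what makes Rocket fast on large datasets.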

2021 ◽  
Vol 5 (3) ◽  
pp. 45
Sotiris Leventis ◽  
Fotios Fitsilis ◽  
Vasileios Anastasiou

The accessibility and reuse of legal data are paramount for promoting transparency, accountability and, ultimately, trust in governance institutions. The aggregation of structured and semi-structured legal data inevitably leads to the big data realm and a series of challenges for the generation, handling, and analysis of large datasets. When it comes to data generation, LEOS represents a legal informatics tool that is maturing quickly. Now in its third release, it effectively supports the drafting of legal documents using Akoma Ntoso-compatible schemes. However, the tool, originally developed for cooperative legislative drafting, can be repurposed to draft parliamentary control documents. This is achieved through the use of actor-oriented software components, referred to as software agents, which enable system interoperability by interlinking the text editing system with parliamentary control datasets. A validated corpus of written questions from the Hellenic Parliament is used to evaluate the feasibility of the endeavour: using LEOS as an authoring tool for written parliamentary questions and for the generation of standardised, open legislative data. The systemic integration not only proves the tool's versatility, but also opens up new ground in interoperability between formerly unrelated legal systems and data sources.

2021 ◽  
Cori Pegliasco ◽  
Antoine Delepoulle ◽  
Rosemary Morrow ◽  
Yannice Faugère ◽  
Gérald Dibarboure

Abstract. This paper presents the new global Mesoscale Eddy Trajectory Atlases (META3.1exp DT all-satellites, Pegliasco et al., 2021a, and META3.1exp DT two-satellites, Pegliasco et al., 2021b), composed of eddy identifications and trajectories produced from altimetric maps. The detection method is a heritage of the py-eddy-tracker algorithm developed by Mason et al. (2014), optimized to handle large datasets, and thus long time series, efficiently. These products improve on the META2.0 product, produced by SSALTO/DUACS and distributed by AVISO+ with support from CNES, in collaboration with Oregon State University with support from NASA, and based on Chelton et al. (2011). META3.1exp provides supplementary information such as the mesoscale eddy shapes, with the eddy edges and their maximum-speed contours, and the eddy speed profiles from center to edge. The tracking algorithm is based on overlapping contours, includes virtual observations, and filters out the shortest trajectories. The absolute dynamic topography field is now used for eddy detection, instead of the sea level anomaly maps, to better represent ocean dynamics in the more energetic areas and close to coasts and islands. To evaluate the impact of the changes from META2.0 to META3.1exp, a comparison methodology has been applied. The similarity coefficient, based on the ratio between the eddies' overlap and their cumulative area, allows an extensive comparison of the different datasets in terms of geographic distribution, statistics of the main physical characteristics, changes in the lifetime of the trajectories, etc. After evaluating the impact of each change separately, we conclude that the major differences between META3.1exp and META2.0 are due to the change in the detection algorithm. META3.1exp contains smaller eddies and trajectories lasting at least 10 days that were not available in the distributed META2.0 product.
Nevertheless, 55% of the structures in META2.0 have a similar counterpart in META3.1exp, ensuring continuity between the two products, and the physical characteristics of the common eddies are close. Geographically, the eddy distribution differs mainly in the strong-current regions, where the mean dynamic topography gradients are sharp. The additional information on the eddy contours allows more accurate collocation of mesoscale structures with data from other sources, so META3.1exp is recommended for multi-disciplinary applications.
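The similarity coefficient described above can be sketched for eddies rasterized as boolean masks on a common grid. We assume here that "cumulative area" means the area of the union of the two eddies, which makes the coefficient the familiar intersection-over-union; the synthetic masks are for illustration only.

```python
import numpy as np

def similarity_coefficient(mask_a, mask_b):
    """Ratio of the eddies' overlap area to their cumulative (union) area."""
    overlap = np.logical_and(mask_a, mask_b).sum()
    cumulative = np.logical_or(mask_a, mask_b).sum()
    return overlap / cumulative if cumulative else 0.0

# Two synthetic eddies on a 10x10 grid, offset so they partially overlap
a = np.zeros((10, 10), dtype=bool); a[:, 0:6] = True   # 60 cells
b = np.zeros((10, 10), dtype=bool); b[:, 3:9] = True   # 60 cells
s = similarity_coefficient(a, b)   # overlap = 30 cells, union = 90 cells
```

A coefficient of 1 means identical eddies, 0 means no overlap; thresholding this value is what allows the matching of structures between META2.0 and META3.1exp.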

2021 ◽  
Nicolai Ree ◽  
Andreas H. Göller ◽  
Jan H. Jensen

We present RegioML, an atom-based machine learning model for predicting the regioselectivities of electrophilic aromatic substitution reactions. The model relies on CM5 atomic charges computed using semiempirical tight binding (GFN1-xTB) combined with the ensemble decision tree variant light gradient boosting machine (LightGBM). The model is trained and tested on 21,201 bromination reactions with 101K reaction centers, split into training, test, and out-of-sample datasets with 58K, 15K, and 27K reaction centers, respectively. The accuracy is 93% for the test set and 90% for the out-of-sample set, while the precision (the percentage of positive predictions that are correct) is 88% and 80%, respectively. The test-set performance is very similar to that of the graph-based WLN method developed by Struble et al. (React. Chem. Eng. 2020, 5, 896), though the comparison is complicated by the possibility that some of the test and out-of-sample molecules were used to train WLN. RegioML outperforms our physics-based RegioSQM20 method (J. Cheminform. 2021, 13:10), whose precision is only 75%, and even for the out-of-sample dataset RegioML slightly outperforms RegioSQM20. The good performance of RegioML and WLN is in large part due to the large datasets available for this type of reaction. However, for reactions with little experimental data, physics-based approaches like RegioSQM20 can be used to generate synthetic data for model training. We demonstrate this by showing that the performance of RegioSQM20 can be reproduced by an ML model trained on RegioSQM20-generated data.
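The accuracy and precision figures quoted above follow the standard confusion-matrix definitions. As a small illustration, the confusion counts below are made up (chosen so that the results reproduce the quoted 93% accuracy and 88% precision for a hypothetical 1000 reaction centers); they are not the paper's data.

```python
def accuracy_precision(tp, fp, tn, fn):
    """Accuracy = share of all predictions that are correct;
    precision = share of positive predictions that are correct."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    return accuracy, precision

# Hypothetical counts for 1000 reaction centers (illustrative only)
acc, prec = accuracy_precision(tp=220, fp=30, tn=710, fn=40)
```

The gap between the two metrics is the point of the abstract's comparison: a model can be accurate overall while still mislabeling a meaningful fraction of the sites it predicts to be reactive.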

2021 ◽  
Merit P. Ekeregbe ◽  
Mina S. Khalaf ◽  
Robello Samuel

Abstract Although visual data analytics using image processing is one of the fastest-growing research areas today and is widely applied in many fields, it is not fully utilized in the petroleum industry. This study is inspired by medical image segmentation for detecting tumor cells. The paper uses a supervised machine learning technique based on video analytics to identify bit dullness, which can be used in the drilling industry in place of the subjective screening approach. The evaluation of bit performance can be affected by subjective judgments of the degree of dullness; the present video-analytics approach grades bit dullness without user subjectivity. The approach relies on datasets of sufficient quantity and quality, separated into training, testing, and validation datasets. Because of the large datasets, Google Colaboratory was used, as it provides online access to its Graphics Processing Units (GPUs) for processing the bit datasets; processing time and resource consumption are thereby minimized, and the procedure is automated without any installation. After the bit is pulled out and cleaned, a 360° video is taken around the bit, moving up and down, and the footage is compared against the green (unused) bit. With this approach, multiple video datasets are not required. The algorithm was validated with new sets of bit videos and the results were satisfactory. Each screened bit is identified as dull or otherwise with the aid of a bounding box stamped with the confidence level (range 0.5–1) that the algorithm assigns to its decision on the identified object. This method can also screen multiple bits stored in a single place; where several drill bits are to be screened, manual grading would be a huge task requiring considerable resources.
This model and algorithm take only a few minutes to screen and grade several bits as the videos are passed through the algorithm. Grading with video also proved much better than with a single image, because the contextual information extracted is much richer at the level of the entire video, per segment, per shot, and per frame. The methodology is made robust so that the video model test starts without error, and the processing time for a single video screening is short. The work developed here is probably the first to handle dull-bit grading using video analytics. As more such datasets become available, IADC bit characterization will evolve into an automated process.
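The confidence-stamped bounding boxes described above imply a simple post-processing step: keeping only detections whose confidence falls in the reported 0.5–1 range. The detection tuples and threshold below are illustrative assumptions, not the authors' model output.

```python
def filter_detections(detections, threshold=0.5):
    """Keep (label, box, confidence) detections whose confidence meets the
    0.5-1 range used for dull-bit grading; box is (x1, y1, x2, y2)."""
    return [d for d in detections if d[2] >= threshold]

# Hypothetical detector output for one video frame
detections = [
    ("dull",  (10, 20, 80, 90),   0.91),
    ("dull",  (15, 25, 70, 85),   0.42),   # below threshold, discarded
    ("green", (100, 20, 160, 90), 0.77),
]
kept = filter_detections(detections)
```

Real detection pipelines typically add non-maximum suppression after this filter to merge overlapping boxes of the same object; only the thresholding step is shown here.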

2021 ◽  
Heinrich Peters ◽  
Zachariah Marrero ◽  
Samuel D. Gosling

As human interactions have shifted to virtual spaces and as sensing systems have become more affordable, an increasing share of people's everyday lives can be captured in real time. The availability of such fine-grained behavioral data from billions of people has the potential to enable great leaps in our understanding of human behavior. However, such data also pose challenges to engineers and behavioral scientists alike, requiring a specialized set of tools and methodologies to generate psychologically relevant insights. In particular, researchers may need to utilize machine learning techniques to extract information from unstructured or semi-structured data, reduce high-dimensional data to a smaller number of variables, and efficiently deal with extremely large sample sizes. Such procedures can be computationally expensive, requiring researchers to balance computation time with processing power and memory capacity. Whereas modelling procedures on small datasets will usually take mere moments to execute, applying them to big data can take much longer, with typical execution times spanning hours, days, or even weeks depending on the complexity of the problem and the resources available. Seemingly subtle decisions regarding preprocessing and analytic strategy can end up having a huge impact on the viability of executing analyses within a reasonable timeframe. Consequently, researchers must anticipate potential pitfalls regarding the interplay of their analytic strategy with memory and computational constraints. Many researchers who are interested in using "big data" report having problems learning about new analytic methods or software, finding collaborators with the right skills and knowledge, and getting access to commercial or proprietary data for their research (Metzler et al. 2016).
This chapter aims to serve as a practical introduction for psychologists who want to use large datasets and datasets from non-traditional data sources in their research (i.e., data not generated in the lab or through conventional surveys). First, we discuss the concept of big data and review some of the theoretical challenges and opportunities that arise with the availability of ever larger amounts of data. Second, we discuss practical implications and best practices with respect to data collection, data storage, data processing, and data modelling for psychological research in the age of big data.

2021 ◽  
Matthew Seto ◽  
Kristin Medlin

What does it mean to be in a strong partnership? Using Collaboratory's national dataset of community engagement data, we explored partnerships between higher education institutions and the community organizations with which they are partnered. Our goals were to (1) understand which quantitative characteristics from Collaboratory denote 'strong' community-university partnerships, (2) use those characteristics to create an algorithmic assessment model to identify the strongest partnerships in the Collaboratory dataset, and (3) reveal common themes that practitioners can leverage to cultivate stronger and more resilient partnerships. With input from Collaboratory administrators, community engagement professionals, and institutional research team members, we identified four quantitative data points in Collaboratory data that we combined into a partnership strength model. The model identified 99 out of 2,083 community-university partnerships that might be classified as high-strength. The model's results represent an initial jumping-off point for future research, including qualitative assessment of the 99 strongest partnerships to validate the model. Additionally, we argue that quantitative assessment of qualitative partnerships is by no means a silver bullet, but instead represents a pragmatic method for high-level assessment and quick filtering of large datasets of qualitative partnership data that would otherwise be prohibitively time-consuming.

2021 ◽  
Vol 40 (2) ◽  
Kathleen M. Hemeon ◽  
Eric N. Powell ◽  
Eric Robillard ◽  
Sara M. Pace ◽  
Theresa E. Redmond ◽  
