RESOURCES PROVIDING DATA FOR MACHINE LEARNING AND TESTING ARTIFICIAL INTELLIGENCE TECHNOLOGIES

Author(s):  
Денис Валерьевич Сикулер

The article reviews 10 Internet resources that can be used to find data for a variety of tasks related to machine learning and artificial intelligence. Both widely known sites (e.g., Kaggle, Registry of Open Data on AWS) and less popular or specialized resources (e.g., The Big Bad NLP Database, Common Crawl) are examined. All of the resources provide free access to data; in most cases registration is not even required. For each resource, the characteristics and features relevant to searching for and obtaining datasets are described. The following sites are included in the review: Kaggle, Google Research, Microsoft Research Open Data, Registry of Open Data on AWS, Harvard Dataverse Repository, Zenodo, the Open Data portal of the Russian Federation, World Bank, The Big Bad NLP Database, and Common Crawl.
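Several of the reviewed resources can also be queried programmatically. As a minimal sketch (not taken from the article), the snippet below lists objects in a public bucket from the Registry of Open Data on AWS using anonymous (unsigned) S3 requests; the bucket name and prefix are placeholders that would have to be replaced with those given on a dataset's registry page.

```python
# Minimal sketch: anonymous listing of a public dataset bucket from the
# Registry of Open Data on AWS. Bucket name and prefix are placeholders.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# An unsigned config allows access without AWS credentials or registration.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

BUCKET = "example-open-data-bucket"   # hypothetical name from a registry page
PREFIX = ""                           # optional key prefix to narrow the listing

response = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX, MaxKeys=10)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```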

2021 ◽  
Author(s):  
Björn Reetz ◽  
Hella Riede ◽  
Dirk Fuchs ◽  
Renate Hagedorn

Since 2017, Open Data has been a part of the DWD data distribution strategy. Starting with a small selection of meteorological products, the number of available datasets has grown continuously over the last years. Since the start, users can access datasets anonymously via the website https://opendata.dwd.de to download file-based meteorological products. Free access and the variety of products have been welcomed by the general public as well as private met service providers. The more datasets are provided in a directory structure, however, the more tedious it becomes to find and select among all available data. Metadata and documentation were also available, but on separate public websites. This turned out to be an issue, especially for new users of DWD's open data.

To help users explore the available datasets and quickly decide on their suitability for a certain use case, the Open Data team at DWD is developing a geoportal. It enables free-text search and combined access to data, metadata, and documentation, along with interactive previews via OGC WMS.

Cloud technology is a suitable way forward for hosting the geoportal along with the data in its operational state. Benefits are expected for the easy integration of rich APIs with the geoportal, and for the flexible and fast deployment and scaling of optional or prototypical services such as WMS-based previews. Flexibility is also mandatory to respond to fluctuating user demands, depending on time of day and critical weather situations, which is supported by containerization. The growing overall volume of meteorological data at DWD may mandate allowing customers to bring their code to the data – for on-demand processing including slicing and interpolation – instead of transferring files to every customer. Shared cloud instances are the ideal interface for this purpose.

The contribution will outline a prototype version of the new geoportal and discuss further steps for launching it to the public.
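As an illustration of the file-based access described above (not part of the abstract), the following sketch downloads a product file from the open data server with plain HTTP; the directory path and file name are placeholders that would have to be taken from the server's directory listing or the accompanying documentation.

```python
# Minimal sketch: anonymous download of a file-based product from
# https://opendata.dwd.de. The product path below is a placeholder.
import requests

BASE_URL = "https://opendata.dwd.de"
PRODUCT_PATH = "/path/to/some/product/file.grib2"  # hypothetical path

response = requests.get(BASE_URL + PRODUCT_PATH, timeout=60)
response.raise_for_status()  # fail loudly if the path does not exist

with open("product.grib2", "wb") as fh:
    fh.write(response.content)
print(f"Downloaded {len(response.content)} bytes")
```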


2019 ◽  
Vol 48 (D1) ◽  
pp. D882-D889 ◽  
Author(s):  
Yunhai Luo ◽  
Benjamin C Hitz ◽  
Idan Gabdank ◽  
Jason A Hilton ◽  
Meenakshi S Kagda ◽  
...  

Abstract The Encyclopedia of DNA Elements (ENCODE) is an ongoing collaborative research project aimed at identifying all the functional elements in the human and mouse genomes. Data generated by the ENCODE consortium are freely accessible at the ENCODE portal (https://www.encodeproject.org/), which is developed and maintained by the ENCODE Data Coordinating Center (DCC). Since the initial portal release in 2013, the ENCODE DCC has updated the portal to make ENCODE data more findable, accessible, interoperable and reusable. Here, we report on recent updates, including new ENCODE data and assays, ENCODE uniform data processing pipelines, new visualization tools, a dataset cart feature, unrestricted public access to ENCODE data on the cloud (Amazon Web Services open data registry, https://registry.opendata.aws/encode-project/) and more comprehensive tutorials and documentation.
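The portal described above also exposes its search interface as JSON over HTTPS (documented on the ENCODE site). The hedged sketch below queries it with a small result limit; the fields printed per result are assumptions and may need adjusting to the portal's current schema.

```python
# Minimal sketch: querying the ENCODE portal search endpoint for experiments.
# Result fields are assumptions; consult the portal's REST API documentation.
import requests

URL = "https://www.encodeproject.org/search/"
params = {
    "type": "Experiment",   # search for experiment objects
    "format": "json",       # request a JSON response instead of HTML
    "limit": 5,             # keep the example small
}

response = requests.get(URL, params=params,
                        headers={"Accept": "application/json"}, timeout=60)
response.raise_for_status()

for hit in response.json().get("@graph", []):
    print(hit.get("accession"), hit.get("assay_title"))
```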


Author(s):  
Vineet Talwar ◽  
Kundan Singh Chufal ◽  
Srujana Joga

Abstract Artificial intelligence (AI) has become an essential tool in human life because of its pivotal role in communications, transportation, media, and social networking. Inspired by the complex neuronal network and its functions in human beings, AI, using computer-based algorithms and training, has been explored since the 1950s. To tackle the enormous amount of patient clinical, imaging, and histopathological data, the increasing pace of research on new treatments and clinical trials, and the ever-changing treatment guidelines that accompany novel drugs and evidence, AI is the need of the hour. There are numerous publications and active work on AI's role in the field of oncology. In this review, we discuss the fundamental terminology of AI, its applications in oncology on the whole, and its limitations. There is an inter-relationship between AI, machine learning, and deep learning. The virtual branch of AI deals with machine learning, while the physical branch deals with the delivery of different forms of treatment, such as surgery, targeted drug delivery, and elderly care. The applications of AI in oncology include cancer screening, diagnosis (clinical, imaging, and histopathological), radiation therapy (image acquisition, tumor and organs-at-risk segmentation, image registration, planning, and delivery), prediction of treatment outcomes and toxicities, prediction of cancer cell sensitivity to therapeutics, and clinical decision-making. A specific area of interest is the development of effective drug combinations tailored to every patient and tumor with the help of AI. Radiomics, the new kid on the block, deals with the planning and administration of radiotherapy. As with any new invention, AI has its pitfalls. The limitations include lack of external validation and proof of generalizability, difficulty in data access for rare diseases, ethical and legal issues, no precise logic behind the prediction, and, last but not least, a lack of education and expertise among medical professionals. A collaboration between departments of clinical oncology, bioinformatics, and data sciences can help overcome these problems in the near future.


2019 ◽  
Author(s):  
S Bauermeister ◽  
C Orton ◽  
S Thompson ◽  
R A Barker ◽  
J R Bauermeister ◽  
...  

Abstract The Dementias Platform UK (DPUK) Data Portal is a data repository facilitating access to data for 3 370 929 individuals in 42 cohorts. The Data Portal is an end-to-end data management solution providing a secure, fully auditable, remote access environment for the analysis of cohort data. All projects utilising the data are by default collaborations with the cohort research teams generating the data.

The Data Portal uses UK Secure eResearch Platform (UKSeRP) infrastructure to provide three core utilities: data discovery, access, and analysis. These are delivered using a seven-layered architecture comprising data ingestion, data curation, platform interoperability, data discovery, access brokerage, data analysis and knowledge preservation. Automated, streamlined, and standardised procedures reduce the administrative burden for all stakeholders, particularly for requests involving multiple independent datasets, where a single request may be forwarded to multiple data controllers. Researchers are provided with their own secure ‘lab’ using VMware, which is accessed using two-factor authentication.

Over the last 2 years, 160 project proposals involving 579 individual cohort data access requests were received. These came from 268 applicants spanning 72 institutions (56 academic, 13 commercial, 3 government) in 16 countries, with 84 requests involving multiple cohorts. Projects are varied, including multi-modal, machine learning, and Mendelian randomisation analyses. Data access is usually free at the point of use, although a small number of cohorts require a data access fee.
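The access brokerage step described above, where a single application fans out to the data controllers of several cohorts, can be pictured with a small, purely illustrative sketch; none of the names below come from DPUK, they are placeholders for the workflow the abstract describes.

```python
# Purely illustrative sketch of access brokerage: one project request is
# forwarded to the data controller of every cohort it names. All names are
# hypothetical; this is not DPUK code.
from dataclasses import dataclass

@dataclass
class AccessRequest:
    project_id: str
    applicant: str
    cohorts: list[str]          # cohorts whose data the project needs

# Hypothetical mapping of cohorts to their data controllers.
DATA_CONTROLLERS = {
    "cohort_a": "controller_a@example.org",
    "cohort_b": "controller_b@example.org",
}

def forward_request(request: AccessRequest) -> dict[str, str]:
    """Return one forwarding record per requested cohort."""
    return {
        cohort: DATA_CONTROLLERS.get(cohort, "unknown controller")
        for cohort in request.cohorts
    }

if __name__ == "__main__":
    req = AccessRequest("P-001", "j.smith", ["cohort_a", "cohort_b"])
    for cohort, controller in forward_request(req).items():
        print(f"{req.project_id}: forward to {controller} for {cohort}")
```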


2021 ◽  
Vol 50 (1) ◽  
pp. 15
Author(s):  
Matthias Reiter-Pázmándy

Open science and open access to research data are important aspects of research policy in Austria. In recent years, the social sciences have seen the establishment of research infrastructures that generate data and archives that store it. Data standards have been established, several working groups exist, and a number of activities aim to further develop various aspects of open science, open data and access to data. However, some barriers and challenges still exist in the practice of sharing research data. One aspect that should be emphasised and incentivised is the re-use of research data.


Author(s):  
Yaser AbdulAali Jasim

Nowadays, technology and computer science are rapidly developing many tools and algorithms, especially in the field of artificial intelligence. Machine learning drives the development of new methodologies and models and has become a major area of application for artificial intelligence. Going beyond conventional neural network architectures, deep learning refers to the use of artificial neural networks with multiple processing layers. In this paper, convolutional neural network models were designed to detect (diagnose) plant disorders from samples of healthy and unhealthy plant images analyzed with deep learning methods. The models were trained using an open dataset containing 18,000 images of ten different plants, including healthy ones. Several model architectures were trained, with the best achieving 97 percent performance in detecting the respective [plant, disease] pair. This provides useful information and an early-warning technique that, given its substantially high performance rate, can be further improved to support an automated plant disease detection system working in actual farm conditions.
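A minimal sketch of the kind of convolutional classifier the abstract describes is shown below, using Keras; the layer sizes, input resolution, and number of [plant, disease] classes are illustrative assumptions, not the architecture actually used in the paper.

```python
# Minimal sketch of a CNN image classifier for [plant, disease] pairs.
# Layer sizes, input resolution, and class count are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 10   # hypothetical number of [plant, disease] pairs
IMG_SIZE = 128     # hypothetical input resolution

model = models.Sequential([
    layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# Training would feed an image dataset, e.g. one built with
# tf.keras.utils.image_dataset_from_directory("plant_images/",
#                                             image_size=(IMG_SIZE, IMG_SIZE))
```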


2019 ◽  
pp. 683-711
Author(s):  
Zaffar Sadiq Mohamed-Ghouse ◽  
Cheryl Desha ◽  
Luis Perez-Mora

Abstract Australia must overcome a number of challenges to meet the needs of our growing population in a time of increased climate variability. Fortunately, we have unprecedented access to data about our land and the built environment that is internationally regarded for its quality. Over the last two decades Australia has risen to the forefront in developing and implementing Digital Earth concepts, with several key national initiatives formalising our digital geospatial journey in digital globes, open data access and ensuring data quality. In particular, and in part driven by a lack of substantial resources in space, we have directed efforts towards world-leading innovation in big data processing and storage. This chapter highlights these geospatial initiatives, including use cases, lessons learned, and next steps for Australia. Initiatives addressed include the National Data Grid (NDG), the Queensland Globe, G20 Globe, NSW Live (formerly NSW Globe), Geoscape, the National Map, the Australian Geoscience Data Cube and Digital Earth Australia. We explore several use cases and conclude by considering lessons learned that are transferable for our colleagues internationally. These include challenges in: 1) creating an active context for data use, 2) capacity building beyond ‘show-and-tell’, and 3) defining the job market and demand for the market.
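Digital Earth Australia builds on Open Data Cube software. As a hedged illustration (not taken from the chapter), the sketch below shows how a user might load an analysis-ready subset from a data cube instance; the product name, spatial extent, and date range are placeholder assumptions, and a configured datacube environment is assumed to be available.

```python
# Hedged sketch: loading a small analysis-ready subset from an Open Data Cube
# deployment such as Digital Earth Australia. Product name, extent, and dates
# are placeholders; a configured datacube environment is assumed.
import datacube

dc = datacube.Datacube(app="dea_example")

data = dc.load(
    product="example_surface_reflectance",   # hypothetical product name
    x=(149.0, 149.2),                        # longitude range (placeholder)
    y=(-35.4, -35.2),                        # latitude range (placeholder)
    time=("2019-01-01", "2019-03-31"),
    output_crs="EPSG:3577",                  # Australian Albers grid
    resolution=(-30, 30),                    # 30 m pixels
)
print(data)   # an xarray.Dataset with one variable per measurement band
```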


2021 ◽  
pp. 206-217
Author(s):  
Kieron O’Hara

The data provided by the Internet, plus the cloud-based computing power it allows, have helped develop machine learning (ML) and artificial intelligence (AI). Conversely, AI promises to unlock the value of the data being created. The ideology underlying Internet governance will have an effect on the flow of data and therefore AI. The Silicon Valley Open Internet favours open data, while the DC Commercial Internet allows rightsholders to monetize the data they have, implying returns to integration, while allowing privacy issues to be resolved by contract (privacy policies). The Beijing Paternal Internet provides other means for privately held data to be used in the national interest, while also supporting integration. The position is most complex with the Brussels Bourgeois Internet, where respect for human rights, exemplified by GDPR, makes it harder to accumulate data to train ML algorithms, and so may have a negative effect on the AI industry.


2019 ◽  
Vol 16 (1 (4)) ◽  
pp. 87-96
Author(s):  
Dominik Sybilski

The Central Public Information Repository (CRIP) was introduced into the legal system by the act of 16 September 2011 amending the act on access to public information. The main goal behind the introduction of CRIP was to support the economic exploitation of information. CRIP's functions were implemented by the Ministry of Administration and Digitization, which launched an open public data portal (danepubliczne.gov.pl) in May 2014. So far, CRIP has appeared in the literature in the context of access to public information. However, there is a lack of work on CRIP as a national open data portal distributing information for re-use. This issue in particular has become increasingly important in view of the adoption by the Council of Ministers of the Public Data Access Program, a government strategy dedicated to public policy in the area of open data. Furthermore, with the entry into force of the Act of 25 February 2016 on the re-use of public sector information, an interesting question arises about the scope of CRIP. The article analyzes CRIP regulations in terms of their effectiveness for the implementation of the right to re-use and the policy of open data. The article summarizes the findings of the analysis and attempts to propose de lege ferenda conclusions.

