Science Pipelines for the Square Kilometre Array

Galaxies ◽  
2018 ◽  
Vol 6 (4) ◽  
pp. 120 ◽  
Author(s):  
Jamie Farnes ◽  
Ben Mort ◽  
Fred Dulwich ◽  
Stef Salvini ◽  
Wes Armour

The Square Kilometre Array (SKA) will be both the largest radio telescope ever constructed and the largest Big Data project in the known Universe. The first phase of the project will generate on the order of five zettabytes of data per year. A critical task for the SKA will be its ability to process data for science, which will need to be conducted by science pipelines. Together with polarization data from the LOFAR Multifrequency Snapshot Sky Survey (MSSS), we have been developing a realistic SKA-like science pipeline that can handle the large data volumes generated by LOFAR at 150 MHz. The pipeline uses task-based parallelism to image, detect sources and perform Faraday tomography across the entire LOFAR sky. The project thereby provides a unique opportunity to contribute to the technological development of the SKA telescope, while simultaneously enabling cutting-edge scientific results. In this paper, we provide an update on current efforts to develop a science pipeline that can enable tight constraints on the magnetised large-scale structure of the Universe.
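The task-based parallelism mentioned above can be illustrated with a minimal sketch: each survey field becomes an independent branch of a task graph whose stages image the data, run source detection, and perform Faraday tomography. The sketch below uses dask.delayed under that assumption; the stage functions and field names are hypothetical placeholders, not the actual SKA/LOFAR pipeline code.

```python
# Minimal sketch of task-based parallelism for a per-field imaging pipeline,
# using dask.delayed. The stage functions and field list are illustrative
# placeholders, not the actual SKA/LOFAR pipeline implementation.
from dask import delayed, compute

def image_field(visibilities):
    """Grid and image the visibilities for one field (placeholder)."""
    return {"image": f"image({visibilities})"}

def detect_sources(image):
    """Run source finding on one image (placeholder)."""
    return {"sources": f"sources({image['image']})"}

def faraday_tomography(image):
    """RM synthesis / Faraday tomography on the polarisation data (placeholder)."""
    return {"faraday_cube": f"cube({image['image']})"}

fields = ["field_001", "field_002", "field_003"]  # e.g. individual MSSS fields

tasks = []
for vis in fields:
    img = delayed(image_field)(vis)
    tasks.append(delayed(detect_sources)(img))
    tasks.append(delayed(faraday_tomography)(img))

# Dask builds a task graph and executes independent branches in parallel.
results = compute(*tasks, scheduler="threads")
```

Because each field is an independent branch of the graph, imaging, source detection, and Faraday tomography for different parts of the sky can proceed concurrently, which is the essence of the task-based approach described in the abstract.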


2021 ◽  
Vol 65 (8) ◽  
pp. 51-60
Author(s):  
Yujeong Kim

Today, every country takes an interest in the digital economy and has established and implemented policies aimed at digital technology development and digital transformation to support the transition to the digital economy. In particular, interest in digital technologies such as big data, 5G, and artificial intelligence, which are recognized as important factors in the digital economy, has been increasing recently, and the role of government in technological development and international cooperation is becoming more important. In addition to their overall digital economic policies, the Russian and Korean governments are also trying to improve their international competitiveness and take a leading position in the new economic order by establishing related technological and industrial policies. The Republic of Korea often refers to data, networks, and artificial intelligence as D∙N∙A and established policies in each of these areas in 2019; Russia likewise began establishing and implementing policies in the same fields in 2019. It is therefore timely to seek ways to expand cooperation between Russia and the Republic of Korea. In particular, the years 2020 and 2021 mark the 30th anniversary of diplomatic relations between the two countries: not only have large-scale events and exchange programs been prepared, but the relationship is also deepening as part of both countries' continued foreign policies – Russia's Eastern Policy and the Republic of Korea's New Northern Policy. This paper therefore compares and analyzes the two countries' policies on big data, 5G, and artificial intelligence in order to seek long-term, sustainable cooperation in the digital economy.


Big data refers to large-scale data collected for knowledge discovery and is widely used in various applications. Big data often includes image data from these applications and requires effective techniques to process it. In this paper, a survey of research on big image data is carried out to analyse the performance of existing methods. Deep learning techniques provide more effective performance than other approaches, including wavelet-based methods. However, deep learning techniques require more computational time, a drawback that can be overcome by lightweight methods.
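As a concrete illustration of the "lightweight" direction mentioned above, the sketch below shows a depthwise-separable convolution block, the building block popularised by MobileNet-style networks, which reduces the multiply-accumulate cost of a standard convolution. It is a generic PyTorch example offered under that assumption, not a method taken from the surveyed papers; the channel counts and input shape are arbitrary.

```python
# Minimal sketch of a "lightweight" convolution block: a depthwise-separable
# convolution (depthwise filtering followed by a 1x1 pointwise mix), which
# cuts the cost of a standard convolution. Shapes and sizes are illustrative.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        # Depthwise: one filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # Pointwise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

x = torch.randn(1, 32, 128, 128)      # a batch of feature maps
block = DepthwiseSeparableConv(32, 64)
y = block(x)                          # -> shape (1, 64, 128, 128)
```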


2017 ◽  
pp. 83-99
Author(s):  
Sivamathi Chokkalingam ◽  
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: the volume, velocity, and variety of the data. Big Data analytics is the process of analyzing large data sets that contain a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information. Since Big Data is a new and emerging field, there is a need to develop new technologies and algorithms for handling it. The main objective of this paper is to provide knowledge about the various research challenges of Big Data analytics. A brief overview of the various types of Big Data analytics is given; for each type, the paper describes its process steps and tools, together with a banking application. Some of the research challenges of Big Data analytics, along with possible solutions, are also discussed.
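As a hedged illustration of the kind of banking application paired with each analytics type, the sketch below runs a simple descriptive-analytics aggregation over a hypothetical transactions table with PySpark. The rows, column names, and threshold are placeholders, not examples taken from the chapter itself.

```python
# Minimal sketch of a descriptive-analytics step on banking transactions
# using PySpark. The records, column names, and threshold are hypothetical,
# purely to illustrate the kind of pipeline the chapter discusses.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("banking-analytics").getOrCreate()

# Hypothetical transaction records: (customer_id, amount, channel)
rows = [("c001", 120.0, "online"), ("c001", 9800.0, "branch"),
        ("c002", 15000.0, "online"), ("c002", 14250.0, "online")]
tx = spark.createDataFrame(rows, ["customer_id", "amount", "channel"])

# Descriptive analytics: per-customer, per-channel spending summary.
summary = (tx.groupBy("customer_id", "channel")
             .agg(F.count("*").alias("n_tx"),
                  F.sum("amount").alias("total_spend"),
                  F.avg("amount").alias("avg_spend")))

# A simple "hidden pattern" signal: unusually large average spend,
# which downstream predictive analytics could investigate further.
summary.filter(F.col("avg_spend") > 10_000).show()
```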


1998 ◽  
Vol 179 ◽  
pp. 317-328 ◽  
Author(s):  
N.A. Bahcall

How is the universe organized on large scales? How did this structure evolve from the unknown initial conditions of a rather smooth early universe to the present time? The answers to these questions will shed light on the cosmology we live in, the amount, composition and distribution of matter in the universe, the initial spectrum of density fluctuations that gave rise to this structure, and the formation and evolution of galaxies, clusters of galaxies, and larger scale structures. To address these fundamental questions, large and accurate sky surveys are needed, in various wavelengths and to various depths. In this presentation I review current observational studies of large-scale structure, present the constraints these observations place on cosmological models and on the amount of dark matter in the universe, and highlight some of the main unsolved problems in the field of large-scale structure that could be solved over the next decade with the aid of current and future surveys. I briefly discuss some of these surveys, including the Sloan Digital Sky Survey, which will provide a complete imaging and spectroscopic survey of the high-latitude northern sky, with redshifts for the brightest ∼10^6 galaxies, 10^5 quasars, and 10^3.5 rich clusters of galaxies. The potential of the SDSS survey, as well as of cross-wavelength surveys, for resolving some of the unsolved problems in large-scale structure and cosmology is discussed.


2020 ◽  
Vol 643 ◽  
pp. A100
Author(s):  
T. M. Siewert ◽  
C. Hale ◽  
N. Bhardwaj ◽  
M. Biermann ◽  
D. J. Bacon ◽  
...  

Context. The LOFAR Two-metre Sky Survey (LoTSS) will eventually map the complete Northern sky and provide an excellent opportunity to study the distribution and evolution of the large-scale structure of the Universe. Aims. We test the quality of LoTSS observations through a statistical comparison of the LoTSS first data release (DR1) catalogues to expectations from the established cosmological model of a statistically isotropic and homogeneous Universe. Methods. We study the point-source completeness and define several quality cuts, in order to determine the counts-in-cells statistics and differential source count statistics, and measure the angular two-point correlation function. We use the photometric redshift estimates, which are available for about half of the LoTSS-DR1 radio sources, to compare the clustering throughout the history of the Universe. Results. For the masked LoTSS-DR1 value-added source catalogue, we find a point-source completeness of 99% above flux densities of 0.8 mJy. The counts-in-cells statistics reveal that the distribution of radio sources cannot be described by a spatial Poisson process. Instead, a good fit is provided by a compound Poisson distribution. The differential source counts are in good agreement with previous findings in deep fields at low radio frequencies and with simulated catalogues from the SKA Design Study and the Tiered Radio Extragalactic Continuum Simulation. Restricting the value-added source catalogue to low-noise regions and applying a flux density threshold of 2 mJy provides our most reliable estimate of the angular two-point correlation. Based on the distribution of photometric redshifts and the Planck 2018 best-fit cosmological model, the theoretically predicted angular two-point correlation between 0.1 deg and 6 deg agrees reasonably well with the measured clustering for the sub-sample of radio sources with redshift information. Conclusions. The deviation from a Poissonian distribution might be a consequence of the multi-component nature of a large number of resolved radio sources and/or of uncertainties on the flux density calibration. The angular two-point correlation function is < 10^−2 at angular scales > 1 deg and up to the largest scales probed. At a 2 mJy flux density threshold and at a pivot angle of 1 deg, we find a clustering amplitude of A = (5.1 ± 0.6) × 10^−3 with a slope parameter of γ = 0.74 ± 0.16. For smaller flux density thresholds, systematic issues are identified, which are most likely related to the flux density calibration of the individual pointings. We conclude that we find agreement with the expectation of large-scale statistical isotropy of the radio sky at the per cent level. The angular two-point correlation agrees well with the expectation of the cosmological standard model.
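Assuming the common power-law parameterisation w(θ) = A (θ / 1 deg)^(−γ) around the quoted 1 deg pivot, the quoted amplitude and slope can be evaluated directly; only A and γ come from the abstract, while the functional form itself is an assumption made for this sketch.

```python
# Evaluating the quoted clustering fit, assuming the common power-law form
# w(theta) = A * (theta / theta_pivot)**(-gamma) with a 1 deg pivot.
# A and gamma are taken from the abstract; the functional form is an assumption.
import numpy as np

A = 5.1e-3          # clustering amplitude at the 1 deg pivot
gamma = 0.74        # slope parameter
theta_pivot = 1.0   # deg

def w(theta_deg):
    """Angular two-point correlation under the assumed power law."""
    return A * (np.asarray(theta_deg, dtype=float) / theta_pivot) ** (-gamma)

for theta in (0.1, 1.0, 6.0):
    print(f"w({theta:.1f} deg) = {w(theta):.2e}")
# At 1 deg this returns A = 5.1e-3, consistent with the abstract's statement
# that the correlation is below 1e-2 on scales of a degree and larger.
```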


2020 ◽  
Vol 642 ◽  
pp. A19 ◽  
Author(s):  
Nicola Malavasi ◽  
Nabila Aghanim ◽  
Marian Douspis ◽  
Hideki Tanimura ◽  
Victor Bonjean

Detecting the large-scale structure of the Universe based on the galaxy distribution and characterising its components is of fundamental importance in astrophysics, but it is also a difficult task to achieve. Spectroscopic redshift surveys that cover large areas of the sky are required to accurately measure galaxy positions in space. It is also difficult to create algorithms that can extract cosmic web structures (e.g. filaments). Moreover, these detections will be affected by systematic uncertainties that stem from the characteristics of the survey used (e.g. its completeness and coverage) and from the unique properties of the specific method adopted to detect the cosmic web (i.e. the assumptions it relies on and the free parameters it may employ). For these reasons, the creation of new catalogues of cosmic web features over wide sky areas is important, as it gives users access to a well-understood sample of structures whose systematic uncertainties have been thoroughly investigated. In this paper we present the filament catalogues created using the Discrete Persistent Structure Extractor tool in the Sloan Digital Sky Survey (SDSS), and we fully characterise them in terms of their dependence on the choice of parameters pertaining to the algorithm, and with respect to several systematic issues that may arise in the skeleton as a result of the properties of the galaxy distribution (such as Finger-of-God redshift distortions and defects of the density field that are due to the boundaries of the survey).


2020 ◽  
Vol 500 (4) ◽  
pp. 5561-5569
Author(s):  
C J G Vedder ◽  
N E Chisari

Galaxies and clusters embedded in the large-scale structure of the Universe are observed to align in preferential directions. Galaxy alignment has been established as a potential probe for cosmological information, but the application of cluster alignments for these purposes remains unexplored. Clusters are observed to have a higher alignment amplitude than galaxies, but because galaxies are much more numerous, the trade-off in detectability between the two signals remains unclear. We present forecasts comparing cluster and galaxy alignments for two extragalactic survey set-ups: a currently available low-redshift survey (Sloan Digital Sky Survey, SDSS) and an upcoming higher redshift survey (Legacy Survey of Space and Time, LSST). For SDSS, we rely on the publicly available redMaPPer catalogue to describe the cluster sample. For LSST, we estimate the expected number counts while extrapolating the alignment measurements from SDSS. Clusters in SDSS typically have a higher alignment signal-to-noise ratio (S/N) than galaxies. For LSST, the cluster alignment signals quickly wash out with redshift due to a relatively low number count and a decreasing alignment amplitude. Nevertheless, a potential strong suit of clusters is their interplay with weak lensing: intrinsic alignments can be more easily isolated for clusters than for galaxies. The S/N of cluster alignment can in general be improved by isolating close pairs along the line of sight.
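The amplitude-versus-abundance trade-off described above can be made concrete with a deliberately crude toy model in which detectability scales as the alignment amplitude times the square root of the number of objects. The numbers below are placeholders, not values from the paper, and the actual forecast additionally accounts for shape noise, redshift evolution, and survey geometry.

```python
# Deliberately simplified toy comparison of alignment detectability for
# clusters vs. galaxies, assuming S/N ~ amplitude * sqrt(n_objects).
# The inputs are placeholders, not values from the paper.
import math

def toy_snr(amplitude, n_objects):
    return amplitude * math.sqrt(n_objects)

# Hypothetical inputs: clusters align more strongly but are far less numerous.
cluster_snr = toy_snr(amplitude=10.0, n_objects=2.5e4)
galaxy_snr = toy_snr(amplitude=1.0, n_objects=1.0e7)

print(f"toy cluster S/N: {cluster_snr:.0f}")
print(f"toy galaxy  S/N: {galaxy_snr:.0f}")
# Which signal "wins" depends on how the amplitude ratio compares with the
# square root of the number-count ratio -- the trade-off the abstract describes.
```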


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 409
Author(s):  
Balázs Bohár ◽  
David Fazekas ◽  
Matthew Madgwick ◽  
Luca Csabai ◽  
Marton Olbei ◽  
...  

In the era of Big Data, data collection underpins biological research more than ever before. In many cases this can be as time-consuming as the analysis itself, requiring the download of multiple public databases with different data structures and, in general, days of work before any biological question can be answered. To solve this problem, we introduce an open-source, cloud-based big data platform called Sherlock (https://earlham-sherlock.github.io/). Sherlock fills this gap by allowing biologists to store, convert, query, share, and generate biological data, while ultimately streamlining bioinformatics data management. The Sherlock platform provides a simple interface to leverage big data technologies, such as Docker and PrestoDB. Sherlock is designed to analyse, process, query, and extract information from extremely complex and large data sets. Furthermore, Sherlock is capable of handling differently structured data (interaction, localization, or genomic sequence) from several sources and converting it to a common optimized storage format, for example the Optimized Row Columnar (ORC) format. This format facilitates Sherlock's ability to quickly and easily execute distributed analytical queries on extremely large data files, as well as to share datasets between teams. The Sherlock platform is freely available on GitHub and contains specific loader scripts for structured data sources of genomics, interaction, and expression databases. With these loader scripts, users are able to easily and quickly create and work with specific file formats, such as JavaScript Object Notation (JSON) or ORC. For computational biology and large-scale bioinformatics projects, Sherlock provides an open-source platform empowering data management, data analytics, data integration, and collaboration through modern big data technologies.
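In the spirit of the loader scripts described above, the sketch below converts a small line-delimited JSON table to ORC with pyarrow (whose ORC writer is available from pyarrow 4.0 onwards). This is not Sherlock's own code; the file names and record structure are illustrative stand-ins for an interaction database export.

```python
# Minimal sketch of converting a JSON-lines table to the ORC storage format
# with pyarrow (>= 4.0). Not Sherlock's own loader code; the file name and
# record structure are illustrative placeholders.
import pyarrow.json as pj
import pyarrow.orc as orc

# A tiny line-delimited JSON file standing in for an interaction database export.
with open("interactions.jsonl", "w") as fh:
    fh.write('{"source": "TP53", "target": "MDM2", "score": 0.98}\n')
    fh.write('{"source": "BRCA1", "target": "BARD1", "score": 0.95}\n')

table = pj.read_json("interactions.jsonl")   # schema inferred from the JSON lines

# Columnar ORC is what a distributed query engine such as PrestoDB
# scans efficiently for analytical queries.
orc.write_table(table, "interactions.orc")
print(table.schema)
```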

