Reliable detection and characterization of low-frequency polarized sources in the LOFAR M51 field

2018 ◽  
Vol 617 ◽  
pp. A136 ◽  
Author(s):  
A. Neld ◽  
C. Horellou ◽  
D. D. Mulcahy ◽  
R. Beck ◽  
S. Bourke ◽  
...  

Context. The new generation of broad-band radio continuum surveys will provide large data sets with polarization information. New algorithms need to be developed to extract reliable catalogs of linearly polarized sources that can be used to characterize those sources and produce a dense rotation measure (RM) grid to probe magneto-ionized structures along the line of sight via Faraday rotation. Aims. The aim of this paper is to develop a computationally efficient and rigorously defined source-finding algorithm for linearly polarized sources. Methods. We used a calibrated data set from the LOw Frequency ARray (LOFAR) at 150 MHz centered on the nearby galaxy M 51 to search for polarized background sources. Using new imaging software, we re-imaged the field at a resolution of 18″ × 15″ and cataloged a total of about 3000 continuum sources within 2.5° of the center of M 51. We made small Stokes Q and U images centered on each source brighter than 100 mJy in total intensity (201 sources) and used RM synthesis to create corresponding Faraday cubes that were analyzed individually. For each source, the noise distribution function was determined from a subset of the measurements at high Faraday depths where no polarization is expected; the peaks in polarized intensity in the Faraday spectrum were identified and the p-value of each source was calculated. Finally, the false discovery rate method was applied to the list of p-values to produce a list of polarized sources and quantify the reliability of the detections. We also analyzed sources fainter than 100 mJy that were reported as polarized in the literature at at least one other radio frequency. Results. Of the 201 sources that were searched for polarization, six polarized sources were detected confidently (with a false discovery rate of 5%). This corresponds to a number density of one polarized source per 3.3 square degrees, or 0.3 sources per square degree. Increasing the false discovery rate to 50% yields 19 sources. A majority of the sources have a morphology indicative of double-lobed radio galaxies, and the ones with literature redshift measurements have 0.5 < z < 1.0. Conclusions. We find that this method is effective in identifying polarized sources and is well suited to LOFAR observations. In the future, we intend to develop it further and apply it to larger data sets such as the LOFAR Two-metre Sky Survey (LoTSS) of the whole northern sky and the ongoing deep LOFAR observations of the GOODS-North field.
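The detection step above hinges on applying the false discovery rate method to the per-source p-values. As a minimal, hedged illustration (the standard Benjamini-Hochberg step-up procedure; the p-values below are made up, not taken from the paper), one pass of the procedure looks like this:

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of detections controlling the false
    discovery rate at level alpha (Benjamini-Hochberg procedure)."""
    p = np.asarray(p_values)
    n = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Largest k such that p_(k) <= (k/n) * alpha; all smaller ranks are kept.
    below = ranked <= (np.arange(1, n + 1) / n) * alpha
    detected = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        detected[order[:k + 1]] = True
    return detected

# Illustrative p-values, one per candidate polarized source.
p_vals = [0.0002, 0.004, 0.03, 0.2, 0.6, 0.9]
print(benjamini_hochberg(p_vals, alpha=0.05))
```

Raising alpha from 0.05 to 0.5 admits more candidates at the cost of a larger expected fraction of false detections, which is the trade-off between the 6-source and 19-source lists described in the Results.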

Geophysics ◽  
2014 ◽  
Vol 79 (3) ◽  
pp. WA69-WA77 ◽  
Author(s):  
Alexandre Stopin ◽  
René-Édouard Plessix ◽  
Said Al Abri

Several 3D seismic acoustic full-waveform inversions (FWIs) of offshore data sets have been reported over the last five years. A successful update of the long-to-intermediate wavelengths of the earth model by FWI requires good-quality wide-angle, long-offset, low-frequency data. Recent improvements in acquisition make such data sets available on land, too. We evaluated a 3D application on a data set recorded in North Oman. The data contain low frequencies down to 1.5 Hz, long offsets, and wide azimuths. The application of acoustic FWI on land remains complicated because of elastic effects, notably the strong ground roll, and because of numerous acquisition- and human-activity-related noise sources. The presence of fast carbonate layers in this region induces velocity inversions that are difficult to recover from diving or postcritical waves. We accounted for anisotropic effects because we included FWI in a classical structural imaging workflow. With dedicated processing of the data and a simultaneous inversion of the NMO velocity and the anelliptic anisotropy parameter, we succeeded in interpreting the kinematics of the transmitted and reflected waves, although only the diving and postcritical waves were included in the waveform inversion. This approach has some limitations because of the acoustic assumption. We could not obtain a high-resolution image, especially at the shale-carbonate interfaces. There is also a trade-off between the NMO velocity and the anelliptic anisotropy parameter. However, the image improvements after acoustic FWI and the ability to handle the large data volume make this technique attractive in an imaging workflow.
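For readers unfamiliar with the parameters being inverted simultaneously, the sketch below gives the standard VTI relations (Alkhalifah-Tsvankin style) linking the NMO velocity and the anelliptic parameter η to the vertical P-velocity and Thomsen parameters; the numerical values are illustrative only and not taken from the North Oman data set.

```python
import math

def vti_kinematic_parameters(vp0, epsilon, delta):
    """Standard VTI relations: short-spread NMO velocity and the anelliptic
    parameter eta from the vertical P velocity and Thomsen parameters."""
    v_nmo = vp0 * math.sqrt(1.0 + 2.0 * delta)      # controls near-offset moveout
    eta = (epsilon - delta) / (1.0 + 2.0 * delta)   # controls long-offset (anelliptic) moveout
    v_hor = vp0 * math.sqrt(1.0 + 2.0 * epsilon)    # horizontal velocity
    return v_nmo, eta, v_hor

# Illustrative values only (not from the North Oman data set).
print(vti_kinematic_parameters(vp0=3000.0, epsilon=0.15, delta=0.05))
```

The trade-off mentioned in the abstract arises because long-offset kinematics constrain combinations of v_nmo and eta rather than each parameter independently.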


Author(s):  
Lior Shamir

Abstract Several recent observations using large data sets of galaxies have shown a non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to interact gravitationally. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. Both data sets exhibit a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to a cosine dependence yields a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\circ},\delta=47^{\circ})$, well within the $1\sigma$ error range of the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^{\circ},\delta=61^{\circ})$.
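As a hedged sketch of this kind of dipole-axis analysis (not the author's exact pipeline), a trial axis can be scored by summing each galaxy's spin sign weighted by the cosine of its angular distance to the axis, with the amplitude calibrated against randomized spin labels; scanning many trial axes then locates the most likely dipole. The positions and spins below are synthetic.

```python
import numpy as np

def angular_cos(ra, dec, ra0, dec0):
    """Cosine of the angular distance between points (ra, dec) and an
    axis (ra0, dec0); all angles in radians."""
    return (np.sin(dec) * np.sin(dec0)
            + np.cos(dec) * np.cos(dec0) * np.cos(ra - ra0))

def dipole_significance(ra, dec, spin, ra0, dec0, n_random=1000, seed=0):
    """Dipole amplitude along a trial axis and its sigma relative to
    randomized spin assignments (a simple permutation estimate)."""
    rng = np.random.default_rng(seed)
    c = angular_cos(ra, dec, ra0, dec0)
    stat = np.sum(spin * c)
    random_stats = [np.sum(rng.permutation(spin) * c) for _ in range(n_random)]
    return stat, (stat - np.mean(random_stats)) / np.std(random_stats)

# Synthetic example: random positions and spins (+1 clockwise, -1 counterclockwise).
rng = np.random.default_rng(1)
ra = rng.uniform(0, 2 * np.pi, 5000)
dec = np.arcsin(rng.uniform(-1, 1, 5000))
spin = rng.choice([-1, 1], size=5000)
print(dipole_significance(ra, dec, spin, ra0=np.radians(78), dec0=np.radians(47)))
```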


2020 ◽  
Vol 6 ◽  
Author(s):  
Jaime de Miguel Rodríguez ◽  
Maria Eugenia Villafañe ◽  
Luka Piškorec ◽  
Fernando Sancho Caparrini

Abstract This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), deep generative machine learning models based on neural networks. The data set used features a scheme for geometry representation based on a 'connectivity map' that is especially suited to representing the wireframe objects that compose it. Additionally, the input samples are generated through 'parametric augmentation', a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features of a given building type. In the experiments described in this paper, more than 150,000 input samples belonging to two building types were processed during the training of a VAE model. The main contribution of this paper is to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.
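A minimal VAE sketch in PyTorch over flattened connectivity-map vectors is given below; the input dimension, architecture and loss weighting are placeholders rather than the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WireframeVAE(nn.Module):
    """Minimal variational autoencoder over flattened connectivity maps."""
    def __init__(self, input_dim=1024, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    recon_loss = F.mse_loss(recon, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

# Interpolated 'hybrid' geometries as described above would be obtained by
# decoding (1 - t) * z_a + t * z_b for t in [0, 1], where z_a and z_b are the
# latent codes of two training samples.
```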


2006 ◽  
Vol 39 (2) ◽  
pp. 262-266 ◽  
Author(s):  
R. J. Davies

Synchrotron sources offer high-brilliance X-ray beams that are ideal for spatially and time-resolved studies. Large amounts of wide- and small-angle X-ray scattering data can now be generated rapidly, for example, during routine scanning experiments. Consequently, the analysis of the large data sets produced has become a complex and pressing issue. Even relatively simple analyses become difficult when a single data set can contain many thousands of individual diffraction patterns. This article reports on a new software application for the automated analysis of scattering intensity profiles. It is capable of batch-processing thousands of individual data files without user intervention. Diffraction data can be fitted using a combination of background functions and non-linear peak functions. To complement the batch-wise operation mode, the software includes several specialist algorithms to ensure that the results obtained are reliable. These include peak-tracking, artefact removal, function elimination and spread-estimate fitting. In addition to non-linear fitting, the software can also calculate integrated intensities and selected orientation parameters.
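As a hedged illustration of the kind of profile fit described (a non-linear peak on a background), the sketch below fits a Gaussian peak plus linear background with scipy.optimize.curve_fit; the model choice and data are illustrative and not the software's actual function library.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_plus_linear(x, amp, centre, width, slope, intercept):
    """One Gaussian peak on a linear background."""
    return amp * np.exp(-0.5 * ((x - centre) / width) ** 2) + slope * x + intercept

# Synthetic intensity profile standing in for one diffraction pattern.
x = np.linspace(0, 10, 500)
rng = np.random.default_rng(0)
y = gaussian_plus_linear(x, 100, 5.0, 0.3, -1.0, 20.0) + rng.normal(0, 2, x.size)

popt, pcov = curve_fit(gaussian_plus_linear, x, y, p0=[80, 4.8, 0.5, 0, 15])
peak_area = popt[0] * popt[2] * np.sqrt(2 * np.pi)  # integrated intensity of the peak

# In batch mode, the converged parameters (popt) of one pattern would seed the
# initial guess for the next pattern, which is the essence of peak-tracking.
```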


1997 ◽  
Vol 1997 ◽  
pp. 143-143
Author(s):  
B.L. Nielsen ◽  
R.F. Veerkamp ◽  
J.E. Pryce ◽  
G. Simm ◽  
J.D. Oldham

High-producing dairy cows have been found to be more susceptible to disease (Jones et al., 1994; Göhn et al., 1995), raising concerns about the welfare of the modern dairy cow. Genotype and number of lactations may affect various health problems differently, and their relative importance may vary. The categorical nature and low incidence of health events necessitate large data sets, but the use of data collected across herds may introduce unwanted variation. Analysis of a comprehensive data set from a single herd was carried out to investigate the effects of genetic line and lactation number on the incidence of various health and reproductive problems.


Author(s):  
Pradeep Lall ◽  
Tony Thomas

Electronics in automotive underhood environments are used for a number of safety-critical functions. Reliable continued operation of electronic safety systems without catastrophic failure is important for safe operation of the vehicle. There is a need for prognostication methods that can be integrated with on-board sensors to assess accrued damage and impending failure. In this paper, leadfree electronic assemblies consisting of daisy-chained parts have been subjected to high-temperature vibration at 5g and 155°C. Spectrograms have been used to identify the emergence of new low-frequency components with damage progression in the electronic assemblies. Principal component analysis (PCA) has been used to reduce the dimensionality of the large data sets and identify patterns without losing the features that signify damage progression and impending failure. The variance of the principal components of the instantaneous frequency has been shown to increase during the initial damage progression, reach a maximum, and then decrease prior to failure. This characteristic behavior of the instantaneous frequency over the period of vibration can be used as a health-monitoring feature for identifying impending failures in automotive electronics. Further, damage progression has been studied using the Empirical Mode Decomposition (EMD) technique to decompose the signals into Intrinsic Mode Functions (IMFs). The IMFs were assessed by their kurtosis values, and a reconstructed strain signal was formed from all IMFs with a kurtosis greater than three. PCA of the reconstructed strain signal gave clearer patterns that can be used for prognostication of the life of the components.
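A hedged sketch of the described pipeline is given below: EMD decomposition (here via the third-party PyEMD package, which is an assumption, not the authors' tooling), kurtosis-based IMF selection with a threshold of three, reconstruction, and PCA; the strain signal is synthetic.

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.decomposition import PCA
from PyEMD import EMD  # third-party package "EMD-signal" (assumed available)

# Synthetic strain signal standing in for a vibration measurement.
t = np.linspace(0, 10, 5000)
signal = (np.sin(2 * np.pi * 50 * t) + 0.3 * np.sin(2 * np.pi * 5 * t)
          + 0.1 * np.random.default_rng(0).normal(size=t.size))

imfs = EMD()(signal)  # intrinsic mode functions

# Keep IMFs whose (Pearson) kurtosis exceeds 3, as in the selection rule above.
selected = [imf for imf in imfs if kurtosis(imf, fisher=False) > 3]
reconstructed = np.sum(selected, axis=0) if selected else signal

# PCA over fixed-length windows of the reconstructed strain signal.
windows = reconstructed[:len(reconstructed) // 100 * 100].reshape(-1, 100)
components = PCA(n_components=2).fit_transform(windows)
print(components.shape)
```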


2019 ◽  
Vol 8 (2S11) ◽  
pp. 3523-3526

This paper describes an efficient algorithm for classification in large data sets. While many classification algorithms exist, they are not well suited to larger and more varied data sets. Various extreme learning machine (ELM) algorithms are available in the literature for working with large data sets. However, the existing algorithms use a fixed activation function, which can be a deficiency when working with large data. In this paper, we propose a novel ELM that employs a sigmoid activation function. The experimental evaluations demonstrate that our ELM-S algorithm performs better than ELM, SVM, and other state-of-the-art algorithms on large data sets.
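A minimal single-hidden-layer ELM with a sigmoid activation is sketched below as one plausible reading of the approach; the dimensions, regularization and solver details are illustrative and do not reproduce the paper's ELM-S algorithm.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SigmoidELM:
    """Minimal extreme learning machine: random input weights, sigmoid
    hidden layer, output weights solved by regularized least squares."""
    def __init__(self, n_hidden=200, reg=1e-3, seed=0):
        self.n_hidden, self.reg = n_hidden, reg
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = sigmoid(X @ self.W + self.b)
        # Ridge-regularized pseudo-inverse solution for the output weights.
        self.beta = np.linalg.solve(H.T @ H + self.reg * np.eye(self.n_hidden), H.T @ y)
        return self

    def predict(self, X):
        return sigmoid(X @ self.W + self.b) @ self.beta
```

For classification, y would hold one-hot class labels and predictions would be taken as the arg-max over the outputs.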


2021 ◽  
Vol 14 (11) ◽  
pp. 2369-2382
Author(s):  
Monica Chiosa ◽  
Thomas B. Preußer ◽  
Gustavo Alonso

Data analysts often need to characterize a data stream as a first step to its further processing. Some of the initial insights to be gained include, e.g., the cardinality of the data set and its frequency distribution. Such information is typically extracted by using sketch algorithms, now widely employed to process very large data sets in manageable space and in a single pass over the data. Often, analysts need more than one parameter to characterize the stream. However, computing multiple sketches becomes expensive even when using high-end CPUs. Exploiting the increasing adoption of hardware accelerators, this paper proposes SKT, an FPGA-based accelerator that can compute several sketches along with basic statistics (average, max, min, etc.) in a single pass over the data. SKT has been designed to characterize a data set by calculating its cardinality, its second frequency moment, and its frequency distribution. The design processes data streams coming from either PCIe or TCP/IP, and it is built to fit emerging cloud service architectures, such as Microsoft's Catapult or Amazon's AQUA. The paper explores the trade-offs of designing sketch algorithms on a spatial architecture and how to combine several sketch algorithms into a single design. The empirical evaluation shows that SKT on an FPGA offers a significant performance gain over high-end, server-class CPUs.
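A software analogue of the single-pass idea is sketched below: a Count-Min sketch for the frequency distribution maintained alongside running basic statistics in one loop over the stream (a HyperLogLog-style cardinality estimator would be updated in the same loop). This is an illustration of the technique, not the SKT design itself.

```python
import numpy as np

class SinglePassSummary:
    """One pass over a data stream: Count-Min sketch for frequency
    estimates plus running min / max / mean."""
    def __init__(self, width=1024, depth=4, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((depth, width), dtype=np.int64)
        self.salts = rng.integers(1, 2**31 - 1, size=depth)
        self.width = width
        self.count, self.total, self.min, self.max = 0, 0.0, None, None

    def update(self, value):
        self.count += 1
        self.total += value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        for row, salt in enumerate(self.salts):
            self.table[row, hash((int(salt), value)) % self.width] += 1

    def estimate_frequency(self, value):
        return min(self.table[row, hash((int(salt), value)) % self.width]
                   for row, salt in enumerate(self.salts))

stream = [1, 2, 2, 3, 3, 3, 7]
summary = SinglePassSummary()
for v in stream:
    summary.update(v)
print(summary.estimate_frequency(3), summary.min, summary.max,
      summary.total / summary.count)
```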


Author(s):  
V. Suresh Babu ◽  
P. Viswanath ◽  
Narasimha M. Murty

Non-parametric methods like the nearest neighbor classifier (NNC) and Parzen-window based density estimation (Duda, Hart & Stork, 2000) are more general than parametric methods because they do not make any assumptions about the form of the probability distribution. Further, they show good performance in practice with large data sets. These methods, either explicitly or implicitly, estimate the probability density at a given point in a feature space by counting the number of points that fall in a small region around that point. Popular classifiers that use this approach are the NNC and its variants like the k-nearest neighbor classifier (k-NNC) (Duda, Hart & Stork, 2000), while DBSCAN is a popular density-based clustering method (Han & Kamber, 2001) that uses the same approach. These methods show good performance, especially with larger data sets: the asymptotic error rate of the NNC is less than twice the Bayes error (Cover & Hart, 1967), and DBSCAN can find arbitrarily shaped clusters along with detecting noisy outliers (Ester, Kriegel & Xu, 1996).
The most prominent difficulty in applying non-parametric methods to large data sets is their computational burden. The space and classification-time complexities of the NNC and k-NNC are O(n), where n is the training set size, and the time complexity of DBSCAN is O(n²), so these methods do not scale to large data sets. Some remedies to reduce this burden are as follows. (1) Reduce the training set size by editing techniques that eliminate training patterns which are redundant in some sense (Dasarathy, 1991); the condensed NNC (Hart, 1968) is of this type. (2) Use only a few selected prototypes from the data set; the Leaders-subleaders method and the l-DBSCAN method are of this type (Vijaya, Murthy & Subramanian, 2004; Viswanath & Rajwala, 2006). These two remedies reduce the computational burden, but they can also degrade the performance of the method. Using enriched prototypes can improve the performance, as done in (Asharaf & Murthy, 2003), where the prototypes are derived using adaptive rough fuzzy set theory, and in (Suresh Babu & Viswanath, 2007), where the prototypes are used along with their relative weights. Prototypes can be derived by employing a clustering method like the leaders method (Spath, 1980) or the k-means method (Jain, Dubes & Chen, 1987), which find a partition of the data set where each block (cluster) of the partition is represented by a prototype called a leader, centroid, etc. But these prototypes cannot be used to estimate the probability density, since the density information present in the data set is lost while deriving the prototypes.
The chapter proposes a modified leader clustering method, called the counted-leader method, which, along with deriving the leaders, preserves the crucial density information in the form of a count that can be used in estimating the densities. The chapter presents a fast and efficient nearest-prototype based classifier, called the counted k-nearest leader classifier (ck-NLC), which is on par with the conventional k-NNC but considerably faster. The chapter also presents a density-based clustering method called l-DBSCAN, which is shown to be a faster and scalable version of DBSCAN (Viswanath & Rajwala, 2006). Formally, under some assumptions, it is shown that the number of leaders is upper-bounded by a constant that is independent of the data set size and of the distribution from which the data set is drawn.
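A hedged sketch of the counted-leader idea described above: in a single scan of the data, each point either increments the count of a sufficiently close leader or becomes a new leader itself. The distance threshold tau and the data below are illustrative.

```python
import numpy as np

def counted_leaders(X, tau):
    """Single-scan leader clustering that also keeps a count per leader,
    preserving density information (sketch of the counted-leader idea)."""
    leaders, counts = [], []
    for x in X:
        if leaders:
            d = np.linalg.norm(np.asarray(leaders) - x, axis=1)
            j = int(np.argmin(d))
            if d[j] <= tau:
                counts[j] += 1
                continue
        leaders.append(x)
        counts.append(1)
    return np.asarray(leaders), np.asarray(counts)

# Illustrative 2-D data; tau controls how coarse the prototype set is.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(5, 1, (500, 2))])
L, c = counted_leaders(X, tau=1.0)
print(len(L), c[:5])  # the number of leaders stays bounded as len(X) grows
```

The counts c act as the density weights that the ck-NLC classifier and the l-DBSCAN variant use in place of the full data set.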


2019 ◽  
Vol 15 (S350) ◽  
pp. 406-407
Author(s):  
Sacha Foschino ◽  
Olivier Berné ◽  
Christine Joblin

Abstract Observations of the mid-infrared (mid-IR, 3-15 μm) spectra of photo-dissociation regions reveal ubiquitous, broad and intense emission bands, the aromatic infrared bands (AIBs), attributed to polycyclic aromatic hydrocarbons (PAHs). Studies of the AIBs have shown spectral variations (e.g. in the band positions) between different astrophysical objects, or even within a single object, thanks to hyperspectral images. The James Webb Space Telescope (JWST) will provide further spectral and spatial detail compared to former space observatories. This will come with large data sets, which will require specific tools to perform efficient scientific analysis. We propose in this study a method based on blind signal separation to reduce the analysis of such a large data set to that of a small number of elementary spectra, spectrally representative of the data set and physically interpretable as the spectra of populations of mid-IR emitters. The robustness and speed of the method are improved compared to former algorithms. It is tested on an ISO-SWS data set, which best approaches the characteristics of JWST data, from which four elementary spectra are extracted, attributed to cationic PAHs, neutral PAHs, evaporating very small grains, and large ionized PAHs.
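As a hedged sketch of blind signal separation applied to a collection of spectra, the example below uses non-negative matrix factorization (one common separation approach, not necessarily the authors' exact algorithm) to recover a few elementary spectra and their per-pixel weights; the data are synthetic.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synthetic stand-in for a hyperspectral data set: n_pixels spectra that are
# positive mixtures of a few elementary spectra.
rng = np.random.default_rng(0)
n_pixels, n_channels, n_components = 500, 200, 4
true_spectra = rng.random((n_components, n_channels))
weights = rng.random((n_pixels, n_components))
observed = weights @ true_spectra + 0.01 * rng.random((n_pixels, n_channels))

model = NMF(n_components=n_components, init='nndsvda', max_iter=500, random_state=0)
W = model.fit_transform(observed)  # per-pixel weights of each elementary spectrum
H = model.components_              # extracted elementary spectra
print(W.shape, H.shape)
```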

