Letter: Twinkle, Twinkle Little STAR, How I Wonder What You Are: The Case for High-Quality, Large-Scale, “Real-World” Databases

Abstract: Many real-world data and processes have a network structure and can usefully be represented as graphs. Network analysis focuses on the relations among nodes, exploring the properties of each network. We introduce a method for measuring the strength of the relationship between two nodes of a network and for ranking nodes. The method applies to all kinds of networks, including directed and weighted networks. It extracts dependency relations among the network’s nodes from the structure of the local surroundings of individual nodes. For the tasks we address in this article, the key technical parameter is locality: since only the surroundings of the examined nodes are used in computations, there is no need to analyze the entire network, which makes the approach applicable to large-scale networks. We present several experiments using small networks as well as large-scale artificial and real-world networks. The results show high effectiveness owing to the locality of the approach, as well as high-quality node ranking comparable to PageRank.
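
The abstract benchmarks its node ranking against PageRank. For reference only, here is a minimal sketch of that global baseline. It uses standard power-iteration PageRank on an unweighted directed graph, and it assumes the graph is given as a dict of adjacency lists. It is not the authors' local method, which the abstract does not specify. Note that every iteration touches the whole graph, which is exactly the cost the locality argument aims to avoid.

```python
from typing import Dict, List


def pagerank(
    graph: Dict[str, List[str]],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 200,
) -> Dict[str, float]:
    """Standard (global) PageRank by power iteration on an unweighted digraph.

    graph maps each node to the list of nodes it links to. Nodes that only
    appear as link targets are added automatically. Rank held by nodes
    without out-links (dangling nodes) is spread uniformly over all nodes.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # Total rank currently sitting on dangling nodes.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        # Teleportation term plus the uniformly redistributed dangling mass.
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Each node splits its damped rank equally among its out-links.
        for v, targets in graph.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for u in targets:
                    new_rank[u] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

For example, `pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})` ranks `c` highest, because it receives links from both other nodes. A weighted variant would split `rank[v]` in proportion to edge weights instead of equally.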

OPACITY-BASED EDGE HIGHLIGHTING FOR TRANSPARENT VISUALIZATION OF 3D SCANNED POINT CLOUDS

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences, Vol V-2-2020, 2020, pp. 373-380. DOI: 10.5194/isprs-annals-v-2-2020-373-2020

Author(s): K. Kawakami, K. Hasegawa, L. Li, H. Nagata, M. Adachi, ...

Keyword(s): Real World, Point Cloud, Large Scale, 3D Structure, Point Clouds, High Quality, 3D Images, 3D Structures, 3D Objects, Scale Point

Abstract. Recent developments in 3D scanning technologies have made it possible to record various real-world 3D objects quickly and accurately. The scanned data take the form of large-scale point clouds, which describe complex 3D structures of the target objects and the surrounding scenes. The complexity becomes significant when a scanned object has internal 3D structures and the acquired point cloud is created by merging the scanning results of both the interior and surface shapes. To observe the whole 3D structure of such complex point-based objects, the point-based transparent visualization that we recently proposed is useful, because it lets us observe the internal 3D structures as well as the surface shapes in high-quality see-through 3D images. However, transparent visualization sometimes shows so much information that the generated images become confusing. To address this problem, this paper proposes combining “edge highlighting” with transparent visualization. The combination makes the see-through images much easier to understand, because the 3D edges of the visualized shapes, detected as high-curvature areas, are highlighted. To make the combination more effective, we also propose a new edge highlighting method applicable to 3D scanned point clouds. We call it “opacity-based edge highlighting”; it uses the effect of transparency to make 3D edge regions look clearer. The proposed method works well for both sharp (high-curvature) and soft (low-curvature) 3D edges. Several experiments using real 3D scanned point clouds demonstrate the method’s effectiveness.
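
The abstract treats 3D edges as high-curvature areas and uses opacity to make them stand out. The sketch below is a hypothetical illustration of that idea, not the authors' feature or transfer function. It assumes "surface variation", a common PCA-based curvature proxy: the smallest eigenvalue of a point's local covariance divided by the sum of all three eigenvalues. That value is near 0 on flat regions and grows toward 1/3 at sharp edges and corners. The `edge_opacity` mapping and its `alpha_min`/`alpha_max` parameters are invented for illustration, and `k_nearest` is a brute-force neighbor search that a real pipeline would replace with a spatial index.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def covariance(points: Sequence[Point3]) -> List[List[float]]:
    """3x3 covariance matrix of a local point neighborhood."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
             for j in range(3)] for i in range(3)]


def sym3_eigenvalues(a: List[List[float]]) -> List[float]:
    """Eigenvalues of a symmetric 3x3 matrix (closed-form trigonometric method), ascending."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # matrix is already diagonal
        return sorted(a[i][i] for i in range(3))
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p = math.sqrt((sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1) / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return sorted([smallest, 3.0 * q - largest - smallest, largest])


def surface_variation(neighborhood: Sequence[Point3]) -> float:
    """lambda_min / (lambda_0 + lambda_1 + lambda_2); ~0 when flat, up to 1/3 at corners."""
    eig = sym3_eigenvalues(covariance(neighborhood))
    total = sum(eig)
    return 0.0 if total <= 0.0 else max(0.0, eig[0]) / total


def edge_opacity(variation: float, alpha_min: float = 0.05,
                 alpha_max: float = 0.9, v_max: float = 1.0 / 3.0) -> float:
    """Hypothetical mapping: more edge-like points are drawn more opaque."""
    t = min(1.0, variation / v_max)
    return alpha_min + (alpha_max - alpha_min) * t


def k_nearest(points: Sequence[Point3], center: Point3, k: int) -> List[Point3]:
    """Brute-force k nearest neighbors (a k-d tree would be used at scale)."""
    return sorted(points, key=lambda p: math.dist(p, center))[:k]
```

A flat grid of points gets a surface variation of 0 and therefore the minimum opacity. A neighborhood straddling three orthogonal faces (a box corner) scores well above 0 and is drawn more opaque.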

Weakly Supervised Spatial Deep Learning for Earth Image Segmentation Based on Imperfect Polyline Labels

ACM Transactions on Intelligent Systems and Technology, Vol 13 (2), 2022, pp. 1-20. DOI: 10.1145/3480970

Author(s): Zhe Jiang, Wenchong He, Marcus Stephen Kirby, Arpan Man Sainju, Shaowen Wang, ...

Keyword(s): Image Segmentation, Deep Learning, Real World, Large Scale, Model Parameters, Vector Representation, High Quality, Geometric Properties, Location Errors, Weakly Supervised

In recent years, deep learning has achieved tremendous success in image segmentation for computer vision applications. The performance of these models relies heavily on the availability of large-scale, high-quality training labels (e.g., PASCAL VOC 2012). Unfortunately, such training data are often unavailable in many real-world spatial or spatiotemporal problems in earth science and remote sensing (e.g., mapping nationwide river streams for water resource management). Although extensive efforts have been made to reduce the reliance on labeled data (e.g., semi-supervised or unsupervised learning, few-shot learning), the complex nature of geographic data, such as spatial heterogeneity, still requires sufficient training labels when transferring a pre-trained model from one region to another. On the other hand, it is often much easier to collect lower-quality training labels that are imperfectly aligned with earth imagery pixels (e.g., through non-expert volunteers interpreting coarse imagery). However, directly training a deep neural network on imperfect labels with geometric annotation errors can significantly degrade model performance. Existing research on imperfect training labels either focuses on errors in label class semantics or characterizes label location errors at the pixel level. These methods do not fully incorporate the geometric properties of label location errors in the vector representation. To fill this gap, this article proposes a weakly supervised learning framework that simultaneously updates deep learning model parameters and infers hidden true vector label locations. Specifically, we model label location errors in the vector representation to partially preserve geometric properties (e.g., spatial contiguity within line segments). Evaluations on real-world datasets from the National Hydrography Dataset (NHD) refinement application show that the proposed framework outperforms baseline methods in classification accuracy.
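
The framework alternates between updating model parameters and inferring the true vector label locations, and the vector representation keeps line segments contiguous. The sketch below is a hypothetical, much-simplified illustration of the inference half only, not the authors' probabilistic model. It assumes each segment of an imperfect polyline label may be off by a rigid shift of at most `radius` pixels. For each segment it picks the shift whose rasterized pixels best agree with the model's current per-pixel class probabilities (`prob`, rows by columns). All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]  # (row, col)


def segment_pixels(p0: Pixel, p1: Pixel) -> List[Pixel]:
    """Rasterize one polyline segment by sampling it at unit steps along its longer axis."""
    (r0, c0), (r1, c1) = p0, p1
    steps = max(abs(r1 - r0), abs(c1 - c0))
    if steps == 0:
        return [p0]
    return [(round(r0 + (r1 - r0) * t / steps), round(c0 + (c1 - c0) * t / steps))
            for t in range(steps + 1)]


def segment_score(prob: Sequence[Sequence[float]], pixels: List[Pixel],
                  dr: int, dc: int) -> float:
    """Sum of predicted class probabilities under the segment after shifting it by (dr, dc)."""
    height, width = len(prob), len(prob[0])
    total = 0.0
    for r, c in pixels:
        rr, cc = r + dr, c + dc
        if 0 <= rr < height and 0 <= cc < width:  # pixels shifted off the image score 0
            total += prob[rr][cc]
    return total


def infer_segment_shifts(prob: Sequence[Sequence[float]], polyline: List[Pixel],
                         radius: int) -> List[Tuple[int, int]]:
    """For each segment, choose the rigid shift within +/-radius that best matches prob.

    All pixels of a segment move together, so each segment stays contiguous.
    A shift replaces the current best only if it scores strictly higher, so the
    original location (zero shift) is kept unless some shift beats it.
    """
    shifts = []
    for p0, p1 in zip(polyline, polyline[1:]):
        pixels = segment_pixels(p0, p1)
        best, best_score = (0, 0), segment_score(prob, pixels, 0, 0)
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                score = segment_score(prob, pixels, dr, dc)
                if score > best_score:
                    best, best_score = (dr, dc), score
        shifts.append(best)
    return shifts
```

Shifting segments independently can break the connection between consecutive segments. A fuller model would also constrain neighboring shifts, which this sketch deliberately omits.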

1588-P: Therapy Trends in Initial 6 Months of the First Large-Scale Longitudinal Nationwide Study on Management and Real-World Outcomes of Diabetes in India (LANDMARC)

Diabetes, Vol 69 (Supplement 1), 2020, pp. 1588-P. DOI: 10.2337/db20-1588-p. Cited by ~1

Author(s): ROMIK GHOSH, ASHOK K. DAS, AMBRISH MITHAL, SHASHANK JOSHI, K.M. PRASANNA KUMAR, ...

Keyword(s): Real World, Large Scale, Nationwide Study

2258-PUB: Impact of Cardiovascular (CV) Risk Factors during the Initial 6 Months of the First Large-Scale Longitudinal Nationwide Study on Management and Real-World Outcomes of Diabetes in India (LANDMARC)

Diabetes, Vol 69 (Supplement 1), 2020, pp. 2258-PUB. DOI: 10.2337/db20-2258-pub

Author(s): ROMIK GHOSH, ASHOK K. DAS, SHASHANK JOSHI, AMBRISH MITHAL, K.M. PRASANNA KUMAR, ...

Keyword(s): Risk Factors, Real World, Large Scale, Nationwide Study

The graph neural networking challenge

ACM SIGCOMM Computer Communication Review, Vol 51 (3), 2021, pp. 9-16. DOI: 10.1145/3477482.3477485

Author(s): José Suárez-Varela, Miquel Ferriol-Galmés, Albert López, Paul Almasan, Guillermo Bernárdez, ...

Keyword(s): Machine Learning, Computer Networks, Real World, Large Scale, Lessons Learned, Educational Resources, Global Competition, International Telecommunication Union, International Telecommunication, Broad Audience

During the last decade, Machine Learning (ML) has become an increasingly prominent topic in the field of Computer Networks and is expected to be gradually adopted for a plethora of control, monitoring and management tasks in real-world deployments. This creates a need for new generations of students, researchers and practitioners with a solid background in ML applied to networks. In 2020, the International Telecommunication Union (ITU) organized the "ITU AI/ML in 5G challenge", an open global competition that introduced a broad audience to some of the current main challenges in ML for networks. This large-scale initiative gathered 23 different challenges proposed by network operators, equipment manufacturers and academia, and attracted more than 1,300 participants from over 60 countries. This paper narrates our experience organizing one of the proposed challenges: the "Graph Neural Networking Challenge 2020". We describe the problem presented to participants, the tools and resources provided, some organizational aspects and participation statistics, an outline of the top-3 awarded solutions, and a summary of lessons learned along the way. As a result, the challenge leaves a curated set of educational resources openly available to anyone interested in the topic.

Tiered Sampling

ACM Transactions on Knowledge Discovery from Data, Vol 15 (5), 2021, pp. 1-52. DOI: 10.1145/3441299

Author(s): Lorenzo De Stefani, Erisa Terolli, Eli Upfal

Keyword(s): Large Scale, Analysis Of Algorithms, Base Layer, Single Edge, Real World Data, High Quality, Large Graphs, Massive Graphs, Variance Estimate, Low Probability

We introduce Tiered Sampling, a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass over the data and uses a memory of fixed size M, which can be orders of magnitude smaller than the number of edges. Our methods address the challenging task of counting sparse motifs (sub-graph patterns) that have a low probability of appearing in a sample of M edges, which is the maximum amount of data available to the algorithms at each step. To obtain an unbiased, low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, the other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif being counted, thus decreasing the variance and error of the estimate. While we focus on the design and analysis of algorithms for counting 4-cliques, we present a method that generalizes Tiered Sampling to obtain high-quality estimates of the number of occurrences of any sub-graph of interest, while reducing the analysis effort required by the specific properties of the pattern of interest. We present a complete theoretical analysis and an extensive experimental evaluation of the proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations of the number of 4- and 5-cliques in large graphs using a very limited amount of memory, significantly outperforming the single-edge sampling approach for counting sparse motifs in large-scale graphs.
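
The abstract states that the base layer of Tiered Sampling is a standard reservoir sample of edges held in a fixed memory of size M. The sketch below covers that base layer only: classic reservoir sampling (Algorithm R) over an edge stream. It also includes the standard probability that a given set of k edges is all present in a uniform reservoir of size M after t edges. Reservoir-based motif counters divide by this kind of factor to reweight occurrences found in the sample. The higher tiers that store motif sub-structures, and the 4-clique estimator itself, are not shown, and the function names are hypothetical.

```python
import random
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


def reservoir_sample_edges(stream: Iterable[Edge], m: int,
                           rng: random.Random) -> List[Edge]:
    """Algorithm R: after t edges, every edge seen so far is kept with probability m/t."""
    reservoir: List[Edge] = []
    for t, edge in enumerate(stream, start=1):
        if t <= m:
            reservoir.append(edge)  # fill the memory first
        else:
            j = rng.randrange(t)  # uniform integer in [0, t)
            if j < m:
                reservoir[j] = edge  # evict a uniformly chosen stored edge
    return reservoir


def joint_inclusion_probability(m: int, t: int, k: int) -> float:
    """P(a fixed set of k distinct edges is entirely in the sample) after t edges.

    Equals C(t-k, m-k) / C(t, m) = prod_{i<k} (m-i)/(t-i) once t > m.
    """
    if t <= m:
        return 1.0  # nothing has been evicted yet
    if k > m:
        return 0.0  # k edges cannot fit in m slots
    p = 1.0
    for i in range(k):
        p *= (m - i) / (t - i)
    return p
```

For instance, with room for M = 2 edges after a stream of 4 edges, a particular pair is retained with probability (2/4)(1/3) = 1/6. An occurrence of a 2-edge pattern observed in the sample would therefore be counted with weight 6 to keep the estimate unbiased.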

Large-Scale and Deep-Seated Gravitational Slope Deformations on Mars: A Review

Geosciences, Vol 11 (4), 2021, pp. 174. DOI: 10.3390/geosciences11040174

Author(s): Marco Emanuele Discenza, Carlo Esposito, Goro Komatsu, Enrico Miccadei

Keyword(s): Large Scale, High Quality, Surface Data, Geomorphological Processes, Gravitational Processes, Quality Surface, High Quality Surface, Mars Missions

The availability of high-quality surface data acquired by recent Mars missions, together with increasingly accurate analysis methods, has made it possible to identify, describe, and analyze many geological and geomorphological processes previously unknown or unstudied on Mars. Among these, slow, large-scale slope deformation phenomena, generally known as Deep-Seated Gravitational Slope Deformations (DSGSDs), are of particular interest. Since the early 2000s, several studies have been conducted to identify and analyze Martian large-scale gravitational processes. As on Earth, these phenomena apparently occur in diverse morpho-structural conditions on Mars. Nevertheless, the difficulty of directly studying the planet's geological, structural, and geomorphological characteristics makes the analysis of these phenomena particularly complex, leaving numerous questions unanswered. This paper synthesizes all known studies of large-scale deformational processes on Mars to date, in order to provide a complete and exhaustive picture of the phenomena. Following this synthesis of the literature, the specific characteristics of the phenomena are analyzed and the main remaining open issues are described.
