Decoding the Cauzin Softstrip: a case study in extracting information from old media

AbstractHaving content in an archive is of limited value if it cannot be read and used. As a case study of extricating information from obsolete media, making it readable once again through deep learning techniques, we examine the Cauzin Softstrip: one of the first two-dimensional bar codes, released in 1985 by Cauzin Systems, which could be used for encoding all manner of digital data. Softstrips occupy a curious middle ground, as they were both physical and digital. The bar codes were printed on paper, and in that sense are no different in an archival way than any printed material. Softstrips can be found in old computer magazines, computer books, and booklets of software Cauzin produced. However, managing the digital nature of these physical artifacts falls within the scope of digital curation. To make the information on them readable and useful, the digital information needs to be extracted, which originally would have occurred using a physical Cauzin Softstrip reader. Obtaining a working Softstrip reader is already extremely difficult and will most likely be impossible in the coming years. In order to extract the encoded data, we created a digital Softstrip reader, making Softstrip data accessible without needing a physical reader. Our decoding strategy is able to decode over 91% of the 1229 Softstrips in our Softstrip corpus; this rises to 99% if we only consider Softstrip images produced under controlled conditions. Furthermore, we later acquired another set of 117 Softstrips and we were able to decode nearly 95% of them with no adjustments to the decoder. These excellent results underscore the fact that technology like deep learning is readily accessible to non-experts; we obtained these results using a convolutional neural network, even though neither of the authors are expert in the area.

Download Full-text

Digital curation: the development of a discipline within information science

Journal of Documentation ◽

10.1108/jd-02-2018-0024 ◽

2018 ◽

Vol 74 (6) ◽

pp. 1318-1338 ◽

Cited By ~ 2

Author(s):

Sarah Higgins

Keyword(s):

Information Science ◽

Theoretical Models ◽

Digital Information ◽

Content Type ◽

Digital Curation ◽

The Usa ◽

The Uk ◽

Financial Ecology

Purpose Digital curation addresses the technical, administrative and financial ecology required to ensure that digital information remains accessible and usable over the long term. The purpose of this paper is to trace digital curation’s disciplinary emergence and examine its position within the information sciences domain in terms of theoretical principles, using a case study of developments in the UK and the USA. Design/methodology/approach Theoretical principles regarding disciplinary development and the identity of information science as a discipline are applied to a case study of the development of digital curation in the UK and the USA to identify the maturity of digital curation and its position in the information science gamut. Findings Digital curation is identified as a mature discipline which is a sub-meta-discipline of information science. As such digital curation has reach across all disciplines and sub-disciplines of information science and has the potential to become the overarching paradigm. Practical implications These findings could influence digital curation’s development from applied discipline to profession within both its educational and professional domains. Originality/value The disciplinary development of digital curation within dominant theoretical models has not hitherto been articulated.

Download Full-text

Privacy Preserving Machine Learning and Deep Learning Techniques

Handbook of Research on Applications and Implementations of Machine Learning Techniques - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-9902-9.ch012 ◽

2020 ◽

pp. 222-235

Author(s):

Divya Asok ◽

Chitra P. ◽

Bharathiraja Muthurajan

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Privacy ◽

Differential Privacy ◽

Digital Data ◽

Sensitive Data ◽

Security Issues ◽

Private Data ◽

Learning Techniques ◽

Machine Learning Applications

In the past years, the usage of internet and quantity of digital data generated by large organizations, firms, and governments have paved the way for the researchers to focus on security issues of private data. This collected data is usually related to a definite necessity. For example, in the medical field, health record systems are used for the exchange of medical data. In addition to services based on users' current location, many potential services rely on users' location history or their spatial-temporal provenance. However, most of the collected data contain data identifying individual which is sensitive. With the increase of machine learning applications around every corner of the society, it could significantly contribute to the preservation of privacy of both individuals and institutions. This chapter gives a wider perspective on the current literature on privacy ML and deep learning techniques, along with the non-cryptographic differential privacy approach for ensuring sensitive data privacy.

Download Full-text

Automatic Post-Disaster Damage Mapping Using Deep-Learning Techniques for Change Detection: Case Study of the Tohoku Tsunami

Remote Sensing ◽

10.3390/rs11091123 ◽

2019 ◽

Vol 11 (9) ◽

pp. 1123 ◽

Cited By ~ 14

Author(s):

Jérémie Sublime ◽

Ekaterina Kalinicheva

Keyword(s):

Deep Learning ◽

Change Detection ◽

Machine Learning Methods ◽

Learning Techniques ◽

Before And After ◽

Damage Mapping ◽

Tohoku Tsunami ◽

Post Disaster ◽

Speed Of Analysis

Post-disaster damage mapping is an essential task following tragic events such as hurricanes, earthquakes, and tsunamis. It is also a time-consuming and risky task that still often requires the sending of experts on the ground to meticulously map and assess the damages. Presently, the increasing number of remote-sensing satellites taking pictures of Earth on a regular basis with programs such as Sentinel, ASTER, or Landsat makes it easy to acquire almost in real time images from areas struck by a disaster before and after it hits. While the manual study of such images is also a tedious task, progress in artificial intelligence and in particular deep-learning techniques makes it possible to analyze such images to quickly detect areas that have been flooded or destroyed. From there, it is possible to evaluate both the extent and the severity of the damages. In this paper, we present a state-of-the-art deep-learning approach for change detection applied to satellite images taken before and after the Tohoku tsunami of 2011. We compare our approach with other machine-learning methods and show that our approach is superior to existing techniques due to its unsupervised nature, good performance, and relative speed of analysis.

Download Full-text

DTM-based landslide detection using deep learning: A case study in Hong Kong

10.5194/egusphere-egu2020-4090 ◽

2020 ◽

Author(s):

Haojie Wang ◽

Limin Zhang

Keyword(s):

Hong Kong ◽

Deep Learning ◽

Dimensional Space ◽

Three Dimensional ◽

Digital Terrain Model ◽

Landslide Inventory ◽

Landslide Risk ◽

Visual Interpretation ◽

Learning Techniques

<p>Landslide detection is an essential component of landslide risk assessment and hazard mitigation. It can be used to produce landslide inventories which are considered as one of the fundamental auxiliary data for regional landslide susceptibility analysis. In order to achieve high landslide interpretation accuracy, visual interpretation is frequently used, but suffers in time efficiency and labour demand. Hence, an automatic landslide detection method utilizing deep learning techniques is implemented in this work to conduct high-accuracy and fast landslide interpretation. As the ground characteristics and terrain features can precisely capture the three-dimensional space form of landslides, high-resolution digital terrain model (DTM) is taken as the data source for landslide detection. A case study in Hong Kong, China is conducted to validate the applicability of deep learning techniques in landslide detection. The case study takes multiple data layers derived from the DTM (e.g., elevation, slope gradient, aspect, etc.) and a local landslide inventory named enhanced natural terrain landslide inventory (ENTLI) as its data sources, and integrates them into a database for learning. Then, a deep learning technique (e.g., convolutional neural network) is used to train models on the database and perform landslide detection. Results of the case study show great performance and capacity of the applied deep learning techniques, which provides valuable references for advancing landslide detection.</p>

Download Full-text

Design of Specific Primer Sets for the Detection of B.1.1.7, B.1.351 and P.1 SARS-CoV-2 Variants using Deep Learning

10.1101/2021.01.20.427043 ◽

2021 ◽

Author(s):

Alejandro Lopez-Rincon ◽

Carmina A. Perez-Romero ◽

Alberto Tonda ◽

Lucero Mendoza-Maldonado ◽

Eric Claassen ◽

...

Keyword(s):

Deep Learning ◽

South African ◽

Synonymous Mutations ◽

Molecular Tests ◽

Learning Techniques ◽

New Variant ◽

Primer Sets ◽

Global Initiative ◽

The Uk

ABSTRACTAs the COVID-19 pandemic persists, new SARS-CoV-2 variants with potentially dangerous features have been identified by the scientific community. Variant B.1.1.7 lineage clade GR from Global Initiative on Sharing All Influenza Data (GISAID) was first detected in the UK, and it appears to possess an increased transmissibility. At the same time, South African authorities reported variant B.1.351, that shares several mutations with B.1.1.7, and might also present high transmissibility. Even more recently, a variant labeled P.1 with 17 non-synonymous mutations was detected in Brazil. In such a situation, it is paramount to rapidly develop specific molecular tests to uniquely identify, contain, and study new variants. Using a completely automated pipeline built around deep learning techniques, we design primer sets specific to variant B.1.1.7, B.1.351, and P.1, respectively. Starting from sequences openly available in the GISAID repository, our pipeline was able to deliver the primer sets in just under 16 hours for each case study. In-silico tests show that the sequences in the primer sets present high accuracy and do not appear in samples from different viruses, nor in other coronaviruses or SARS-CoV-2 variants. The presented methodology can be exploited to swiftly obtain primer sets for each independent new variant, that can later be a part of a multiplexed approach for the initial diagnosis of COVID-19 patients. Furthermore, since our approach delivers primers able to differentiate between variants, it can be used as a second step of a diagnosis in cases already positive to COVID-19, to identify individuals carrying variants with potentially threatening features.

Download Full-text

Artificial Intelligence-Enabled and Period-Aware Forecasting COVID-19 Spread

Ingénierie des systèmes d information ◽

10.18280/isi.260105 ◽

2021 ◽

Vol 26 (1) ◽

pp. 47-57

Author(s):

Paul Menounga Mbilong ◽

Asmae Berhich ◽

Imane Jebli ◽

Asmae El Kassiri ◽

Fatima-Zahra Belouadha

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Health Management ◽

Negative Impact ◽

Decision Makers ◽

Machine Learning Techniques ◽

Time Lags ◽

Context Sensitive ◽

Learning Techniques

Coronavirus 2019 (COVID-19) has reached the stage of an international epidemic with a major socioeconomic negative impact. Considering the weakness of the healthy structure and the limited availability of test kits, particularly in emerging countries, predicting the spread of COVID-19 is expected to help decision-makers to improve health management and contribute to alleviating the related risks. In this article, we studied the effectiveness of machine learning techniques using Morocco as a case-study. We studied the performance of six multi-step models derived from both Machine Learning and Deep Learning regards multiple scenarios by combining different time lags and three COVID-19 datasets(periods): confinement, deconfinement, and hybrid datasets. The results prove the efficiency of Deep Learning models and identify the best combinations of these models and the time lags enabling good predictions of new cases. The results also show that the prediction of the spread of COVID-19 is a context sensitive problem.

Download Full-text

Data augmentation using a variational autoencoder for estimating property prices

Property Management ◽

10.1108/pm-09-2020-0057 ◽

2021 ◽

Vol 39 (3) ◽

pp. 408-418 ◽

Cited By ~ 1

Author(s):

Changro Lee

Keyword(s):

Deep Learning ◽

South Korea ◽

Data Augmentation ◽

Model Performance ◽

Content Type ◽

Original Dataset ◽

Learning Techniques ◽

Variational Autoencoder ◽

Price Estimation

PurposePrior studies on the application of deep-learning techniques have focused on enhancing computation algorithms. However, the amount of data is also a key element when attempting to achieve a goal using a quantitative approach, which is often underestimated in practice. The problem of sparse sales data is well known in the valuation of commercial properties. This study aims to expand the limited data available to exploit the capability inherent in deep learning techniques.Design/methodology/approachThe deep learning approach is used. Seoul, the capital of South Korea is selected as a case study area. Second, data augmentation is performed for properties with low trade volume in the market using a variational autoencoder (VAE), which is a generative deep learning technique. Third, the generated samples are added into the original dataset of commercial properties to alleviate data insufficiency. Finally, the accuracy of the price estimation is analyzed for the original and augmented datasets to assess the model performance.FindingsThe results using the sales datasets of commercial properties in Seoul, South Korea as a case study show that the augmented dataset by a VAE consistently shows higher accuracy of price estimation for all 30 trials, and the capabilities inherent in deep learning techniques can be fully exploited, promoting the rapid adoption of artificial intelligence skills in the real estate industry.Originality/valueAlthough deep learning-based algorithms are gaining popularity, they are likely to show limited performance when data are insufficient. This study suggests an alternative approach to overcome the lack of data problem in property valuation.

Download Full-text

Using Deep Learning to Predict Sentiments: Case Study in Tourism

Complexity ◽

10.1155/2018/7408431 ◽

2018 ◽

Vol 2018 ◽

pp. 1-9 ◽

Cited By ~ 9

Author(s):

C. A. Martín ◽

J. M. Torres ◽

R. M. Aguilar ◽

S. Diaz

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Short Term Memory ◽

Tourism Industry ◽

Digital Platforms ◽

Mass Tourism ◽

Link Type ◽

Learning Techniques ◽

The Relationship

Technology and the Internet have changed how travel is booked, the relationship between travelers and the tourism industry, and how tourists share their travel experiences. As a result of this multiplicity of options, mass tourism markets have been dispersing. But the global demand has not fallen; quite the contrary, it has increased. Another important factor, the digital transformation, is taking hold to reach new client profiles, especially the so-called third generation of tourism consumers, digital natives who only understand the world through their online presence and who make the most of every one of its advantages. In this context, the digital platforms where users publish their impressions of tourism experiences are starting to carry more weight than the corporate content created by companies and brands. In this paper, we propose using different deep-learning techniques and architectures to solve the problem of classifying the comments that tourists publish online and that new tourists use to decide how best to plan their trip. Specifically, in this paper, we propose a classifier to determine the sentiments reflected on the http://booking.com and http://tripadvisor.com platforms for the service received in hotels. We develop and compare various classifiers based on convolutional neural networks (CNN) and long short-term memory networks (LSTM). These classifiers were trained and validated with data from hotels located on the island of Tenerife. An analysis of our findings shows that the most accurate and robust estimators are those based on LSTM recurrent neural networks.

Download Full-text