input data
Recently Published Documents





2022 ◽  
Vol 3 (1) ◽  
pp. 1-26
Omid Hajihassani ◽  
Omid Ardakanian ◽  
Hamzeh Khazaei

The abundance of data collected by sensors in Internet of Things devices and the success of deep neural networks in uncovering hidden patterns in time series data have led to mounting privacy concerns. This is because private and sensitive information can be potentially learned from sensor data by applications that have access to this data. In this article, we aim to examine the tradeoff between utility and privacy loss by learning low-dimensional representations that are useful for data obfuscation. We propose deterministic and probabilistic transformations in the latent space of a variational autoencoder to synthesize time series data such that intrusive inferences are prevented while desired inferences can still be made with sufficient accuracy. In the deterministic case, we use a linear transformation to move the representation of input data in the latent space such that the reconstructed data is likely to have the same public attribute but a different private attribute than the original input data. In the probabilistic case, we apply the linear transformation to the latent representation of input data with some probability. We compare our technique with autoencoder-based anonymization techniques and additionally show that it can anonymize data in real time on resource-constrained edge devices.
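The two latent-space transformations described above can be illustrated with a minimal sketch. It assumes the variational autoencoder has already encoded the input into a latent vector `z`, and that a shift vector `shift` is available (for instance, the difference between the latent centroids of two private classes). The function name, the additive form of the transformation, and the choice of shift are illustrative assumptions, not the authors' implementation.

```python
import random


def obfuscate_latent(z, shift, p=1.0, rng=None):
    """Move latent vector z by `shift` with probability p.

    p=1.0 corresponds to the deterministic case; 0 < p < 1 to the
    probabilistic case. `shift` is a hypothetical direction in latent
    space that changes the private attribute while keeping the public one.
    """
    rng = rng or random.Random()
    if rng.random() < p:  # random() is in [0, 1), so p=1.0 always applies
        return [zi + si for zi, si in zip(z, shift)]
    return list(z)  # leave the representation unchanged
```

The transformed vector would then be passed to the decoder to synthesize the obfuscated time series.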

Entropy ◽  
2022 ◽  
Vol 24 (1) ◽  
pp. 132
Eyad Alsaghir ◽  
Xiyu Shi ◽  
Varuna De Silva ◽  
Ahmet Kondoz

Deep learning, in general, was built on input data transformation and presentation, model training with parameter tuning, and recognition of new observations using the trained model. However, this came with a high computation cost due to the extensive input database and the length of time required in training. Despite the model learning its parameters from the transformed input data, no direct research has been conducted to investigate the mathematical relationship between the transformed information (i.e., features, excitation) and the model’s learnt parameters (i.e., weights). This research aims to explore a mathematical relationship between the input excitations and the weights of a trained convolutional neural network. The objective is to investigate three aspects of this assumed feature-weight relationship: (1) the mathematical relationship between the training input images’ features and the model’s learnt parameters, (2) the mathematical relationship between the images’ features of a separate test dataset and a trained model’s learnt parameters, and (3) the mathematical relationship between the difference of training and testing images’ features and the model’s learnt parameters with a separate test dataset. The paper empirically demonstrated the existence of this mathematical relationship between the test image features and the model’s learnt weights by the ANOVA analysis.

2022 ◽  
pp. 1-39
Zhicheng Geng ◽  
Zhanxuan Hu ◽  
Xinming Wu ◽  
Luming Liang ◽  
Sergey Fomel

Detecting subsurface salt structures from seismic images is important for seismic structural analysis and subsurface modeling. Recently, deep learning has been successfully applied in solving salt segmentation problems. However, most of the studies focus on supervised salt segmentation and require numerous accurately labeled data, which is usually laborious and time-consuming to collect, especially for the geophysics community. In this paper, we propose a semi-supervised framework for salt segmentation, which requires only a small amount of labeled data. In our method, adopting the mean teacher method, we train two models sharing the same network architecture. The student model is optimized using a combination of supervised loss and unsupervised consistency loss, whereas the teacher model is the exponential moving average (EMA) of the student model. We introduce the unsupervised consistency loss to better extract information from unlabeled data by constraining the network to give consistent predictions for the input data and its perturbed version. We train and test our novel semi-supervised method on both synthetic and real datasets. Results demonstrate that our proposed semi-supervised salt segmentation method outperforms the supervised baseline when there is a lack of labeled training data.
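As a rough illustration of the mean teacher scheme described above, the sketch below shows an EMA weight update and a consistency term. Model weights and predictions are represented as flat lists of floats; the decay value and the use of mean squared error for the consistency loss are assumptions, since the abstract does not specify them.

```python
def ema_update(teacher, student, decay=0.99):
    """Teacher weights become an exponential moving average of student weights."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_loss(pred_clean, pred_perturbed):
    """Mean squared difference between predictions on an input and on its
    perturbed version (MSE is an assumed choice of distance)."""
    n = len(pred_clean)
    return sum((a - b) ** 2 for a, b in zip(pred_clean, pred_perturbed)) / n
```

In training, the student would minimize the supervised loss on labeled data plus a weighted `consistency_loss` on unlabeled data, with `ema_update` applied after each optimizer step.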

2022 ◽  
pp. 1-90
David Lubo-Robles ◽  
Deepak Devegowda ◽  
Vikram Jayaram ◽  
Heather Bedle ◽  
Kurt J. Marfurt ◽  

During the past two decades, geoscientists have used machine learning to produce a more quantitative reservoir characterization and to discover hidden patterns in their data. However, as the complexity of these models increases, assessing the sensitivity of their results to the choice of input data becomes more challenging. Measuring how the model uses the input data to perform either a classification or regression task provides an understanding of the data-to-geology relationships, which indicates how confident we can be in the prediction. To provide such insight, the ML community has developed the Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) tools. In this study, we train a random forest architecture using a suite of seismic attributes as input to differentiate between mass transport deposits (MTDs), salt, and conformal siliciclastic sediments in a Gulf of Mexico dataset. We apply SHAP to understand how the model uses the input seismic attributes to identify target seismic facies and examine how variations in the input, such as adding band-limited random noise or applying a Kuwahara filter, impact the model’s predictions. During our global analysis, we find that the attribute importance is dynamic and changes based on the quality of the seismic attributes and the seismic facies analyzed. For our data volume and target facies, attributes measuring changes in dip and energy show the largest importance for all cases in our sensitivity analysis. We note that to discriminate between the seismic facies, the ML architecture learns a “set of rules” in multi-attribute space and that overlap between MTDs, salt, and conformal sediments might exist based on the seismic attribute analyzed. Finally, using SHAP at a voxel scale, we understand why certain areas of interest were misclassified by the algorithm and perform an in-context interpretation to analyze how changes in the geology impact the model’s predictions.

Processes ◽  
2022 ◽  
Vol 10 (1) ◽  
pp. 158
Ain Cheon ◽  
Jwakyung Sung ◽  
Hangbae Jun ◽  
Heewon Jang ◽  
Minji Kim ◽  

The application of a machine learning (ML) model to bio-electrochemical anaerobic digestion (BEAD) is a future-oriented approach for improving process stability by predicting performances that have nonlinear relationships with various operational parameters. Five ML models, which included tree-, regression-, and neural network-based algorithms, were applied to predict the methane yield in a BEAD reactor. The results showed that various 1-step-ahead ML models, which utilized prior data of BEAD performances, could enhance prediction accuracy. In addition, the 1-step-ahead algorithm with retraining could improve prediction accuracy by 37.3% compared with the conventional multi-step-ahead algorithm. The improvement was particularly noteworthy in tree- and regression-based ML models. Moreover, the 1-step-ahead algorithm with retraining showed high potential for efficient prediction using pH as the single input variable, which is plausibly easier to monitor than the other parameters required in bioprocess models.

Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 645
S. Hamed Javadi ◽  
Angela Guerrero ◽  
Abdul M. Mouazen

In precision agriculture (PA) practices, the accurate delineation of management zones (MZs), with each zone having similar characteristics, is essential for map-based variable rate application of farming inputs. However, there is no consensus on an optimal clustering algorithm and the input data format. In this paper, we evaluated the performances of five clustering algorithms including k-means, fuzzy C-means (FCM), hierarchical, mean shift, and density-based spatial clustering of applications with noise (DBSCAN) in different scenarios and assessed the impacts of input data format and feature selection on MZ delineation quality. We used key soil fertility attributes (moisture content (MC), organic carbon (OC), calcium (Ca), cation exchange capacity (CEC), exchangeable potassium (K), magnesium (Mg), sodium (Na), exchangeable phosphorous (P), and pH) collected with an online visible and near-infrared (vis-NIR) spectrometer along with Sentinel2 and yield data of five commercial fields in Belgium. We demonstrated that k-means is the optimal clustering method for MZ delineation, and the input data should be normalized (range normalization). Feature selection was also shown to be positively effective. Furthermore, we proposed an algorithm based on DBSCAN for smoothing the MZs maps to allow smooth actuating during variable rate application by agricultural machinery. Finally, the whole process of MZ delineation was integrated in a clustering and smoothing pipeline (CaSP), which automatically performs the following steps sequentially: (1) range normalization, (2) feature selection based on cross-correlation analysis, (3) k-means clustering, and (4) smoothing. It is recommended to adopt the developed platform for automatic MZ delineation for variable rate applications of farming inputs.
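Two of the four CaSP steps listed above, range normalization and k-means clustering, can be sketched as follows. This is a simplified illustration under stated assumptions: the naive initialization (first `k` points), the fixed iteration count, and the function names are not taken from the paper, and the feature selection and DBSCAN-based smoothing steps are omitted.

```python
def range_normalize(rows):
    """Scale each column to [0, 1] via (x - min) / (max - min)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = [list(p) for p in points[:k]]  # naive initialization (assumption)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels
```

Normalizing first keeps attributes with large numeric ranges from dominating the distance computation, which is consistent with the abstract's recommendation to normalize the input data.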

Е.П. Трофимов

The paper proposes an algorithm for sequential data processing based on the block pseudoinverse of matrices of full column rank. It is shown that the block pseudoinverse formula underlying the algorithm generalizes one step of Greville's pseudoinverse algorithm in the nonsingular case and can therefore be used to generalize the Greville-based method for finding the weights of a neural network function in LSHDI (linear solutions to higher dimensional interlayer networks). At each stage, the algorithm uses the pseudoinverses of the matrix blocks found at previous stages, and therefore reduces computation not only by working with smaller matrices but also by reusing information that has already been computed. Examples are given of applying the algorithm to reconstruct one-dimensional signals and two-dimensional signals (images) distorted by a filter (noise). The examples consider a static filter, but in practice the filter matrix may change over time. The algorithm can rebuild the pseudoinverse matrix while the input signal is being received, taking into account changes in one or more blocks of the filter matrix, and can therefore also be used when the filter (noise) parameters are time-dependent. In addition, computational experiments show that the block pseudoinverse formula on which the algorithm is based works well for ill-conditioned matrices, which are often encountered in practice.
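For orientation, the classical single-column Greville step that the block formula is said to generalize can be sketched as below. This is the textbook update for appending one column to a full-column-rank matrix, not the paper's block formula; the function name, the list-of-rows matrix representation, and the tolerance are illustrative assumptions.

```python
def greville_append(A_pinv, A, a, tol=1e-12):
    """Given A (m x n, list of rows) and its pseudoinverse A_pinv (n x m),
    return the pseudoinverse of [A | a], where a is a new column of length m.

    Assumes the extended matrix keeps full column rank.
    """
    m, n = len(a), len(A_pinv)
    # d = A^+ a : coordinates of a in the span of the existing columns
    d = [sum(A_pinv[j][i] * a[i] for i in range(m)) for j in range(n)]
    # c = a - A d : component of a orthogonal to the existing columns
    c = [a[i] - sum(A[i][j] * d[j] for j in range(n)) for i in range(m)]
    cc = sum(x * x for x in c)
    if cc < tol:
        raise ValueError("new column is linearly dependent on existing ones")
    b = [x / cc for x in c]  # pseudoinverse of c, as a row vector
    top = [[A_pinv[j][i] - d[j] * b[i] for i in range(m)] for j in range(n)]
    return top + [b]
```

The previously computed `A_pinv` is reused rather than recomputed, which is the same source of savings the abstract describes for the block version.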

2022 ◽  
Vol 12 (2) ◽  
pp. 856
Branislav Dimitrijevic ◽  
Sina Darban Khales ◽  
Roksana Asadi ◽  
Joyoung Lee

Highway crashes, along with the property damage, personal injuries, and fatalities that they cause, continue to present one of the most significant and critical transportation problems. At the same time, provision of safe travel is one of the main goals of any transportation system. For this reason, both in transportation research and practice much attention has been given to the analysis and modeling of traffic crashes, including the development of models that can be applied to predict crash occurrence and crash severity. In general, such models assess short-term crash risks at a given highway facility, thus providing intelligence that can be used to identify and implement traffic operations strategies for crash mitigation and prevention. This paper presents several crash risk and injury severity assessment models applied at a highway segment level, considering the input data that is typically collected or readily available to most transportation agencies in real time and at a regional network scale, which would render them readily applicable in practice. The input data included roadway geometry characteristics, traffic flow characteristics, and weather condition data. The paper develops, tests, and compares the performance of models that employ Random Effects Bayesian Logistic Regression, Gaussian Naïve Bayes, K-Nearest Neighbor, Random Forest, and Gradient Boosting Machine methods. The paper applies the random oversampling examples (ROSE) method to deal with the problem of data imbalance associated with the injury severity analysis. The models were trained and tested using a dataset of 10,155 crashes that occurred on two interstate highways in New Jersey over a two-year period. The paper also analyzes the potential improvement in the prediction abilities of the tested models by adding reactive data to the analysis. To that end, traffic crashes were classified in multiple classes based on the driver age and the vehicle age to assess the impact of these attributes on driver injury severity outcomes. The results of this analysis are promising, showing that the simultaneous use of reactive and proactive data can improve the prediction performance of the presented models.

2022 ◽  
Vol 14 (2) ◽  
pp. 893
Galina Anatolievna Khmeleva ◽  
Marina Viktorovna Kurnikova ◽  
Erzsébet Nedelka ◽  
Balázs István Tóth

The importance of this research stems from the need to ensure the sustainability of cross-border cooperation through a better understanding of its determinants and causal relationships. While having common features and patterns, cross-border cooperation is always expressed through the relations of specific countries and peoples. Therefore, based upon the PLS-SEM methodology, the authors consider the fundamental factors influencing the external cooperation of Hungary’s transboundary regions. The advantage of the PLS-SEM method is that it enables researchers to simultaneously identify and approximate hidden connections between input data and to construct a regression model describing the relationship between input data. Despite widespread application in economic studies, the authors have not found the use of PLS-SEM for studying cross-border cooperation issues in the current scientific literature. The authors have built a model to assess the hidden factors of cross-border cooperation and to identify the indirect influence of certain factors. The novelty of the research is to identify the determinants of sustainable cross-border cooperation and the relationship between them in a multi-level system of cross-border interaction between businesses, people, and the State. In the Hungarian context, transport infrastructure and business travel are shown to have a direct positive impact on cross-border cooperation. For the first time, tourism and socio-economic conditions have been shown to have powerful but indirect impacts. This work could be the beginning of gathering new evidence on the determinants and causation of cross-border cooperation in the context of other countries. An important finding of the study is the growing importance of indicators of the new, post-industrial economy. As for recommendations, the authors focus on state, regional, and municipal support measures, awareness of the possibilities of cross-border cooperation, the need to develop e-commerce, and alternative energy as a modern basis for converting Hungary’s cross-border position into a competitive advantage.

2022 ◽  
Vol 1 (13) ◽  
pp. 71-79
Hoàng Thái Hổ ◽  
Nguyễn Thế Hùng ◽  
Nguyễn Tuấn Minh

Abstract: This paper presents a solution that uses the computing capacity of a distributed computer network for the cryptanalysis of block ciphers. The system is built from three software components. The management software handles input data entry, analyzes and partitions the key space, and analyzes the results. CPU and GPU cryptanalysis software is installed on the corresponding computers in the distributed network and performs cryptanalysis on the data supplied by the management software. The results are sent back to the management software for analysis and decryption. Cryptanalysis runs concurrently on all computers in the network during their idle time, without affecting users' daily activities. The system includes computers with GPU cards, which increase cryptanalysis performance 11-fold. The solution has been applied to recovering Windows passwords from LAN Manager hashes.
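The key space partitioning performed by the management software might look like the sketch below, which splits a key space of a given size into contiguous, near-equal ranges, one per worker. The function name and the equal-share policy are assumptions; the abstract does not say how the ranges are sized, and a real deployment would likely give GPU workers larger shares.

```python
def split_keyspace(total_keys, workers):
    """Return a list of half-open (start, end) ranges covering 0..total_keys,
    one per worker, with sizes differing by at most one key."""
    base, extra = divmod(total_keys, workers)
    ranges, start = [], 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        ranges.append((start, start + size))
        start += size
    return ranges
```

Each worker would search only its own range and report any hit back to the management software.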
