
Towards Definition of Higher Order Causality in Complex Systems arxiv.org/abs/2409.08295 #physics.data-an #stat.ML #cs.IT #math.IT #cs.LG


The description of the dynamics of complex systems, in particular the capture of the interaction structure and causal relationships between elements of the system, is one of the central questions of interdisciplinary research. While the characterization of pairwise causal interactions is a relatively mature field with established theoretical concepts, and the current focus is on technical issues of their efficient estimation, standard concepts such as Granger causality or transfer entropy may not faithfully reflect synergies or interactions of higher orders, phenomena that are highly relevant for many real-world complex systems. In this paper, we propose a generalization and refinement of the information-theoretic approach to causal inference, enabling the description of truly multivariate, rather than multiple pairwise, causal interactions, thus moving from causal networks to causal hypernetworks. In particular, while keeping the ability to control for mediating variables or common causes, in the case of purely synergistic interactions such as the exclusive disjunction, it ascribes the causal role to the multivariate causal set but not to the individual inputs, thus distinguishing it from the case of, e.g., two additive univariate causes. We demonstrate this concept by applying it to illustrative theoretical examples as well as a biophysically realistic simulation of biological neuronal dynamics recently reported to employ synergistic computations.
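To see concretely why pairwise measures can miss a purely synergistic interaction like the exclusive disjunction, here is a minimal numpy sketch (illustrative only, using plain mutual information rather than the time-resolved transfer-entropy quantities of the paper): each driver alone carries essentially no information about the XOR target, while the pair jointly determines it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.integers(0, 2, n)   # putative cause 1
y = rng.integers(0, 2, n)   # putative cause 2
z = x ^ y                   # effect: exclusive disjunction (XOR) of the two causes

def entropy_bits(arr):
    """Plug-in Shannon entropy (in bits) of the empirical row distribution."""
    _, counts = np.unique(arr, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def mi_bits(a, b):
    """Plug-in mutual information I(a; b) in bits; a and b may be multivariate."""
    a = np.asarray(a).reshape(len(a), -1)
    b = np.asarray(b).reshape(len(b), -1)
    return entropy_bits(a) + entropy_bits(b) - entropy_bits(np.hstack([a, b]))

print(f"I(x; z)     = {mi_bits(x, z):.3f} bits   # pairwise: ~0")
print(f"I(y; z)     = {mi_bits(y, z):.3f} bits   # pairwise: ~0")
print(f"I({{x,y}}; z) = {mi_bits(np.column_stack([x, y]), z):.3f} bits   # joint: ~1 bit, purely synergistic")
```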


Foundation of Calculating Normalized Maximum Likelihood for Continuous Probability Models arxiv.org/abs/2409.08387 #math.ST #cs.IT #stat.ML #stat.TH #math.IT


The normalized maximum likelihood (NML) code length is widely used as a model selection criterion based on the minimum description length principle, where the model with the shortest NML code length is selected. A common method to calculate the NML code length is to use the sum (for a discrete model) or the integral (for a continuous model) of a function defined by the distribution of the maximum likelihood estimator. While this method has been proven to correctly calculate the NML code length of discrete models, no proof has been provided for the continuous case. Consequently, it has remained unclear whether the method can accurately calculate the NML code length of continuous models. In this paper, we resolve this question affirmatively, proving that the method is also correct in the continuous case. Notably, completing the proof for the continuous case is non-trivial in that it cannot be achieved by merely replacing the sums of the discrete case with integrals: the decomposition trick applied to the sums in the proof for discrete models does not carry over to the integrals in the proof for continuous models. To overcome this, we introduce a novel decomposition approach based on the coarea formula from geometric measure theory, which is essential to establishing our proof for the continuous case.
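As a concrete instance of the discrete case the abstract refers to, the sketch below (an illustrative Bernoulli example, not taken from the paper) computes the NML parametric complexity two ways: by brute-force normalization over all sequences and via the sum defined by the distribution of the maximum likelihood estimator. The two agree, which is exactly the property whose continuous analogue the paper establishes.

```python
from itertools import product
from math import comb, log2

n = 10  # sequence length (small enough to enumerate all 2**n binary sequences)

def ml_prob(x):
    """Maximized likelihood p_{theta_hat(x)}(x) under the Bernoulli(theta) model."""
    k = sum(x)
    th = k / len(x)
    return (th ** k) * ((1 - th) ** (len(x) - k))   # 0**0 == 1 in Python

# Method 1: brute-force normalization over every binary sequence of length n.
C_brute = sum(ml_prob(x) for x in product((0, 1), repeat=n))

# Method 2: the sum defined by the distribution of the MLE: group sequences by
# theta_hat = k/n; comb(n, k) sequences share each maximized-likelihood value.
C_mle = sum(comb(n, k) * (k / n) ** k * (1 - k / n) ** (n - k) for k in range(n + 1))

print(f"parametric complexity: brute force = {C_brute:.6f}, via MLE distribution = {C_mle:.6f}")

x_obs = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
nml_len = -log2(ml_prob(x_obs)) + log2(C_brute)   # NML code length of the observed sequence
print(f"NML code length of x_obs: {nml_len:.3f} bits")
```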


Think Twice Before You Act: Improving Inverse Problem Solving With MCMC arxiv.org/abs/2409.08551 #stat.ML #cs.LG


Recent studies demonstrate that diffusion models can serve as a strong prior for solving inverse problems. A prominent example is Diffusion Posterior Sampling (DPS), which approximates the posterior distribution of the data given the measurements using Tweedie's formula. Despite the merit of being versatile enough to solve various inverse problems without re-training, the performance of DPS is hindered by the fact that this posterior approximation can be inaccurate, especially at high noise levels. Therefore, we propose Diffusion Posterior MCMC (DPMC), a novel inference algorithm based on annealed MCMC that solves inverse problems with pretrained diffusion models. We define a series of intermediate distributions inspired by the approximate conditional distributions used by DPS. Through annealed MCMC sampling, we encourage the samples to follow each intermediate distribution more closely before moving to the next distribution at a lower noise level, thereby reducing the accumulated error along the path. We test our algorithm on various inverse problems, including super-resolution, Gaussian deblurring, motion deblurring, inpainting, and phase retrieval. Our algorithm outperforms DPS with fewer function evaluations across nearly all tasks and is competitive with existing approaches.
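A heavily simplified sketch of the annealed-MCMC idea described above (names such as `dpmc_sketch`, `score_model`, and the measurement operator `A` are hypothetical placeholders; the paper's step sizes, schedules, and exact intermediate distributions will differ): at each noise level a few Langevin corrections push the sample toward the DPS-style intermediate distribution before annealing to the next, lower noise level.

```python
import torch

def dpmc_sketch(score_model, A, y, sigmas, x_init, n_steps=5, step_size=1e-4, zeta=1.0):
    """Illustrative annealed-MCMC loop for inverse problems (not the authors' exact algorithm).

    score_model(x, sigma): hypothetical network returning approx. grad_x log p_sigma(x).
    A: differentiable measurement operator; y: observation; sigmas: decreasing noise levels.
    """
    x = x_init.clone()
    for sigma in sigmas:                              # anneal from high to low noise
        for _ in range(n_steps):                      # Langevin corrections at this level
            x = x.detach().requires_grad_(True)
            score = score_model(x, sigma)
            x0_hat = x + sigma**2 * score             # Tweedie's formula: posterior-mean estimate
            residual = torch.linalg.vector_norm(y - A(x0_hat))
            lik_grad = torch.autograd.grad(residual, x)[0]   # DPS-style measurement guidance
            with torch.no_grad():
                drift = score - zeta * lik_grad
                x = x + step_size * drift + (2 * step_size) ** 0.5 * torch.randn_like(x)
    return x.detach()
```

Usage would look like `dpmc_sketch(score_model, A, y, sigmas, torch.randn(1, 3, 256, 256))` for an image problem, with `sigmas` taken from a pretrained diffusion model's noise schedule.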


Gaussian Process Upper Confidence Bounds in Distributed Point Target Tracking over Wireless Sensor Networks arxiv.org/abs/2409.07652 #stat.ML #math.ST #stat.TH #cs.LG


Uncertainty quantification plays a key role in the development of autonomous systems, decision-making, and tracking over wireless sensor networks (WSNs). However, there is a need for uncertainty confidence bounds, especially for distributed machine-learning-based tracking that must deal with different volumes of data collected by sensors. This paper aims to fill this gap: it proposes a distributed Gaussian process (DGP) approach for point target tracking and derives upper confidence bounds (UCBs) on the state estimates. A unique contribution of this paper is the theoretical guarantees derived for the proposed approach and its maximum tracking accuracy with and without clutter measurements. In particular, the developed approaches with uncertainty bounds are generic and can provide trustworthy solutions with an increased level of reliability. A novel hybrid Bayesian filtering method is proposed to enhance the DGP approach by adopting a Poisson measurement likelihood model. The proposed approaches are validated on a WSN case study in which sensors have limited sensing ranges. Numerical results demonstrate the tracking accuracy and robustness of the proposed approaches, and the derived UCBs constitute a tool for evaluating the trustworthiness of DGP approaches. The simulation results reveal that the proposed UCBs encompass the true target states with 88% and 42% higher probability in the X and Y coordinates, respectively, compared to the confidence-interval-based method.
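As a generic illustration of a Gaussian-process upper confidence bound on a tracked coordinate (a single-sensor sklearn sketch, not the paper's distributed DGP formulation or its theoretical guarantees), the UCB is simply the predictive mean plus a scaled predictive standard deviation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 60)[:, None]                      # measurement times
x_true = 2.0 * np.sin(0.5 * t).ravel() + 0.3 * t.ravel()     # true X coordinate of the target
x_meas = x_true + rng.normal(0.0, 0.25, size=t.shape[0])     # noisy sensor readings

# GP regression of the target's X coordinate over time.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(0.05),
                              normalize_y=True)
gp.fit(t, x_meas)

t_query = np.linspace(0.0, 10.0, 200)[:, None]
mean, std = gp.predict(t_query, return_std=True)

beta = 2.0                      # confidence scaling (roughly a 95% band for a Gaussian)
ucb = mean + beta * std         # upper confidence bound on the state estimate
lcb = mean - beta * std

truth = np.interp(t_query.ravel(), t.ravel(), x_true)
coverage = np.mean((truth >= lcb) & (truth <= ucb))
print(f"Empirical coverage of the +/- {beta:.0f}-sigma band: {coverage:.2%}")
```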


Unsupervised anomaly detection in spatio-temporal stream network sensor data arxiv.org/abs/2409.07667 #stat.AP


The use of in-situ digital sensors for water quality monitoring is becoming increasingly common worldwide. While these sensors provide near real-time data for science, the data are prone to technical anomalies that can undermine their trustworthiness and the accuracy of statistical inferences, particularly in spatial and temporal analyses. Here we propose a framework for detecting anomalies in sensor data recorded in stream networks, which takes advantage of spatial and temporal autocorrelation to improve detection rates. The proposed framework involves effective data imputation to handle missing data, alignment of time series to address temporal disparities, and the identification of water quality events. We explore the effectiveness of a suite of state-of-the-art statistical methods including posterior predictive distributions, finite mixtures, and hidden Markov models (HMMs). We showcase the practical implementation of automated anomaly detection in near real time by employing a Bayesian recursive approach. This demonstration is conducted through a comprehensive simulation study and a practical application to a case study in the Herbert River, Queensland, Australia, which flows into the Great Barrier Reef. We found that methods such as posterior predictive distributions and HMMs perform best in detecting multiple types of anomalies. Utilizing data from multiple sensors deployed relatively near one another enhances the ability to distinguish between water quality events and technical anomalies, thereby significantly improving the accuracy of anomaly detection. Thus, uncertainty and biases in water quality reporting, interpretation, and modelling are reduced, and the effectiveness of subsequent management actions is improved.
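A toy sketch of the recursive posterior-predictive idea (illustrative only; the models and thresholds in the paper are richer): a local-level Kalman filter updates the state one observation at a time and flags readings that fall far outside the one-step-ahead predictive interval.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
level = np.cumsum(rng.normal(0, 0.05, n)) + 10.0        # slowly drifting water-quality signal
y = level + rng.normal(0, 0.2, n)                       # noisy sensor readings
y[[120, 121, 300]] += np.array([4.0, 4.0, -5.0])        # injected technical anomalies

# Local-level (random-walk) model: x_t = x_{t-1} + w_t,  y_t = x_t + v_t.
q, r = 0.05**2, 0.2**2           # process and observation noise variances (assumed known)
m, p = y[0], 1.0                 # filtered mean and variance
flags = []
for t in range(1, n):
    m_pred, p_pred = m, p + q                 # one-step-ahead prior
    s = p_pred + r                            # predictive variance of y_t
    z = (y[t] - m_pred) / np.sqrt(s)          # standardized predictive residual
    if abs(z) > 4.0:                          # far outside the posterior predictive interval
        flags.append(t)                       # flag as anomaly and skip the update
        continue
    k = p_pred / s                            # Kalman gain
    m = m_pred + k * (y[t] - m_pred)
    p = (1 - k) * p_pred

print("flagged time steps:", flags)           # expected: the injected anomalies
```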


Ratio Divergence Learning Using Target Energy in Restricted Boltzmann Machines: Beyond Kullback--Leibler Divergence Learning arxiv.org/abs/2409.07679 #cond-mat.dis-nn #stat.ML #math.ST #stat.ME #stat.TH #cs.LG


We propose ratio divergence (RD) learning for discrete energy-based models, a method that utilizes both training data and a tractable target energy function. We apply RD learning to restricted Boltzmann machines (RBMs), a minimal model class that satisfies the universal approximation theorem for discrete distributions. RD learning combines the strengths of both forward and reverse Kullback-Leibler divergence (KLD) learning, effectively addressing the notorious issues of underfitting with the forward KLD and mode collapse with the reverse KLD. Since simply summing the forward and reverse KLDs might seem sufficient to combine the strengths of both approaches, we include this summation as a direct baseline in our numerical experiments to evaluate its effectiveness. Numerical experiments demonstrate that RD learning significantly outperforms the other learning methods in terms of energy-function fitting, mode covering, and learning stability across various discrete energy-based models. Moreover, the performance gaps between RD learning and the other learning methods become more pronounced as the dimensionality of the target models increases.
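For reference, the standard forward and reverse KLD objectives whose simple summation serves as the baseline are given below (the ratio-divergence objective itself is not reproduced in the abstract, so it is not shown here); $E(x)$ denotes the tractable target energy.

```latex
% Forward KLD (fit to training data; prone to underfitting / over-dispersed models):
D_{\mathrm{KL}}(p_{\mathrm{data}} \,\|\, p_\theta)
  = \sum_{x} p_{\mathrm{data}}(x) \log \frac{p_{\mathrm{data}}(x)}{p_\theta(x)}

% Reverse KLD (uses the tractable target energy E(x); prone to mode collapse):
D_{\mathrm{KL}}(p_\theta \,\|\, p_{\mathrm{target}})
  = \sum_{x} p_\theta(x) \log \frac{p_\theta(x)}{p_{\mathrm{target}}(x)},
  \qquad p_{\mathrm{target}}(x) \propto e^{-E(x)}

% Summation baseline compared against in the paper:
L_{\mathrm{sum}}(\theta)
  = D_{\mathrm{KL}}(p_{\mathrm{data}} \,\|\, p_\theta)
  + D_{\mathrm{KL}}(p_\theta \,\|\, p_{\mathrm{target}})
```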


Dataset-Free Weight-Initialization on Restricted Boltzmann Machine arxiv.org/abs/2409.07708 #cond-mat.dis-nn #stat.ML #cs.LG


For feed-forward neural networks, dataset-free weight-initialization methods such as the LeCun, Xavier (or Glorot), and He initializations have been developed. These methods randomly determine the initial values of the weight parameters based on specific distributions (e.g., Gaussian or uniform distributions) without using training data. To the best of the authors' knowledge, no such dataset-free weight-initialization method has yet been developed for restricted Boltzmann machines (RBMs), which are probabilistic neural networks consisting of two layers. In this study, we derive a dataset-free weight-initialization method for Bernoulli--Bernoulli RBMs based on a statistical mechanical analysis. In the proposed method, the weight parameters are drawn from a Gaussian distribution with zero mean. The standard deviation of this Gaussian distribution is optimized based on our hypothesis that a standard deviation providing a larger layer correlation (LC) between the two layers improves learning efficiency. The expression for the LC is derived from the statistical mechanical analysis, and the optimal standard deviation corresponds to the maximum point of the LC. The proposed weight-initialization method is identical to Xavier initialization in a specific case (i.e., when the two layers have the same size, the layer variables are $\{-1,1\}$-binary, and all bias parameters are zero).
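A minimal sketch of the special case stated in the abstract, where the proposed zero-mean Gaussian initialization coincides with Xavier/Glorot (equal layer sizes, $\{-1,1\}$-binary units, zero biases); the general optimal standard deviation in the paper comes from maximizing the derived layer-correlation expression, which is not reproduced here.

```python
import numpy as np

def init_rbm_weights(n_visible, n_hidden, sigma=None, seed=0):
    """Draw RBM weights from N(0, sigma^2) without any training data.

    If sigma is None we fall back to the Xavier/Glorot value
    sqrt(2 / (n_visible + n_hidden)); per the abstract, this coincides with the
    proposed initialization when both layers have the same size, the units are
    {-1, +1}-binary, and all biases are zero.
    """
    rng = np.random.default_rng(seed)
    if sigma is None:
        sigma = np.sqrt(2.0 / (n_visible + n_hidden))
    W = rng.normal(0.0, sigma, size=(n_visible, n_hidden))
    b_visible = np.zeros(n_visible)   # biases start at zero in this special case
    b_hidden = np.zeros(n_hidden)
    return W, b_visible, b_hidden

W, bv, bh = init_rbm_weights(256, 256)
print(W.std(), np.sqrt(2.0 / 512))    # empirical vs. target standard deviation
```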
