
Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and Stable Online Fine-Tuning. (arXiv:2211.11802v1 [cs.LG]) arxiv.org/abs/2211.11802

The ability to discover optimal behaviour from fixed data sets has the potential to transfer the successes of reinforcement learning (RL) to domains where data collection is acutely problematic. In this offline setting, a key challenge is overcoming overestimation bias for actions not present in the data, which, without the ability to correct through interaction with the environment, can propagate and compound during training, leading to highly sub-optimal policies. One simple method to reduce this bias is to introduce a policy constraint via behavioural cloning (BC), which encourages agents to pick actions closer to the source data. By finding the right balance between RL and BC, such approaches have been shown to be surprisingly effective while requiring minimal changes to the underlying algorithms they are based on. To date this balance has been held constant, but in this work we explore the idea of tipping it towards RL following initial training. Using TD3-BC, we demonstrate that by continuing to train a policy offline while reducing the influence of the BC component, we can produce refined policies that outperform the original baseline and match or exceed the performance of more complex alternatives. Furthermore, we demonstrate that such an approach can be used for stable online fine-tuning, allowing policies to be safely improved during deployment.
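To make the RL/BC balance concrete, here is a minimal PyTorch-style sketch of a TD3-BC actor loss with an annealable behavioural-cloning weight. The function names and the schedule parameters (`warmup`, `decay`, `start`, `end`) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def td3bc_actor_loss(actor, critic, state, action, bc_weight):
    """TD3-BC-style actor loss: a Q-maximizing RL term balanced
    against a behavioural-cloning (BC) regression term."""
    pi = actor(state)
    q = critic(state, pi)
    lam = 1.0 / q.abs().mean().detach()   # Q-scale normalization, as in TD3-BC
    rl_term = -lam * q.mean()
    bc_term = F.mse_loss(pi, action)
    return rl_term + bc_weight * bc_term

def bc_schedule(step, warmup=500_000, decay=500_000, start=1.0, end=0.1):
    """Illustrative anneal: hold the original balance during initial
    offline training, then linearly reduce the BC influence."""
    if step < warmup:
        return start
    frac = min(1.0, (step - warmup) / decay)
    return start + frac * (end - start)
```

Keeping `bc_weight` constant recovers standard TD3-BC; decaying it after initial offline training is the relaxation the abstract describes.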

A Bi-level Nonlinear Eigenvector Algorithm for Wasserstein Discriminant Analysis. (arXiv:2211.11891v1 [stat.ML]) arxiv.org/abs/2211.11891

Much like classical Fisher linear discriminant analysis, Wasserstein discriminant analysis (WDA) is a supervised linear dimensionality reduction method that seeks a projection matrix to maximize the dispersion between different data classes and minimize the dispersion within the same class. In contrast, however, WDA can account for both global and local inter-connections between data classes by using a regularized Wasserstein distance. WDA is formulated as a bi-level nonlinear trace ratio optimization. In this paper, we present a bi-level nonlinear eigenvector (NEPv) algorithm, called WDA-nepv. The inner kernel of WDA-nepv, which computes the optimal transport matrix of the regularized Wasserstein distance, is formulated as an NEPv, while the outer kernel for the trace ratio optimization is formulated as another NEPv. Consequently, both kernels can be computed efficiently via self-consistent-field iterations and modern solvers for linear eigenvalue problems. Compared with existing algorithms for WDA, WDA-nepv is derivative-free and surrogate-model-free. The computational efficiency and classification accuracy of WDA-nepv are demonstrated on synthetic and real-life datasets.
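As a rough illustration of the outer kernel, the following NumPy/SciPy sketch runs a self-consistent-field (SCF) iteration for a generic trace-ratio NEPv. The maps `A_of` and `B_of` are placeholders for the between-/within-class dispersion matrices WDA builds from the regularized Wasserstein transport plans (themselves produced by the inner NEPv, omitted here); this is a sketch of the technique, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import eigh

def scf_trace_ratio(A_of, B_of, p, d, n_iter=50, tol=1e-8, seed=0):
    """SCF iteration for a trace-ratio NEPv:
    maximize trace(V^T A(V) V) / trace(V^T B(V) V) over V^T V = I,
    where A_of, B_of map a p-by-d orthonormal V to symmetric p-by-p matrices."""
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((p, d)))
    rho = 0.0
    for _ in range(n_iter):
        A, B = A_of(V), B_of(V)
        rho_new = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)
        # Each step is a *linear* eigenvalue subproblem:
        # take the top-d eigenvectors of A - rho * B.
        _, V = eigh(A - rho_new * B, subset_by_index=[p - d, p - 1])
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return V, rho
```

Reducing each step to a linear eigenvalue problem is what lets such a scheme stay derivative-free and surrogate-model-free.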

Curiosity in hindsight. (arXiv:2211.10515v1 [stat.ML]) arxiv.org/abs/2211.10515

Consider exploration in sparse-reward or reward-free environments, such as Montezuma's Revenge. The curiosity-driven paradigm dictates an intuitive technique: at each step, the agent is rewarded for how much the realized outcome differs from its predicted outcome. However, using predictive error as intrinsic motivation is prone to failure in stochastic environments, as the agent may become hopelessly drawn to high-entropy areas of the state-action space, such as a noisy TV. It is therefore important to distinguish between aspects of world dynamics that are inherently predictable and aspects that are inherently unpredictable: the former should constitute a source of intrinsic reward, whereas the latter should not. In this work, we study a natural solution derived from structural causal models of the world: our key idea is to learn representations of the future that capture precisely the unpredictable aspects of each outcome -- no more, no less -- which we use as additional input for predictions, such that intrinsic rewards vanish in the limit. First, we propose incorporating such hindsight representations into the agent's model to disentangle "noise" from "novelty", yielding Curiosity in Hindsight: a simple and scalable generalization of curiosity that is robust to all types of stochasticity. Second, we implement this framework as a drop-in modification of any prediction-based exploration bonus, and instantiate it for the recently introduced BYOL-Explore algorithm as a prime example, resulting in the noise-robust BYOL-Hindsight. Third, we illustrate its behavior under various stochasticities in a grid world, and find improvements over BYOL-Explore in hard-exploration Atari games with sticky actions. Notably, we show state-of-the-art results in exploring Montezuma's Revenge with sticky actions, while preserving performance in the non-sticky setting.
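The following PyTorch-style sketch shows only the *shape* of a hindsight-conditioned prediction bonus. All module names (`encoder`, `hindsight_enc`, `predictor`) are assumptions, and the essential training objective that stops the hindsight latent from leaking predictable information (as in BYOL-Hindsight) is omitted.

```python
import torch

def hindsight_curiosity_bonus(encoder, hindsight_enc, predictor, s, a, s_next):
    """Hindsight-conditioned prediction bonus: the predictor receives a
    latent computed from the realized outcome, so error on inherently
    unpredictable dynamics ('noise') can be explained away, while error
    on learnable dynamics ('novelty') still yields intrinsic reward."""
    with torch.no_grad():
        target = encoder(s_next)      # representation of the realized outcome
    z = hindsight_enc(s_next)         # hindsight latent of the outcome
    pred = predictor(s, a, z)         # prediction conditioned on hindsight
    return ((pred - target) ** 2).mean(dim=-1)   # per-sample intrinsic reward
```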

Phase transition and higher order analysis of $L_q$ regularization under dependence. (arXiv:2211.10541v1 [math.ST]) arxiv.org/abs/2211.10541

We study the problem of estimating a $k$-sparse signal $\beta_0 \in \mathbb{R}^p$ from a set of noisy observations $y \in \mathbb{R}^n$ under the model $y = X\beta + w$, where $X \in \mathbb{R}^{n\times p}$ is the measurement matrix whose rows are drawn from the distribution $N(0, \Sigma)$. We consider the class of $L_q$-regularized least squares (LQLS) estimators given by the formulation $\hat{\beta}(\lambda, q) = \mathrm{argmin}_{\beta \in \mathbb{R}^p} \frac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_q^q$, where $\|\cdot\|_q$ $(0 \le q \le 2)$ denotes the $L_q$-norm. In the setting $p, n, k \rightarrow \infty$ with fixed $k/p = \epsilon$ and $n/p = \delta$, we derive the asymptotic risk of $\hat{\beta}(\lambda, q)$ for an arbitrary covariance matrix $\Sigma$, which generalizes the existing results for the standard Gaussian design, i.e. $X_{ij} \overset{\mathrm{i.i.d.}}{\sim} N(0,1)$. We perform a higher-order analysis of LQLS in the small-error regime, in which the first dominant term can be used to determine its phase transition behavior. Our results show that the first dominant term does not depend on the covariance structure of $\Sigma$ for the cases $0 \le q < 1$ and $1 < q \le 2$, which indicates that correlations among predictors only affect the phase transition curve in the case $q = 1$, also known as the LASSO. To study the influence of the covariance structure of $\Sigma$ on the performance of LQLS in the cases $0 \le q < 1$ and $1 < q \le 2$, we derive explicit formulas for the second dominant term in the expansion of the asymptotic risk in terms of the small error. Extensive computational experiments confirm that our analytical predictions are consistent with the numerical results.
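For the $q = 1$ (LASSO) member of the LQLS family, the estimator can be computed by proximal gradient descent; a minimal NumPy sketch follows. This is standard ISTA, included only to make the objective concrete, not the paper's analysis machinery.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for the q = 1 case of LQLS:
    argmin_beta 0.5 * ||y - X beta||_2^2 + lam * ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        z = beta - step * grad
        # Soft-thresholding: the proximal operator of the L1 norm.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta
```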
