Multi-level Monte-Carlo Gradient Methods for Stochastic Optimization with Biased Oracles arxiv.org/abs/2408.11084

We consider stochastic optimization when one only has access to biased stochastic oracles of the objective and the gradient, and obtaining stochastic gradients with low biases comes at high costs. This setting captures various optimization paradigms, such as conditional stochastic optimization, distributionally robust optimization, shortfall risk optimization, and machine learning paradigms, such as contrastive learning. We examine a family of multi-level Monte Carlo (MLMC) gradient methods that exploit a delicate tradeoff among bias, variance, and oracle cost. We systematically study their total sample and computational complexities for strongly convex, convex, and nonconvex objectives and demonstrate their superiority over the widely used biased stochastic gradient method. When combined with variance reduction techniques such as SPIDER, these MLMC gradient methods can further reduce the complexity in the nonconvex regime. Our results imply that a series of stochastic optimization problems with biased oracles, previously considered to be more challenging, is fundamentally no harder than classical stochastic optimization with unbiased oracles. We also delineate the boundary conditions under which these problems become more difficult. Moreover, MLMC gradient methods significantly improve the best-known complexities in the literature for conditional stochastic optimization and shortfall risk optimization. Our extensive numerical experiments on distributionally robust optimization, pricing and staffing scheduling problems, and contrastive learning demonstrate the superior performance of MLMC gradient methods.
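The bias, variance, and cost tradeoff mentioned in the abstract can be illustrated with a small randomized-level MLMC estimator. This is only a sketch under assumptions that are not taken from the paper: the toy objective F(x) = (E[x + eta])^3 with eta ~ N(0, 1), the level probabilities proportional to 2^(-l), and all function names are hypothetical choices made for illustration.

```python
import random

def biased_grad(x, etas):
    """Plug-in gradient of the toy objective F(x) = (E[x + eta])**3.

    With n = len(etas) standard normal samples the estimate has bias 3 / n,
    so lower bias requires more samples (a higher oracle cost).
    """
    m = x + sum(etas) / len(etas)
    return 3.0 * m * m

def level_probs(max_level):
    """Level-sampling probabilities proportional to 2**(-l) (an assumed choice)."""
    weights = [2.0 ** (-l) for l in range(max_level + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def rt_mlmc_grad(x, max_level, rng):
    """One randomized-level MLMC draw.

    Its expectation equals that of the finest-level oracle (2**max_level
    samples), while the expected number of samples per draw stays small.
    """
    probs = level_probs(max_level)
    level = rng.choices(range(max_level + 1), weights=probs)[0]
    etas = [rng.gauss(0.0, 1.0) for _ in range(2 ** level)]
    if level == 0:
        delta = biased_grad(x, etas)
    else:
        # Antithetic coupling: the coarse estimate reuses the two halves
        # of the fine-level samples.
        half = len(etas) // 2
        coarse = 0.5 * (biased_grad(x, etas[:half]) + biased_grad(x, etas[half:]))
        delta = biased_grad(x, etas) - coarse
    return delta / probs[level]

def average_estimate(x, max_level, n_draws, seed=0):
    """Average many independent draws to inspect the estimator's mean."""
    rng = random.Random(seed)
    return sum(rt_mlmc_grad(x, max_level, rng) for _ in range(n_draws)) / n_draws
```

In this toy setting the true gradient at x = 1 is 3, and averaging many draws with `max_level=6` should land near 3 + 3/64, the mean of the finest-level oracle.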

A Memory Reduction Compact Gas Kinetic Scheme on 3D Unstructured Meshes arxiv.org/abs/2408.10214

This paper introduces a memory-reduction third-order compact gas-kinetic scheme (CGKS) for solving compressible Euler and Navier-Stokes equations on 3D unstructured meshes. The scheme utilizes a time-evolution gas distribution function to provide a time-evolution solution at cell interfaces, enabling the implementation of Hermite WENO (HWENO) techniques for high-order reconstruction. However, the HWENO method needs to store a coefficient matrix for the quadratic polynomial to achieve third-order accuracy, resulting in high memory usage. A novel reconstruction method, built upon HWENO reconstruction, has been designed to enhance computational efficiency and reduce memory usage compared to the original CGKS. The simple idea is that the first-order and second-order terms of the quadratic polynomials are determined in two steps. In the first step, the second-order terms are obtained from the reconstruction of a linear polynomial of the first-order derivatives using only the cell-averaged slopes, since the second-order derivatives are nothing but the "derivatives of derivatives". Subsequently, the remaining first-order terms can be determined by linear reconstruction using only cell-averaged values. Thus, we successfully split one quadratic least-squares regression into several linear least-squares regressions, which are commonly used in second-order finite volume codes. Since only a small matrix inversion is needed in a 3D linear least-squares regression, the computational cost of the new reconstruction is dramatically reduced and the storage of the reconstruction-coefficient matrix is no longer necessary. The proposed reconstruction technique can reduce the overall computational cost by about 20 to 30 percent. A challenging large-scale unsteady numerical simulation is performed, which demonstrates that the current improvement brings the CGKS to a new level for industrial applications.
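The "small matrix inversion" in a 3D linear least-squares regression can be sketched as follows. This is a generic gradient reconstruction of the kind found in second-order finite volume codes, not the paper's CGKS reconstruction: it assumes values located at cell centroids, unweighted neighbors, and hypothetical function names.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def lsq_gradient(center, u_center, neighbors):
    """Least-squares gradient at `center` from neighbor values.

    neighbors: list of ((x, y, z), u) pairs. Minimizes
    sum_j (u_j - u_center - g . (x_j - center))**2 over g by solving
    the 3x3 normal equations A g = b with Cramer's rule.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for pos, u in neighbors:
        d = [pos[k] - center[k] for k in range(3)]
        du = u - u_center
        for r in range(3):
            b[r] += d[r] * du
            for c in range(3):
                a[r][c] += d[r] * d[c]
    det_a = det3(a)
    if abs(det_a) < 1e-14:
        raise ValueError("neighbor stencil is degenerate")
    grad = []
    for k in range(3):
        # Replace column k of A by b (Cramer's rule).
        ak = [row[:] for row in a]
        for r in range(3):
            ak[r][k] = b[r]
        grad.append(det3(ak) / det_a)
    return grad
```

Only the 3x3 matrix built from neighbor offsets is involved, so it can be formed on the fly instead of storing a larger coefficient matrix per cell. Applying the same routine to each cell-averaged slope component would give second-derivative estimates, in the spirit of the two-step idea described above.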

Inference of Heterogeneous Material Properties via Infinite-Dimensional Integrated DIC arxiv.org/abs/2408.10217

We present a scalable and efficient framework for the inference of spatially-varying parameters of continuum materials from image observations of their deformations. Our goal is the nondestructive identification of arbitrary damage, defects, anomalies and inclusions without knowledge of their morphology or strength. Since these effects cannot be directly observed, we pose their identification as an inverse problem. Our approach builds on integrated digital image correlation (IDIC, Besnard, Hild, Roux, 2006), which poses the image registration and material inference as a monolithic inverse problem, thereby enforcing physical consistency of the image registration using the governing PDE. Existing work on IDIC has focused on low-dimensional parameterizations of materials. In order to accommodate the inference of heterogeneous material properties that are formally infinite dimensional, we present $\infty$-IDIC, a general formulation of the PDE-constrained coupled image registration and inversion posed directly in the function space setting. This leads to several mathematical and algorithmic challenges arising from the ill-posedness and high dimensionality of the inverse problem. To address ill-posedness, we consider various regularization schemes, namely $H^1$ and total variation for the inference of smooth and sharp features, respectively. To address the computational costs associated with the discretized problem, we use an efficient inexact-Newton CG framework for solving the regularized inverse problem. In numerical experiments, we demonstrate the ability of $\infty$-IDIC to characterize complex, spatially varying Lamé parameter fields of linear elastic and hyperelastic materials. Our method exhibits (i) the ability to recover fine-scale and sharp material features, (ii) mesh-independent convergence performance and hyperparameter selection, and (iii) robustness to observational noise.

Constructive and consistent estimation of quadratic minimax arxiv.org/abs/2408.10218

We consider $k$ square integrable random variables $Y_1,...,Y_k$ and $k$ random (row) vectors of length $p$, $X_1,...,X_k$ such that $X_i(l)$ is square integrable for $1\le i\le k$ and $1\le l\le p$. No assumptions whatsoever are made about any relationship between the $X_i$ and the $Y_i$. We shall refer to each pairing of $X_i$ and $Y_i$ as an environment. We form the square risk functions $R_i(β)=\mathbb{E}\left[(Y_i-βX_i)^2\right]$ for every environment and consider $m$ affine combinations of these $k$ risk functions. Next, we define a parameter space $Θ$ where we associate each point with a subset of the unique elements of the covariance matrix of $(X_i,Y_i)$ for an environment. Then we study estimation of the $\arg\min$-solution set of the maximum of the $m$ affine combinations of the quadratic risk functions. We provide a constructive method for estimating the entire $\arg\min$-solution set which is consistent almost surely outside a zero set in $Θ^k$. This method is computationally expensive, since it involves solving polynomials of general degree. To overcome this, we define another approximate estimator, based on the bisection method, that also provides a consistent estimation of the solution set and is computationally much more efficient. We apply the method to worst risk minimization in the setting of structural equation models.
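To make the objective concrete, the sketch below minimizes the maximum of the plain risk functions in the scalar case $p=1$. It is not the paper's estimator: it takes the second moments of each environment as known inputs, considers only the risks themselves rather than general affine combinations, assumes every $\mathbb{E}[X_i^2]>0$ so that the worst risk is strictly convex, and uses a ternary search in place of the bisection-based method mentioned above. All names are hypothetical.

```python
def risk(beta, m_xx, m_xy, m_yy):
    """R(beta) = E[(Y - beta * X)**2] expanded in second moments (p = 1)."""
    return m_yy - 2.0 * beta * m_xy + beta * beta * m_xx

def worst_risk(beta, envs):
    """Maximum risk over environments; envs holds (E[X^2], E[XY], E[Y^2]) tuples."""
    return max(risk(beta, *env) for env in envs)

def argmin_worst_risk(envs, lo, hi, tol=1e-8):
    """Ternary search for the minimizer of the worst risk on [lo, hi].

    Valid here because a maximum of strictly convex quadratics is
    strictly convex, hence unimodal on the interval.
    """
    while hi - lo > tol:
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if worst_risk(a, envs) <= worst_risk(b, envs):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)
```

For two environments with moments `(1, 1, 1)` and `(1, -1, 1)`, the risks are $(1-β)^2$ and $(1+β)^2$, so the worst risk is $(1+|β|)^2$ and the search should return a value close to 0.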
