Show newer

Three Laws of Statistical Linguistics Emerging in images arxiv.org/abs/2501.18620 .comp-ph .CV

Three Laws of Statistical Linguistics Emerging in images

Images, as a product evolving alongside civilization, develop similarly to natural languages with the advancement of civilization. Not only are images abundant in daily life, but are also influenced by technology in shaping their forms, embodying various characteristics as they evolve in time. Language is a sequence of symbols that represents thoughts. While a written language is typically associated with the close integration of text and sound, as a combination of visual symbols and perception, the communicative power of image is no less significant. This is especially notable since 60%\% of the sensory input received by our central nervous system comes from vision. Given the symbolic system inherent in images, we are curious whether images can also exhibit the laws of statistical linguistics. To explore this, we begin with the relationship between human thought and visual perception to decode how images are formed by the latter mechanism. Building upon previous studies that established the high correlation between pre-trained deep convolutional neural networks and the human visual system, we use the VGG-19 to define words via each kernel and calculate the number of pixels with grayscale values greater than 90%\%. By (a) ranking words frequency, (b) randomizing the order of kernel appearances and performing the same word count accumulation, and (c) summing the word counts layer by layer, we are surprised to find that Zipf's, Heaps', and Benford's laws of statistical linguistics also exist in the words that comprises the text representing different images.

arXiv.org

Membership Inference Attacks Against Vision-Language Models arxiv.org/abs/2501.18624 .CR .AI

Membership Inference Attacks Against Vision-Language Models

Vision-Language Models (VLMs), built on pre-trained vision encoders and large language models (LLMs), have shown exceptional multi-modal understanding and dialog capabilities, positioning them as catalysts for the next technological revolution. However, while most VLM research focuses on enhancing multi-modal interaction, the risks of data misuse and leakage have been largely unexplored. This prompts the need for a comprehensive investigation of such risks in VLMs. In this paper, we conduct the first analysis of misuse and leakage detection in VLMs through the lens of membership inference attack (MIA). In specific, we focus on the instruction tuning data of VLMs, which is more likely to contain sensitive or unauthorized information. To address the limitation of existing MIA methods, we introduce a novel approach that infers membership based on a set of samples and their sensitivity to temperature, a unique parameter in VLMs. Based on this, we propose four membership inference methods, each tailored to different levels of background knowledge, ultimately arriving at the most challenging scenario. Our comprehensive evaluations show that these methods can accurately determine membership status, e.g., achieving an AUC greater than 0.8 targeting a small set consisting of only 5 samples on LLaVA.

arXiv.org

Task and Perception-aware Distributed Source Coding for Correlated Speech under Bandwidth-constrained Channels arxiv.org/abs/2501.17879 .AS .SP .IT .IT .AI .SD

Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion arxiv.org/abs/2501.17887 .CL .CV .SE

VidSole: A Multimodal Dataset for Joint Kinetics Quantification and Disease Detection with Deep Learning arxiv.org/abs/2501.17890 .SP .CV

ProcTex: Consistent and Interactive Text-to-texture Synthesis for Procedural Models arxiv.org/abs/2501.17895 .IV .GR

Explainable Machine Learning: An Illustration of Kolmogorov-Arnold Network Model for Airfoil Lift Prediction arxiv.org/abs/2501.17896 .LG

Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework arxiv.org/abs/2501.17903 .MA .AI

DReSS: Data-driven Regularized Structured Streamlining for Large Language Models arxiv.org/abs/2501.17905 .LG .AI .CL

Unsupervised Patch-GAN with Targeted Patch Ranking for Fine-Grained Novelty Detection in Medical Imaging arxiv.org/abs/2501.17906 .IV .CV

Optimizing Carbon Footprint in ICT through Swarm Intelligence with Algorithmic Complexity arxiv.org/abs/2501.17166 .comp-ph .NE

Optimizing Carbon Footprint in ICT through Swarm Intelligence with Algorithmic Complexity

Global emissions from fossil fuel combustion and cement production were recorded in 2022, signaling a resurgence to pre-pandemic levels and providing an apodictic indication that emission peaks have not yet been achieved. Significant contributions to this upward trend are made by the Information and Communication Technology (ICT) industry due to its substantial energy consumption. This shows the need for further exploration of swarm intelligence applications to measure and optimize the carbon footprint within ICT. All causative factors are evaluated based on the quality of data collection; variations from each source are quantified; and an objective function related to carbon footprint in ICT energy management is optimized. Emphasis is placed on the asyndetic integration of data sources to construct a convex optimization problem. An apodictic necessity to prevent the erosion of accuracy in carbon footprint assessments is addressed. Complexity percentages ranged from 5.25% for the Bat Algorithm to 7.87% for Fast Bacterial Swarming, indicating significant fluctuations in resource intensity among algorithms. These findings suggest that we were able to quantify the environmental impact of various swarm algorithms.

arXiv.org

QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks arxiv.org/abs/2501.17167 .SE .AI

QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks

We introduce QualityFlow, a dynamic agentic workflow for program synthesis. Given the English description of a programming problem and a set of unit tests, the model's goal is to synthesize the correct program that solves the problem and passes the tests. QualityFlow consists of multiple large language model (LLM) agents that resemble a software development team, including code generation, testing, and self-debugging. Existing program synthesis methods face three major limitations: assumption of visible unit test conformity, bottleneck of synthesized test quality, and deviation of self-debugging trajectory. To address them, we propose the LLM Quality Checker, which explicitly "imagines" whether the synthesized programs' execution would conform to the unit tests. The Quality Checks dynamically control the workflow, including actions to submit the final answer, clarify the problem statement, and revert previous workflow steps. As a result, our Quality Checker can precisely accept any correct program, mitigate faulty synthesized tests, and prevent potential workflow deviation. The success of the Quality Checker further enables Diversified Prompting, which encourages variations in LLM responses to maximize the possibility that a correct program appears and passes the quality check. In experiments, QualityFlow establishes the state-of-the-art results on four program synthesis benchmarks: MBPP, HumanEval, and the stricter evaluations of both MBPP and HumanEval from EvalPlus. Our systematic analysis shows that the dynamic workflow controlled by LLM quality checks can outperform static workflows and single-attempt zero-shot synthesis. The Quality Checker is the center of our investigation, and we dissect its individual performance and integrated impact on the workflow accuracy, as well as other ablations experiments to justify our workflow design.

arXiv.org

EvoGP: A GPU-accelerated Framework for Tree-Based Genetic Programming arxiv.org/abs/2501.17168 .NE .AI

EvoGP: A GPU-accelerated Framework for Tree-Based Genetic Programming

Tree-based Genetic Programming (TGP) is a key evolutionary algorithm widely used in symbolic regression, feature engineering, and scientific modeling. Its high computational demands make GPU acceleration essential for scalable and high-performance evolutionary computation. However, GPU acceleration of TGP faces three key challenges: inefficient tree encoding, highly heterogeneous genetic operations, and limited parallelism in fitness evaluation. To address these challenges, we introduce EvoGP, a comprehensive GPU-accelerated TGP framework. First, we design a tensorized encoding scheme to represent tree with different structures as tensors with the same shape, optimizing memory access and enabling efficient parallel execution. Second, we propose a unified parallel framework for genetic operations by leveraging shared computational primitives and implementing dedicated CUDA kernels for scalable performance. Third, we present a fully parallel fitness evaluation strategy for symbolic regression, exploiting both population-level and data-level parallelism to maximize GPU utilization. Moreover, we implement a comprehensive library to provide rich algorithm operators and benchmark problems. EvoGP is extensively tested on various tasks, including symbolic regression, classification, and robotics control, demonstrating its versatility and effectiveness across diverse application scenarios. Experimental results show that EvoGP achieves up to a 140.89x speedup over the state-of-the-art GPU-based TGP implementation, while maintaining or exceeding the accuracy of baseline methods. EvoGP is open-source and accessible at: https://github.com/EMI-Group/evogp.

arXiv.org

Benchmarking Randomized Optimization Algorithms on Binary, Permutation, and Combinatorial Problem Landscapes arxiv.org/abs/2501.17170 .NE .AI .CL .LG

Benchmarking Randomized Optimization Algorithms on Binary, Permutation, and Combinatorial Problem Landscapes

In this paper, we evaluate the performance of four randomized optimization algorithms: Randomized Hill Climbing (RHC), Simulated Annealing (SA), Genetic Algorithms (GA), and MIMIC (Mutual Information Maximizing Input Clustering), across three distinct types of problems: binary, permutation, and combinatorial. We systematically compare these algorithms using a set of benchmark fitness functions that highlight the specific challenges and requirements of each problem category. Our study analyzes each algorithm's effectiveness based on key performance metrics, including solution quality, convergence speed, computational cost, and robustness. Results show that while MIMIC and GA excel in producing high-quality solutions for binary and combinatorial problems, their computational demands vary significantly. RHC and SA, while computationally less expensive, demonstrate limited performance in complex problem landscapes. The findings offer valuable insights into the trade-offs between different optimization strategies and provide practical guidance for selecting the appropriate algorithm based on the type of problems, accuracy requirements, and computational constraints.

arXiv.org
Show older
Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.