arXiv Computer Science @arxiv_cs@qoto.org

Three Laws of Statistical Linguistics Emerging in images https://arxiv.org/abs/2501.18620 #physics.comp-ph #cs.CV

Images, as a product evolving alongside civilization, develop similarly to natural languages with the advancement of civilization. Not only are images abundant in daily life, but they are also influenced by technology in shaping their forms, embodying various characteristics as they evolve in time. Language is a sequence of symbols that represents thoughts. While a written language is typically associated with the close integration of text and sound, as a combination of visual symbols and perception, the communicative power of images is no less significant. This is especially notable since 60% of the sensory input received by our central nervous system comes from vision. Given the symbolic system inherent in images, we are curious whether images can also exhibit the laws of statistical linguistics. To explore this, we begin with the relationship between human thought and visual perception to decode how images are formed by the latter mechanism. Building upon previous studies that established the high correlation between pre-trained deep convolutional neural networks and the human visual system, we use the VGG-19 to define words via each kernel and calculate the number of pixels with grayscale values greater than 90%. By (a) ranking word frequency, (b) randomizing the order of kernel appearances and performing the same word count accumulation, and (c) summing the word counts layer by layer, we are surprised to find that Zipf's, Heaps', and Benford's laws of statistical linguistics also exist in the words that comprise the text representing different images.
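
As a rough illustration of two of the laws named in this abstract, the sketch below computes the leading-digit distribution predicted by Benford's law and estimates a Zipf exponent from a list of word counts. The function names and the toy counts are hypothetical and are not taken from the paper; Heaps' law and the VGG-19 word definition are not modeled here.

```python
import math
from collections import Counter

def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digit d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit_freq(counts):
    """Empirical leading-digit frequencies of positive integer counts."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    tally = Counter(digits)
    total = len(digits)
    return {d: tally.get(d, 0) / total for d in range(1, 10)}

def zipf_exponent(counts):
    """Least-squares slope of log(frequency) against log(rank).

    Zipf's law predicts a slope close to -1.
    """
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical word counts following an idealized Zipf distribution
# (illustrative only, not data from the paper).
toy_counts = [round(10000 / r) for r in range(1, 201)]
slope = zipf_exponent(toy_counts)
```

Comparing `leading_digit_freq(counts)` against `benford_expected()` is one simple way to check whether a set of counts is Benford-like.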

**arXiv Computer Science** @arxiv_cs@qoto.org · Feb 4

VLMaterial: Procedural Material Generation with Large Vision-Language Models https://arxiv.org/abs/2501.18623 #cs.CV #cs.GR

Procedural materials, represented as functional node graphs, are ubiquitous in computer graphics for photorealistic material appearance design. They allow users to perform intuitive and precise editing to achieve desired visual appearances. However, creating a procedural material given an input image requires professional knowledge and significant effort. In this work, we leverage the ability to convert procedural materials into standard Python programs and fine-tune a large pre-trained vision-language model (VLM) to generate such programs from input images. To enable effective fine-tuning, we also contribute an open-source procedural material dataset and propose to perform program-level augmentation by prompting another pre-trained large language model (LLM). Through extensive evaluation, we show that our method outperforms previous methods on both synthetic and real-world examples.

**arXiv Computer Science** @arxiv_cs@qoto.org · Feb 4

Membership Inference Attacks Against Vision-Language Models https://arxiv.org/abs/2501.18624 #cs.CR #cs.AI

Vision-Language Models (VLMs), built on pre-trained vision encoders and large language models (LLMs), have shown exceptional multi-modal understanding and dialog capabilities, positioning them as catalysts for the next technological revolution. However, while most VLM research focuses on enhancing multi-modal interaction, the risks of data misuse and leakage have been largely unexplored. This prompts the need for a comprehensive investigation of such risks in VLMs. In this paper, we conduct the first analysis of misuse and leakage detection in VLMs through the lens of membership inference attack (MIA). Specifically, we focus on the instruction tuning data of VLMs, which is more likely to contain sensitive or unauthorized information. To address the limitations of existing MIA methods, we introduce a novel approach that infers membership based on a set of samples and their sensitivity to temperature, a unique parameter in VLMs. Based on this, we propose four membership inference methods, each tailored to different levels of background knowledge, ultimately arriving at the most challenging scenario. Our comprehensive evaluations show that these methods can accurately determine membership status, e.g., achieving an AUC greater than 0.8 targeting a small set consisting of only 5 samples on LLaVA.

**arXiv Computer Science** @arxiv_cs@qoto.org · Feb 1

Task and Perception-aware Distributed Source Coding for Correlated Speech under Bandwidth-constrained Channels https://arxiv.org/abs/2501.17879 #eess.AS #eess.SP #math.IT #cs.IT #cs.AI #cs.SD

**arXiv Computer Science** @arxiv_cs@qoto.org · Feb 1

Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion https://arxiv.org/abs/2501.17887 #cs.CL #cs.CV #cs.SE

**arXiv Computer Science** @arxiv_cs@qoto.org · Feb 1

VidSole: A Multimodal Dataset for Joint Kinetics Quantification and Disease Detection with Deep Learning https://arxiv.org/abs/2501.17890 #eess.SP #cs.CV

**arXiv Computer Science** @arxiv_cs@qoto.org · Feb 1

ProcTex: Consistent and Interactive Text-to-texture Synthesis for Procedural Models https://arxiv.org/abs/2501.17895 #eess.IV #cs.GR

**arXiv Computer Science** @arxiv_cs@qoto.org · Feb 1

Explainable Machine Learning: An Illustration of Kolmogorov-Arnold Network Model for Airfoil Lift Prediction https://arxiv.org/abs/2501.17896 #cs.LG

**arXiv Computer Science** @arxiv_cs@qoto.org · Feb 1

The Right to AI https://arxiv.org/abs/2501.17899 #cs.CY #cs.AI #cs.HC

**arXiv Computer Science** @arxiv_cs@qoto.org · Feb 1

Shared DIFF Transformer https://arxiv.org/abs/2501.17900 #cs.LG

**arXiv Computer Science** @arxiv_cs@qoto.org · Feb 1

Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework https://arxiv.org/abs/2501.17903 #cs.MA #cs.AI

**arXiv Computer Science** @arxiv_cs@qoto.org · Feb 1

DReSS: Data-driven Regularized Structured Streamlining for Large Language Models https://arxiv.org/abs/2501.17905 #cs.LG #cs.AI #cs.CL

**arXiv Computer Science** @arxiv_cs@qoto.org · Feb 1

Unsupervised Patch-GAN with Targeted Patch Ranking for Fine-Grained Novelty Detection in Medical Imaging https://arxiv.org/abs/2501.17906 #eess.IV #cs.CV

**arXiv Computer Science** @arxiv_cs@qoto.org · Jan 31

Split Knowledge Distillation for Large Models in IoT: Architecture, Challenges, and Solutions https://arxiv.org/abs/2501.17164 #cs.LG #cs.AI

Large models (LMs) have immense potential in Internet of Things (IoT) systems, enabling applications such as intelligent voice assistants, predictive maintenance, and healthcare monitoring. However, training LMs on edge servers raises data privacy concerns, while deploying them directly on IoT devices is constrained by limited computational and memory resources. We analyze the key challenges of training LMs in IoT systems, including energy constraints, latency requirements, and device heterogeneity, and propose potential solutions such as dynamic resource management, adaptive model partitioning, and clustered collaborative training. Furthermore, we propose a split knowledge distillation framework to efficiently distill LMs into smaller, deployable versions for IoT devices while ensuring raw data remains local. This framework integrates knowledge distillation and split learning to minimize energy consumption and meet low model training delay requirements. A case study is presented to evaluate the feasibility and performance of the proposed framework.
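
The abstract does not state which distillation objective the framework uses. As a minimal sketch, assuming the classic softened-logit formulation of knowledge distillation, the loss between a teacher and a student could look like the following. The function names and the default temperature are hypothetical, and the split-learning part of the framework is not modeled.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by T^2.

    Assumption: this is the standard softened-logit loss, not necessarily
    the objective used in the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two distributions diverge.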

**arXiv Computer Science** @arxiv_cs@qoto.org · Jan 31

Optimizing Carbon Footprint in ICT through Swarm Intelligence with Algorithmic Complexity https://arxiv.org/abs/2501.17166 #physics.comp-ph #cs.NE

Global emissions from fossil fuel combustion and cement production were recorded in 2022, signaling a resurgence to pre-pandemic levels and providing an apodictic indication that emission peaks have not yet been achieved. Significant contributions to this upward trend are made by the Information and Communication Technology (ICT) industry due to its substantial energy consumption. This shows the need for further exploration of swarm intelligence applications to measure and optimize the carbon footprint within ICT. All causative factors are evaluated based on the quality of data collection; variations from each source are quantified; and an objective function related to carbon footprint in ICT energy management is optimized. Emphasis is placed on the asyndetic integration of data sources to construct a convex optimization problem. An apodictic necessity to prevent the erosion of accuracy in carbon footprint assessments is addressed. Complexity percentages ranged from 5.25% for the Bat Algorithm to 7.87% for Fast Bacterial Swarming, indicating significant fluctuations in resource intensity among algorithms. These findings suggest that we were able to quantify the environmental impact of various swarm algorithms.

**arXiv Computer Science** @arxiv_cs@qoto.org · Jan 31

QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks https://arxiv.org/abs/2501.17167 #cs.SE #cs.AI

We introduce QualityFlow, a dynamic agentic workflow for program synthesis. Given the English description of a programming problem and a set of unit tests, the model's goal is to synthesize the correct program that solves the problem and passes the tests. QualityFlow consists of multiple large language model (LLM) agents that resemble a software development team, including code generation, testing, and self-debugging. Existing program synthesis methods face three major limitations: assumption of visible unit test conformity, bottleneck of synthesized test quality, and deviation of self-debugging trajectory. To address them, we propose the LLM Quality Checker, which explicitly "imagines" whether the synthesized programs' execution would conform to the unit tests. The Quality Checks dynamically control the workflow, including actions to submit the final answer, clarify the problem statement, and revert previous workflow steps. As a result, our Quality Checker can precisely accept any correct program, mitigate faulty synthesized tests, and prevent potential workflow deviation. The success of the Quality Checker further enables Diversified Prompting, which encourages variations in LLM responses to maximize the possibility that a correct program appears and passes the quality check. In experiments, QualityFlow establishes state-of-the-art results on four program synthesis benchmarks: MBPP, HumanEval, and the stricter evaluations of both MBPP and HumanEval from EvalPlus. Our systematic analysis shows that the dynamic workflow controlled by LLM quality checks can outperform static workflows and single-attempt zero-shot synthesis. The Quality Checker is the center of our investigation, and we dissect its individual performance and integrated impact on the workflow accuracy, as well as other ablation experiments to justify our workflow design.

**arXiv Computer Science** @arxiv_cs@qoto.org · Jan 31

EvoGP: A GPU-accelerated Framework for Tree-Based Genetic Programming https://arxiv.org/abs/2501.17168 #cs.NE #cs.AI

Tree-based Genetic Programming (TGP) is a key evolutionary algorithm widely used in symbolic regression, feature engineering, and scientific modeling. Its high computational demands make GPU acceleration essential for scalable and high-performance evolutionary computation. However, GPU acceleration of TGP faces three key challenges: inefficient tree encoding, highly heterogeneous genetic operations, and limited parallelism in fitness evaluation. To address these challenges, we introduce EvoGP, a comprehensive GPU-accelerated TGP framework. First, we design a tensorized encoding scheme to represent trees with different structures as tensors with the same shape, optimizing memory access and enabling efficient parallel execution. Second, we propose a unified parallel framework for genetic operations by leveraging shared computational primitives and implementing dedicated CUDA kernels for scalable performance. Third, we present a fully parallel fitness evaluation strategy for symbolic regression, exploiting both population-level and data-level parallelism to maximize GPU utilization. Moreover, we implement a comprehensive library to provide rich algorithm operators and benchmark problems. EvoGP is extensively tested on various tasks, including symbolic regression, classification, and robotics control, demonstrating its versatility and effectiveness across diverse application scenarios. Experimental results show that EvoGP achieves up to a 140.89x speedup over the state-of-the-art GPU-based TGP implementation, while maintaining or exceeding the accuracy of baseline methods. EvoGP is open-source and accessible at: https://github.com/EMI-Group/evogp.
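
To make the idea of representing differently shaped trees as same-shaped arrays concrete, here is a small CPU-only sketch. It assumes a prefix-order layout padded to a fixed length; the node codes, `MAX_LEN`, and the function names are hypothetical and do not reflect EvoGP's actual encoding or API.

```python
# Node type codes (hypothetical, chosen for this sketch only).
PAD, CONST, VAR, ADD, MUL = 0, 1, 2, 3, 4
MAX_LEN = 8  # every tree is padded to this fixed length

def encode(prefix_nodes):
    """Pad a prefix-order list of (type, value) pairs to fixed-length rows."""
    if len(prefix_nodes) > MAX_LEN:
        raise ValueError("tree exceeds MAX_LEN")
    padded = prefix_nodes + [(PAD, 0.0)] * (MAX_LEN - len(prefix_nodes))
    types = [t for t, _ in padded]
    values = [v for _, v in padded]
    return types, values

def evaluate(types, values, x):
    """Evaluate one encoded tree by scanning prefix order right to left."""
    stack = []
    for t, v in zip(reversed(types), reversed(values)):
        if t == PAD:
            continue  # padding carries no information
        if t == CONST:
            stack.append(v)
        elif t == VAR:
            stack.append(x[int(v)])  # value field holds the variable index
        else:
            a = stack.pop()
            b = stack.pop()
            stack.append(a + b if t == ADD else a * b)
    return stack[0]

# Two trees with different structures share the same fixed-length layout.
tree_a = encode([(ADD, 0.0), (VAR, 0), (CONST, 2.0)])                        # x0 + 2
tree_b = encode([(MUL, 0.0), (ADD, 0.0), (VAR, 0), (VAR, 1), (CONST, 3.0)])  # (x0 + x1) * 3
population = [tree_a, tree_b]
```

Because every row has the same length, a population like this can be stacked into one array, which is the property that makes batched GPU execution practical.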

**arXiv Computer Science** @arxiv_cs@qoto.org · Jan 31

Benchmarking Randomized Optimization Algorithms on Binary, Permutation, and Combinatorial Problem Landscapes https://arxiv.org/abs/2501.17170 #cs.NE #cs.AI #cs.CL #cs.LG

In this paper, we evaluate the performance of four randomized optimization algorithms: Randomized Hill Climbing (RHC), Simulated Annealing (SA), Genetic Algorithms (GA), and MIMIC (Mutual Information Maximizing Input Clustering), across three distinct types of problems: binary, permutation, and combinatorial. We systematically compare these algorithms using a set of benchmark fitness functions that highlight the specific challenges and requirements of each problem category. Our study analyzes each algorithm's effectiveness based on key performance metrics, including solution quality, convergence speed, computational cost, and robustness. Results show that while MIMIC and GA excel in producing high-quality solutions for binary and combinatorial problems, their computational demands vary significantly. RHC and SA, while computationally less expensive, demonstrate limited performance in complex problem landscapes. The findings offer valuable insights into the trade-offs between different optimization strategies and provide practical guidance for selecting the appropriate algorithm based on the type of problems, accuracy requirements, and computational constraints.
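
For readers unfamiliar with the simplest of the four algorithms, the sketch below shows Randomized Hill Climbing on a binary problem. The abstract does not name its benchmark fitness functions, so OneMax (count of 1 bits) is used here purely as an assumed example, and the function names are hypothetical.

```python
import random

def one_max(bits):
    """Fitness: number of 1 bits (assumed benchmark, not from the paper)."""
    return sum(bits)

def randomized_hill_climbing(n_bits, max_iters, rng):
    """Flip one random bit per step and keep the flip only if fitness does not drop."""
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    best = one_max(current)
    for _ in range(max_iters):
        i = rng.randrange(n_bits)
        current[i] ^= 1              # propose a single-bit neighbor
        score = one_max(current)
        if score >= best:
            best = score             # accept the move
        else:
            current[i] ^= 1          # reject: undo the flip
    return current, best

solution, fitness = randomized_hill_climbing(20, 2000, random.Random(0))
```

Simulated Annealing differs mainly in that it sometimes accepts a worse neighbor, with a probability that shrinks as a temperature parameter is lowered.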

**arXiv Computer Science** @arxiv_cs@qoto.org · Jan 31

Separated Inter/Intra-Modal Fusion Prompts for Compositional Zero-Shot Learning https://arxiv.org/abs/2501.17171 #eess.IV #cs.CV #cs.AI #cs.LG

Compositional Zero-Shot Learning (CZSL) aims to recognize subtle differences in meaning or the combination of states and objects through the use of known and unknown concepts during training. Existing methods either focused on prompt configuration or on using prompts to tune the pre-trained Vision-Language model. However, these methods faced challenges in accurately identifying subtle differences in meaning or combining states with objects. To jointly eradicate the above issues and construct an efficient and effective CZSL technique, we suggest a method to improve attribute recognition performance by utilizing diverse Prompt Learning with an Inter/Intra-Modality Fusion Synthesizer in scene understanding involving subtle semantic differences and multiple objects.

**arXiv Computer Science** @arxiv_cs@qoto.org · Jan 31

Towards spiking analog hardware implementation of a trajectory interpolation mechanism for smooth closed-loop control of a spiking robot arm https://arxiv.org/abs/2501.17172 #cs.NE #cs.RO

Neuromorphic engineering aims to incorporate the computational principles found in animal brains into modern technological systems. Following this approach, in this work we propose a closed-loop neuromorphic control system for an event-based robotic arm. The proposed system consists of a shifted Winner-Take-All spiking network for interpolating a reference trajectory and a spiking comparator network responsible for controlling the flow continuity of the trajectory, which is fed back to the actual position of the robot. The comparator model is based on a differential position comparison neural network, which governs the execution of the next trajectory points to close the control loop between both components of the system. To evaluate the system, we implemented and deployed the model on a mixed-signal analog-digital neuromorphic platform, the DYNAP-SE2, to facilitate integration and communication with the ED-Scorbot robotic arm platform. Experimental results on one joint of the robot validate the use of this architecture and pave the way for future neuro-inspired control of the entire robot.