Echo: Efficient Co-Scheduling of Hybrid Online-Offline Tasks for Large Language Model Serving arxiv.org/abs/2504.03651 .DC .AI .LG

Large language models have been widely deployed in various applications, encompassing both interactive online tasks and batched offline tasks. Given the burstiness and latency sensitivity of online tasks, over-provisioning resources is common practice. This allows latency-insensitive offline tasks to be co-located during periods of low online load, improving resource utilization. However, serving online and offline tasks through a preemption mechanism fails to fully exploit the flexibility of offline tasks and suffers from KV cache recomputation and irregular workloads. In this paper, we introduce Echo, a collaborative online-offline task serving system comprising a scheduler, a KV cache manager, and estimation toolkits. The scheduler and KV cache manager work in tandem to maximize offline task throughput, while the estimator predicts execution time to ensure online task SLOs. The scheduler leverages batch information from the last iteration to reduce the search space for finding the optimal schedule. The KV cache manager prioritizes KV cache entries based on task type and the opportunity for prefix sharing, reducing recomputation. Finally, the estimation toolkits predict execution time, future memory consumption, and offline task throughput to guide the scheduler, the KV cache manager, and the system deployer. Evaluation on real-world workloads demonstrates that Echo increases offline task throughput by up to $3.3\times$ while satisfying online task SLOs.
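Below is a minimal, hypothetical sketch of the kind of priority-based KV cache eviction the abstract describes: entries belonging to online tasks or offering prefix-sharing opportunities are evicted last. The class, fields, and weights are illustrative assumptions, not Echo's implementation.

```python
# Hypothetical priority-based KV cache eviction in the spirit of Echo's
# KV cache manager. Weights and fields are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class KVEntry:
    task_id: str
    is_online: bool          # online tasks are latency-sensitive
    shared_prefix_refs: int  # pending requests that share this prefix
    size_blocks: int

def eviction_priority(e: KVEntry) -> float:
    # Lower score is evicted first: offline, unshared entries go before
    # online or widely shared ones (assumed weighting).
    return (2.0 if e.is_online else 0.0) + float(e.shared_prefix_refs)

class KVCacheManager:
    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.entries: list[KVEntry] = []

    def used(self) -> int:
        return sum(e.size_blocks for e in self.entries)

    def admit(self, entry: KVEntry) -> list[KVEntry]:
        """Insert an entry, evicting lowest-priority entries if needed."""
        evicted = []
        self.entries.sort(key=eviction_priority)  # lowest priority first
        while self.used() + entry.size_blocks > self.capacity and self.entries:
            evicted.append(self.entries.pop(0))
        self.entries.append(entry)
        return evicted

mgr = KVCacheManager(capacity_blocks=8)
mgr.admit(KVEntry("offline-1", False, 0, 4))
mgr.admit(KVEntry("online-1", True, 3, 4))
victims = mgr.admit(KVEntry("online-2", True, 1, 4))
print([v.task_id for v in victims])  # ['offline-1']: offline evicted first
```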

PointSplit: Towards On-device 3D Object Detection with Heterogeneous Low-power Accelerators arxiv.org/abs/2504.03654 .DC .AI .CV

Running deep learning models on resource-constrained edge devices has drawn significant attention due to its fast response, privacy preservation, and robust operation regardless of Internet connectivity. While these devices already cope with various intelligent tasks, the latest edge devices equipped with multiple types of low-power accelerators (i.e., both a mobile GPU and an NPU) bring another opportunity: a task that used to be too heavy for an edge device in the single-accelerator world might become viable in the upcoming heterogeneous-accelerator world. To realize this potential in the context of 3D object detection, we identify several technical challenges and propose PointSplit, a novel 3D object detection framework for multi-accelerator edge devices that addresses them. Specifically, the PointSplit design includes (1) 2D semantics-aware biased point sampling, (2) parallelized 3D feature extraction, and (3) role-based group-wise quantization. We implement PointSplit on TensorFlow Lite and evaluate it on a customized hardware platform comprising both a mobile GPU and an EdgeTPU. Experimental results on representative RGB-D datasets, SUN RGB-D and ScanNet V2, demonstrate that PointSplit on a multi-accelerator device is 24.7 times faster, with similar accuracy, than a full-precision, 2D-3D fusion-based 3D detector on a GPU-only device.
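As a rough illustration of the 2D-semantics-aware biased point sampling idea (a sketch under assumed interfaces, not the authors' code), points whose 2D projections land inside a 2D detector's foreground mask can be drawn with higher probability:

```python
# Biased point sampling sketch: projected points inside a 2D foreground
# mask are sampled more often. fg_weight is an assumed bias factor.
import numpy as np

def biased_sample(points_2d, fg_mask, n_samples, fg_weight=4.0, rng=None):
    """points_2d: (N, 2) integer pixel coords of projected 3D points.
    fg_mask: (H, W) boolean mask from a 2D detector."""
    rng = rng or np.random.default_rng(0)
    is_fg = fg_mask[points_2d[:, 1], points_2d[:, 0]]
    weights = np.where(is_fg, fg_weight, 1.0)
    return rng.choice(len(points_2d), size=n_samples, replace=False,
                      p=weights / weights.sum())

H, W, N = 64, 64, 1000
mask = np.zeros((H, W), dtype=bool)
mask[16:48, 16:48] = True  # pretend the 2D detector found an object here
pts = np.random.default_rng(1).integers(0, 64, size=(N, 2))
idx = biased_sample(pts, mask, n_samples=256)
print(f"foreground fraction in sample: {mask[pts[idx, 1], pts[idx, 0]].mean():.2f}")
```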

Memory and Bandwidth are All You Need for Fully Sharded Data Parallel arxiv.org/abs/2504.03655 .DC .LG

Transformer models have revolutionized a wide spectrum of disciplines, especially language processing. Recent successes have proven that model size scalability is crucial for achieving superior performance. However, training large transformer models is challenging even on modern hardware with powerful GPUs and high-speed interconnects. Existing studies primarily focus on optimizing distributed training strategies to minimize memory footprint and enhance training speed, often overlooking the scalability challenges arising from model size and hardware constraints. To address this oversight, we thoroughly investigate the computational, memory, and network demands of training large transformers with the Fully Sharded Data Parallel (FSDP) distributed strategy across different hardware clusters. We explore the intricate relationships between model size and hardware setup to identify configurations that ensure maximum model and hardware efficiency, effective sequence length management, and optimal training throughput. A significant finding of our study is the critical interplay between the cluster's interconnect bandwidth and GPU memory size relative to the computational performance of the GPUs; this interplay limits training efficiency and makes either hardware characteristic a potential bottleneck. By integrating theoretical analysis with simulations and empirical tests, we demonstrate how hardware limitations affect training efficacy, identifying key hardware thresholds and the impact of network connectivity. Our findings prompt a reassessment of training strategies, guiding users toward hardware-optimal FSDP configurations and enhancing training efficiency for large-scale transformer models.
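The interplay the authors describe can be made concrete with a back-of-envelope estimator; the constants and formulas below are common first-order approximations of ours, not the paper's exact model:

```python
# Rough FSDP per-GPU memory and compute/communication balance per step.
# All constants are assumed (e.g., A100-class peak FLOPs, bf16 + fp32 Adam).
def fsdp_step_estimate(params, tokens_per_batch, n_gpus,
                       flops_per_gpu=312e12,       # peak BF16 FLOP/s, assumed
                       link_bw=100e9,              # bytes/s per GPU, assumed
                       bytes_per_param_state=16):  # bf16 wt+grad + fp32 Adam
    # Sharded states: each GPU holds 1/n of weights, grads, optimizer state.
    mem_gb = params * bytes_per_param_state / n_gpus / 1e9
    # Compute: ~6 FLOPs per parameter per token (forward + backward).
    t_compute = 6 * params * tokens_per_batch / n_gpus / flops_per_gpu
    # Communication: all-gather weights for forward and backward, plus
    # reduce-scatter of grads, ~3 * 2 bytes moved per parameter (bf16).
    t_comm = 3 * 2 * params / link_bw
    return mem_gb, t_compute, t_comm

mem, tc, tm = fsdp_step_estimate(params=13e9, tokens_per_batch=4e6, n_gpus=64)
print(f"~{mem:.1f} GB/GPU, compute {tc:.1f}s vs comm {tm:.2f}s per step")
```

When `t_comm` approaches `t_compute`, bandwidth rather than GPU throughput becomes the binding constraint, which is the regime the paper highlights.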

Comparative Analysis of Lightweight Kubernetes Distributions for Edge Computing: Performance and Resource Efficiency arxiv.org/abs/2504.03656 .DC

Edge computing environments increasingly rely on lightweight container orchestration platforms to manage resource-constrained devices. This paper provides an empirical analysis of five Kubernetes distributions (KDs), namely k0s, k3s, KubeEdge, OpenYurt, and standard Kubernetes (k8s), focusing on their performance and resource efficiency in edge computing scenarios. We evaluated key metrics such as CPU, memory, and disk usage, throughput, and latency under varying workloads, using a testbed of Intel NUCs and Raspberry Pi devices. Our results demonstrate significant differences in performance: k3s exhibited the lowest resource consumption, while k0s and k8s excelled in data plane throughput and latency. Under heavy stress, k3s and k0s completed the same workloads faster than the other distributions. OpenYurt offered balanced performance suitable for hybrid cloud-edge use cases, but was less efficient in resource usage and scalability than k0s, k3s, and k8s. KubeEdge, although feature-rich for edge environments, exhibited higher resource consumption and lower scalability. These findings offer valuable insights for developers and operators selecting an appropriate KD based on specific performance and resource efficiency requirements in edge computing environments.
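A measurement harness in the spirit of the paper's latency comparison might look like the following sketch; the endpoints and workload are placeholders (the actual study ran on a dedicated NUC and Raspberry Pi testbed):

```python
# Toy latency probe against one service endpoint per distribution.
# Hostnames and the /echo workload are hypothetical placeholders.
import time, statistics, urllib.request

def measure_latency(url, n=100):
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        urllib.request.urlopen(url, timeout=5).read()
        samples.append((time.perf_counter() - t0) * 1000)  # ms
    return statistics.median(samples), max(samples)

DISTROS = {  # one NodePort endpoint per distribution under test
    "k3s": "http://k3s-node:30080/echo",
    "k0s": "http://k0s-node:30080/echo",
    "k8s": "http://k8s-node:30080/echo",
}
for name, url in DISTROS.items():
    p50, worst = measure_latency(url)
    print(f"{name}: median {p50:.1f} ms, worst {worst:.1f} ms")
```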

Curvature-Constrained Vector Field for Motion Planning of Nonholonomic Robots arxiv.org/abs/2504.02852 .SY .RO

Vector fields are advantageous for nonholonomic motion planning because they provide a reference orientation for robots. However, additionally incorporating curvature constraints is challenging, due to the interconnection between the design of a curvature-bounded vector field and the tracking controller under underactuation. In this paper, we present a novel framework to co-develop the vector field and the control laws, guiding the nonholonomic robot to the target configuration along a curvature-bounded trajectory. First, we formulate the problem by introducing the target positive limit set, which allows the robot to converge to or pass through the target configuration, depending on the dynamics and task. Next, we construct a curvature-constrained vector field (CVF) by blending and distributing basic flow fields in the workspace, and we propose saturated control laws with a dynamic gain, under which the magnitude of the tracking error decreases even when saturation occurs. Under these control laws, kinematically constrained nonholonomic robots are guaranteed to track the reference CVF and converge to the target positive limit set with bounded trajectory curvature. Numerical simulations show that the proposed CVF method outperforms other vector-field-based algorithms. Experiments on Ackermann UGVs and semi-physical fixed-wing UAVs demonstrate that the method can be effectively implemented in real-world scenarios.
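To make the flavor of the construction concrete, here is a toy curvature-bounded vector field for a unicycle model (not the paper's CVF): an attracting field is blended with a circulating one, and the commanded curvature is saturated at an assumed bound.

```python
# Toy blended vector field with saturated curvature command for a unicycle.
# KAPPA_MAX, blend, and gains are illustrative assumptions.
import numpy as np

KAPPA_MAX = 0.5  # assumed curvature bound (1 / minimum turning radius)

def vector_field(p, target=np.zeros(2), blend=0.7):
    to_target = target - p
    d = np.linalg.norm(to_target) + 1e-9
    attract = to_target / d
    circulate = np.array([-attract[1], attract[0]])  # 90-degree rotation
    v = blend * attract + (1 - blend) * circulate
    return v / np.linalg.norm(v)

def curvature_command(heading, p, k_gain=1.0):
    v = vector_field(p)
    desired = np.arctan2(v[1], v[0])
    err = np.arctan2(np.sin(desired - heading), np.cos(desired - heading))
    return np.clip(k_gain * err, -KAPPA_MAX, KAPPA_MAX)  # saturated control

# Unicycle rollout: the trajectory curvature never exceeds KAPPA_MAX.
p, th, v, dt = np.array([5.0, -3.0]), 0.0, 1.0, 0.05
for _ in range(400):
    kappa = curvature_command(th, p)
    th += v * kappa * dt
    p = p + v * dt * np.array([np.cos(th), np.sin(th)])
print(f"final distance to target: {np.linalg.norm(p):.2f}")
```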

Mapping Technological Futures: Anticipatory Discourse Through Text Mining arxiv.org/abs/2504.02853 .SI .CL .CY

The volatility and unpredictability of emerging technologies, such as artificial intelligence (AI), generate significant uncertainty, which is widely discussed on social media. This study examines anticipatory discourse surrounding technological futures by analysing 1.5 million posts from 400 key opinion leaders (KOLs) published on the X platform from 2021 to 2023. Using advanced text mining techniques, including BERTopic modelling and sentiment, emotion, and attitude analyses, the research identifies 100 distinct topics reflecting anticipated tech-driven futures. Our findings emphasize the dual role of KOLs in framing "present futures" (optimistic visions of transformative technologies like AI and IoT) and influencing "future presents", where these projections shape contemporary societal and geopolitical debates. Positive emotions such as Hope dominate, outweighing Anxiety, particularly in topics like "Machine Learning, Data Science, and Deep Learning", while discussions around "Climate Change" and "War, Ukraine, and Trump People" elicit Anxiety. By framing technologies as solutions to societal challenges, KOLs act as mediators of societal narratives, bridging imagined futures and current realities. These insights underscore their pivotal role in directing public attention toward emerging technologies during periods of heightened uncertainty, advancing our understanding of anticipatory discourse in technology-mediated contexts.
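The topic-modelling step maps onto the open-source BERTopic library; a condensed sketch might look like the following, with a public demo corpus standing in for the 1.5M KOL posts and the sentiment/emotion stages omitted:

```python
# BERTopic sketch: discover topics in a corpus, then inspect them.
# The 20 Newsgroups corpus is a stand-in; the paper's corpus is X posts.
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset="all",
                          remove=("headers", "footers", "quotes")).data[:2000]
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_topic_info().head(10))  # top topics with keywords
```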

The epistemic dimension of algorithmic fairness: assessing its impact in innovation diffusion and fair policy making arxiv.org/abs/2504.02856 .CY .SI

Algorithmic fairness is an expanding field that addresses a range of discrimination issues associated with algorithmic processes. Most works in the literature, however, analyze it only from an ethical perspective, focusing on the moral principles and values that should inform the design and evaluation of algorithms, while disregarding the epistemic dimension related to knowledge transmission and validation. This aspect of algorithmic fairness should also be included in the debate, as it introduces a specific type of harm: an individual may be systematically excluded from the dissemination of knowledge due to the attribution of a credibility deficit or excess. In this work, we characterize and analyze the impact of this credibility deficit or excess on the diffusion of innovations at a societal scale, a phenomenon driven by individual attitudes, social interactions, and the strength of mutual connections. Indeed, discrimination might shape these connections, ultimately modifying how innovations spread within the network. In this light, incorporating the epistemic dimension into innovation diffusion models, also in formal terms, becomes paramount, especially if these models are intended to support fair policy design. For these reasons, we formalize the epistemic properties of a social environment by extending the well-established Linear Threshold Model (LTM) in an epistemic direction, showing the impact of epistemic biases on innovation diffusion. Focusing on the impact of epistemic bias in both open-loop and closed-loop scenarios featuring optimal fostering policies, our results shed light on the pivotal role the epistemic dimension might play in the debate on algorithmic fairness in decision-making.
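A toy version of this extension can be written directly on top of the standard Linear Threshold Model: edge influence is scaled by a per-sender credibility factor, so a credibility deficit (factor < 1) or excess (> 1) reshapes diffusion. The graph, weights, and deficit assignment below are illustrative assumptions, not the paper's formalization.

```python
# Linear Threshold Model with per-sender credibility scaling.
import random
import networkx as nx

def epistemic_ltm(G, seeds, credibility, steps=20):
    """G: DiGraph with 'weight' on edges; credibility: node -> factor."""
    active = set(seeds)
    threshold = {v: random.random() for v in G}  # standard random thresholds
    for _ in range(steps):
        newly = set()
        for v in G:
            if v in active:
                continue
            influence = sum(G[u][v]["weight"] * credibility[u]
                            for u in G.predecessors(v) if u in active)
            if influence >= threshold[v]:
                newly.add(v)
        if not newly:
            break
        active |= newly
    return active

random.seed(0)
G = nx.gnp_random_graph(100, 0.06, seed=0, directed=True)
for u, v in G.edges:
    G[u][v]["weight"] = 1.0 / max(G.in_degree(v), 1)  # weights sum to ~1
fair = {v: 1.0 for v in G}
biased = {v: (0.3 if v % 5 == 0 else 1.0) for v in G}  # deficit for one group
seeds = [0, 1, 2]
print(len(epistemic_ltm(G, seeds, fair)), "adopters without bias")
random.seed(0)  # same thresholds for a fair comparison
print(len(epistemic_ltm(G, seeds, biased)), "adopters with credibility deficit")
```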

Optimizing Humor Generation in Large Language Models: Temperature Configurations and Architectural Trade-offs arxiv.org/abs/2504.02858 .CL .LG

Large language models (LLMs) demonstrate increasing capabilities in creative text generation, yet systematic evaluations of their humor production remain underexplored. This study presents a comprehensive analysis of 13 state-of-the-art LLMs across five architectural families, evaluating their performance in generating technically relevant humor for software developers. Through a full factorial design testing 715 unique configurations of temperature settings and prompt variations, we assess model outputs using five weighted criteria: humor quality, domain relevance, concept originality, tone precision, and delivery efficiency. Our methodology employs rigorous statistical analysis including ANOVA, correlation studies, and quadratic regression to identify optimal configurations and architectural influences. Results reveal significant performance variations across models, with certain architectures achieving 21.8% superiority over baseline systems. Temperature sensitivity analysis demonstrates that 73% of models achieve peak performance at lower stochasticity settings (≤ 0.5), though optimal ranges vary substantially by architecture. We identify distinct model clusters: compact high-performers maintaining an efficiency-quality balance versus verbose specialists requiring longer outputs for marginal gains. Statistical validation confirms that model architecture explains 38.7% of performance variance, with significant correlations between humor quality and concept originality. The study establishes practical guidelines for model selection and configuration, demonstrating how temperature adjustments and architectural considerations impact humor generation effectiveness. These findings advance understanding of LLM capabilities in creative technical writing and provide empirically validated configuration strategies for developers implementing humor-generation systems.
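The quadratic-regression step for locating a temperature optimum is straightforward to reproduce in outline; the scores below are fabricated placeholders, not the paper's measurements:

```python
# Fit a quadratic to score-vs-temperature data and locate the peak.
import numpy as np

temps = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 1.25])
scores = np.array([6.1, 6.8, 7.0, 6.6, 6.0, 5.2])  # illustrative ratings

b2, b1, b0 = np.polyfit(temps, scores, deg=2)  # score ~ b2*t^2 + b1*t + b0
t_opt = -b1 / (2 * b2)                         # vertex of the parabola
print(f"fitted optimum near temperature {t_opt:.2f}")
```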

AI-Enhanced Resilience in Power Systems: Adversarial Deep Learning for Robust Short-Term Voltage Stability Assessment under Cyber-Attacks arxiv.org/abs/2504.02859 .SY

In the era of Industry 4.0, ensuring the resilience of cyber-physical systems against sophisticated cyber threats is increasingly critical. This study proposes a pioneering AI-based control framework that enhances short-term voltage stability assessment (STVSA) in power systems under complex composite cyber-attacks. First, composite adversarial attacks are constructed by combining white-box and black-box adversarial attacks with Denial-of-Service (DoS) perturbations during training. Second, applying a Spectral Normalized Conditional Wasserstein Generative Adversarial Network with Gradient Penalty (SNCWGAN-GP) and the Fast Gradient Sign Method (FGSM) strengthens the model's resistance to adversarial disturbances, improving data quality and training stability. Third, an assessment model based on a Long Short-Term Memory (LSTM)-enhanced Graph Attention Network (L-GAT) is developed to capture the dynamic relationships between post-fault trajectories and the electrical grid topology. Experimental results on the IEEE 39-bus test system demonstrate the efficacy and superiority of the proposed method in composite cyber-attack scenarios. This contribution is pivotal to advancing AI-based resilient control strategies for nonlinear dynamical systems, marking a substantial enhancement in the security of cyber-physical systems.
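The FGSM component is a standard technique; a generic adversarial-training step, sketched here with a plain LSTM classifier and random stand-in trajectories rather than the paper's L-GAT model, looks like:

```python
# One FGSM adversarial-training step on a toy stability classifier.
# Shapes, epsilon, and the model are illustrative assumptions.
import torch
import torch.nn as nn

class TinySTVSA(nn.Module):
    def __init__(self, n_features=10, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # stable / unstable
    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])

model, loss_fn = TinySTVSA(), nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(16, 20, 10)   # batch of post-fault trajectory windows
y = torch.randint(0, 2, (16,))
eps = 0.05                    # assumed perturbation budget

x.requires_grad_(True)
loss_fn(model(x), y).backward()
x_adv = (x + eps * x.grad.sign()).detach()  # FGSM perturbation
opt.zero_grad()
loss_fn(model(x_adv), y).backward()         # train on adversarial examples
opt.step()
```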

Marco: Configurable Graph-Based Task Solving and Multi-AI Agents Framework for Hardware Design arxiv.org/abs/2504.01962 .AR
