SpecGaussian with Latent Features: A High-quality Modeling of the View-dependent Appearance for 3D Gaussian Splatting arxiv.org/abs/2409.05868 .CV

Recently, the 3D Gaussian Splatting (3D-GS) method has achieved great success in novel view synthesis, providing real-time rendering while ensuring high-quality rendering results. However, this method faces challenges in modeling specular reflections and handling anisotropic appearance components, especially in dealing with view-dependent color under complex lighting conditions. Additionally, 3D-GS uses spherical harmonics to learn the color representation, which has limited ability to represent complex scenes. To overcome these challenges, we introduce Lantent-SpecGS, an approach that utilizes a universal latent neural descriptor within each 3D Gaussian. This enables a more effective representation of 3D feature fields, including appearance and geometry. Moreover, two parallel CNNs are designed to decode the splatting feature maps into diffuse color and specular color separately. A viewpoint-dependent mask is learned to merge these two colors, resulting in the final rendered image. Experimental results demonstrate that our method obtains competitive performance in novel view synthesis and extends the ability of 3D-GS to handle intricate scenarios with specular reflections.
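
The abstract says a viewpoint-dependent mask merges the diffuse and specular colors but does not give the formula. A minimal sketch, assuming a per-pixel convex blend (the function name `blend_colors` and the blend form are illustrative assumptions, not the paper's implementation):

```python
def blend_colors(diffuse, specular, mask):
    """Blend two RGB colors for one pixel.

    Assumed form: (1 - mask) * diffuse + mask * specular, with mask in [0, 1].
    The abstract only says a learned mask merges the two colors; this exact
    formula is a hypothetical choice for illustration.
    """
    if not 0.0 <= mask <= 1.0:
        raise ValueError("mask must lie in [0, 1]")
    return tuple((1.0 - mask) * d + mask * s for d, s in zip(diffuse, specular))


# A mostly diffuse pixel: the mask gives the specular branch a 25% share.
pixel = blend_colors((0.2, 0.4, 0.6), (1.0, 1.0, 1.0), 0.25)
print(pixel)
```

With a mask of 0 the pixel is purely diffuse, and with a mask of 1 it is purely specular.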

Using Process Mining to Improve Digital Service Delivery arxiv.org/abs/2409.05869 .CY

We present a case study of Process Mining (PM) for personnel security screening in the Canadian government. We consider customer (process time) and organizational (cost) perspectives. Furthermore, in contrast to most published case studies, we assess the full process improvement lifecycle: pre-intervention analyses pointed out initial bottlenecks, and post-intervention analyses identified the intervention impact and remaining areas for improvement. Using PM techniques, we identified frequent exceptional scenarios (e.g., applications requiring amendment), time-intensive loops (e.g., employees forgetting tasks), and resource allocation issues (e.g., involvement of non-security personnel). Subsequent process improvement interventions, implemented using a flexible low-code digital platform, reduced security briefing times from around 7 days to 46 hours, and overall process time from around 31 days to 26 days, on average. From a cost perspective, the involvement of hiring managers and security screening officers was significantly reduced. These results demonstrate how PM can become part of a broader digital transformation framework to improve public service delivery. The success of these interventions motivated subsequent government PM projects, and inspired a PM methodology, currently under development, for use in large organizational contexts such as governments.
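
The reported improvements can be put on a common scale as relative reductions. A small worked calculation, treating the abstract's "around 7 days" as exactly 168 hours (an assumption, so the results are approximate):

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Security briefing time: about 7 days (168 hours) down to 46 hours.
briefing = percent_reduction(7 * 24, 46)
# Overall process time: about 31 days down to 26 days.
overall = percent_reduction(31, 26)

print(f"briefing: {briefing:.1f}%, overall: {overall:.1f}%")
# prints "briefing: 72.6%, overall: 16.1%"
```

On these figures the briefing step improved far more than the end-to-end process, which fits the abstract's note that areas for improvement remain.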

Enabling Distributed Generative Artificial Intelligence in 6G: Mobile Edge Generation arxiv.org/abs/2409.05870 .IT

Mobile edge generation (MEG) is an emerging technology that allows the network to meet the challenging traffic load expectations posed by the rise of generative artificial intelligence (GAI). A novel MEG model is proposed for deploying GAI models on edge servers (ES) and user equipment (UE) to jointly complete text-to-image generation tasks. In the generation task, the ES and UE cooperatively generate the image according to the text prompt given by the user. To enable MEG, a pre-trained latent diffusion model (LDM) is invoked to generate the latent feature, and an edge-inferencing MEG protocol is employed for data exchange between the ES and the UE. A compression coding technique is proposed for compressing the latent features to produce seeds. Based on this seed-enabled MEG model, an image quality optimization problem with a transmit power constraint is formulated. The transmit power of the seed is dynamically optimized by a deep reinforcement learning agent over the fading channel. The proposed MEG-enabled text-to-image generation system is evaluated in terms of image quality and transmission overhead. The numerical results indicate that, compared to the conventional centralized generation-and-downloading scheme, the number of symbols transmitted by MEG is materially reduced. In addition, the proposed compression coding approach can improve the quality of generated images under low signal-to-noise ratio (SNR) conditions.

Multi-feature Compensatory Motion Analysis for Reaching Motions Over a Discretely Sampled Workspace arxiv.org/abs/2409.05871 .RO .LG

The absence of functional arm joints, such as the wrist, in upper extremity prostheses leads to compensatory motions in the users' daily activities. Compensatory motions have been previously studied for varying task protocols and evaluation metrics. However, the movement targets' spatial locations in previous protocols were neither standardised nor comparable between studies, and the evaluation metrics were rudimentary. This work analysed compensatory motions in the final pose of subjects reaching across a discretely sampled 7×7 2D grid of targets under unbraced (normative) and braced (compensatory) conditions. For the braced condition, a bracing system was applied to simulate a transradial prosthetic limb by restricting participants' wrist joints. A total of 1372 reaching poses were analysed, and a Compensation Index was proposed to indicate the severity level of compensation. This index combined joint spatial location analysis, joint angle analysis, separability analysis, and machine learning (clustering) analysis. The individual analysis results and the final Compensation Index were presented in heatmap format to correspond to the spatial layout of the workspace, revealing the spatial dependency of compensatory motions. The results indicate that compensatory motions occur mainly in a right trapezoid region in the upper left area and a vertical trapezoid region in the middle left area for right-handed subjects reaching horizontally and vertically. Such results might guide motion selection in clinical rehabilitation, occupational therapy, and prosthetic evaluation to help avoid residual limb pain and overuse syndromes.

CSRec: Rethinking Sequential Recommendation from A Causal Perspective arxiv.org/abs/2409.05872 .IR .LG

The essence of sequential recommender systems (RecSys) lies in understanding how users make decisions. Most existing approaches frame the task as sequential prediction based on users' historical purchase records. While effective in capturing users' natural preferences, this formulation falls short in accurately modeling actual recommendation scenarios, particularly in accounting for how unsuccessful recommendations influence future purchases. Furthermore, the impact of the RecSys itself on users' decisions has not been appropriately isolated and quantitatively analyzed. To address these challenges, we propose a novel formulation of sequential recommendation, termed Causal Sequential Recommendation (CSRec). Instead of predicting the next item in the sequence, CSRec aims to predict the probability of a recommended item's acceptance within a sequential context and backtrack how current decisions are made. Critically, CSRec facilitates the isolation of various factors that affect users' final decisions, especially the influence of the recommender system itself, thereby opening new avenues for the design of recommender systems. CSRec can be seamlessly integrated into existing methodologies. Experimental evaluations on both synthetic and real-world datasets demonstrate that the proposed implementation significantly improves upon state-of-the-art baselines.

Nested Fusion: A Method for Learning High Resolution Latent Structure of Multi-Scale Measurement Data on Mars arxiv.org/abs/2409.05874 .AP .CE

The Mars Perseverance Rover represents a generational change in the scale of measurements that can be taken on Mars; however, this increased resolution introduces new challenges for techniques in exploratory data analysis. The rover's instruments each measure specific properties of interest to scientists, so analyzing how underlying phenomena affect multiple instruments together is important for understanding the full picture. However, each instrument has a unique resolution, making the mapping between overlapping layers of data non-trivial. In this work, we introduce Nested Fusion, a method to combine arbitrarily layered datasets of different resolutions and produce a latent distribution at the highest possible resolution, encoding complex interrelationships between different measurements and scales. Our method is efficient for large datasets, can perform inference even on unseen data, and outperforms existing methods of dimensionality reduction and latent analysis on real-world Mars rover data. We have deployed Nested Fusion within a Mars science team at NASA Jet Propulsion Laboratory (JPL) and, through multiple rounds of participatory design, enabled greatly enhanced exploratory analysis workflows for real scientists. To ensure the reproducibility of our work, we have open-sourced our code on GitHub at https://github.com/pixlise/NestedFusion.

Transformer-Enhanced Iterative Feedback Mechanism for Polyp Segmentation arxiv.org/abs/2409.05875 .CV

Colorectal cancer (CRC) is the third most commonly diagnosed cancer in the United States and the second leading cause of cancer-related death among both genders. Notably, CRC is the leading cause of cancer in men younger than 50 years old. Colonoscopy is considered the gold standard for the early diagnosis of CRC. Skills vary significantly among endoscopists, and a high miss rate is reported. Automated polyp segmentation can reduce the miss rate and make timely treatment possible at an early stage. To address this challenge, we introduce FANetv2, an advanced encoder-decoder network designed to accurately segment polyps from colonoscopy images. Leveraging an initial input mask generated by Otsu thresholding, FANetv2 iteratively refines its binary segmentation masks through a novel feedback attention mechanism informed by the mask predictions of previous epochs. Additionally, it employs a text-guided approach that integrates essential information about the number (one or many) and size (small, medium, large) of polyps to further enhance its feature representation capabilities. This dual-task approach facilitates accurate polyp segmentation and aids in the auxiliary classification of polyp attributes, significantly boosting the model's performance. Our comprehensive evaluations on the publicly available BKAI-IGH and CVC-ClinicDB datasets demonstrate the superior performance of FANetv2, evidenced by high Dice similarity coefficients (DSC) of 0.9186 and 0.9481, along with low Hausdorff distances of 2.83 and 3.19, respectively. The source code for FANetv2 is available at https://github.com/xxxxx/FANetv2.
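
For readers unfamiliar with the metric, the Dice similarity coefficient reported above is defined as 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. A minimal sketch on flattened binary masks (the function name and the convention of returning 1.0 for two empty masks are assumptions, not taken from the FANetv2 code):

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient for two flattened binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * intersection / total


# Prediction marks two pixels, ground truth marks one of them.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
```

A score of 1.0 means the masks overlap exactly, and 0.0 means they share no foreground pixels.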

The Impact of Virtual Achievements on Online Learning Applications arxiv.org/abs/2409.05877 .HC

In recent times, a number of platforms have been using badge-based achievements or leaderboards to increase user involvement and participation. Given these developments, an open question is how much impact virtual achievement systems have on the users of a particular platform. In this paper, we measure the impact of a leaderboard-based achievement system by integrating it into UPSC Pre, an online learning Android application that has thousands of questions and answers, categorized topic-wise, related to the UPSC exams, which are among the toughest exams in the world. We conducted the experiment on 10 randomly chosen students using the app in a controlled setting, and the data were measured using Google's Firebase Analytics tool. We observed that the students using the leaderboard showed increased participation without any reduction in quality, and that their time on the platform increased compared to previous engagements. Students who participated in the experiment felt the leaderboard was competitive, enjoyed gaining positions on it, and wanted it in the user interface of other platforms as well. The research has implications for designing learning applications that cater to users and improve their experience in the future. The results are not limited to educational purposes but can be extended to other fields such as self-development applications, other research projects, the gaming industry, and many more.

CF-KAN: Kolmogorov-Arnold Network-based Collaborative Filtering to Mitigate Catastrophic Forgetting in Recommender Systems arxiv.org/abs/2409.05878 .IR .LG

Collaborative filtering (CF) remains essential in recommender systems, leveraging user-item interactions to provide personalized recommendations. Meanwhile, a number of CF techniques have evolved into sophisticated model architectures based on multi-layer perceptrons (MLPs). However, MLPs often suffer from catastrophic forgetting, and thus lose previously acquired knowledge when new information is learned, particularly in dynamic environments requiring continual learning. To tackle this problem, we propose CF-KAN, a new CF method utilizing Kolmogorov-Arnold networks (KANs). By learning nonlinear functions on the edge level, KANs are more robust to the catastrophic forgetting problem than MLPs. Built upon a KAN-based autoencoder, CF-KAN is designed to effectively capture the intricacies of sparse user-item interactions and retain information from previous data instances. Despite its simplicity, our extensive experiments demonstrate 1) CF-KAN's superiority over state-of-the-art methods in recommendation accuracy, 2) CF-KAN's resilience to catastrophic forgetting, underscoring its effectiveness in both static and dynamic recommendation scenarios, and 3) CF-KAN's edge-level interpretation facilitating the explainability of recommendations.

Hierarchical Optimal Dispatch of Active Distribution Networks Considering Flexibility Auxiliary Service of Multi-community Integrated Energy Systems arxiv.org/abs/2409.04446 .SY

Active distribution networks (ADNs) are the main platforms for carrying large-scale distributed renewable energy and flexible resources, and multi-community integrated energy systems (MCIESs) may become important flexible resource supplies in ADNs owing to their multi-energy synergistic and complementary advantages. To fully utilize the flexible regulation potential of MCIESs for ADNs, a novel hierarchical stochastic dispatch approach for ADNs that considers flexibility auxiliary services of MCIESs is proposed. In this approach, a flexibility auxiliary service pricing strategy that combines adjustment cost and flexibility margin is established by evaluating the operational flexibility of MCIESs. In addition, considering renewable uncertainty, an MCIES-ADN flexibility interaction mechanism based on insufficient flexibility risk is designed to optimize their operation strategies and reduce the uncertainty risk. In the solution phase, an analytical target cascading theory-based distributed solving method is developed to realize decoupling and parallel solving of multiple stakeholders. The simulation results for a PG&E 69-node system with three CIESs demonstrate that the proposed approach not only improves MCIES revenue but also enhances ADN flexibility to consume renewable energy, which provides a fundamental way for efficient application of regional mutual aid.

New parametric identification method for a preference model arxiv.org/abs/2409.04462 .SY .CE

This article presents a contribution to multi-criteria decision support intended for industrial decision-makers in order to determine the best compromise between design criteria when working on risky or innovative products. In (RENAUD et al. 2008) we used the OWA operator (Ordered Weighted Average), a well-known multi-criteria analysis technique introduced by (YAGER 1988). The interest of this aggregation method is, beyond its ease of use, its ability to evaluate a product according to a single scale. When using the OWA method, the choice of criterion weights and their values remains an important and delicate operation. Indeed, the weights are not fixed by criterion but according to a level of utility (FISHBURN 1967). The weights can, in addition, be determined using different methods. A traditional approach consists in estimating the weights from an a priori selection of the most representative samples by an expert. We propose a new approach based on applied D-Optimality to determine the best sample. The results of the two approaches are compared. Indeed, in some multi-criteria decision problems, it is difficult to choose the method that best describes the behavior of the decision maker. As said above, the first part of this paper presents an original method based on D-optimality to determine the most representative set of samples to determine the decision parameters. This method has been applied and validated in an OWA approach. It has been shown that the accuracy and reliability of the method have been improved. Since the OWA application has been improved, the other goal of this paper is to verify whether the real decision maker has an OWA-based behavior or not. To achieve this, the D-optimality approach is applied to the MAUT (Multi-Attribute Utility Theory) approach to find the best set of samples. The results of the two methods are compared. 
Both approaches show that, while OWA better simulates the decision maker's behavior, there is still a gap between its scores and those given by the expert. The question here is how to improve the accuracy of the scores estimated by OWA compared to those given by the expert. Subsequently, a hybrid approach is tested, such as a linear combination of the MAUT and OWA approaches. The results obtained with this combination were found to be more accurate than those of OWA. We also tested the CHOQUET discrete integral approach. In other words, it is possible to find a model capable of better describing the decision maker's behavior.
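
Because OWA weights attach to rank positions and not to named criteria, a short example may help. The sketch below follows Yager's standard definition (sort the criterion scores in descending order, then take the weighted sum); the scores and weight vectors are made-up illustrations, not data from the paper:

```python
def owa(values, weights):
    """Ordered Weighted Average: weights apply to ranks, not to specific criteria."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)  # largest score first
    return sum(w * b for w, b in zip(weights, ordered))


scores = [0.2, 0.9, 0.5]  # hypothetical scores of one product on three criteria

print(owa(scores, [1.0, 0.0, 0.0]))  # all weight on the best score (max)
print(owa(scores, [0.0, 0.0, 1.0]))  # all weight on the worst score (min)
print(owa(scores, [0.2, 0.3, 0.5]))  # pessimistic profile favoring low scores
```

Choosing the weight vector is the identification problem the abstract addresses: different vectors move the aggregate anywhere between the minimum and the maximum of the criterion scores.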

Leveraging Large Language Models for Solving Rare MIP Challenges arxiv.org/abs/2409.04464 .OC .CL .AI .LG

Mixed Integer Programming (MIP) has been extensively applied in areas requiring mathematical solvers to address complex instances within tight time constraints. However, as the problem scale increases, the complexity of model formulation and finding feasible solutions escalates significantly. In contrast, the model-building cost for end-to-end models, such as large language models (LLMs), remains largely unaffected by problem scale due to their pattern recognition capabilities. While LLMs, like GPT-4, without fine-tuning, can handle some traditional medium-scale MIP problems, they struggle with uncommon or highly specialized MIP scenarios. Fine-tuning LLMs can yield some feasible solutions for medium-scale MIP instances, but these models typically fail to explore diverse solutions when constrained by a low and constant temperature, limiting their performance. In this paper, we propose and evaluate a recursively dynamic temperature method integrated with a chain-of-thought approach. Our findings show that starting with a high temperature and gradually lowering it leads to better feasible solutions compared to other dynamic temperature strategies. Additionally, by comparing results generated by the LLM with those from Gurobi, we demonstrate that the LLM can produce solutions that complement traditional solvers by accelerating the pruning process and improving overall efficiency.
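
The abstract reports that starting with a high sampling temperature and gradually lowering it works best, but it does not specify the schedule. A minimal sketch, assuming a simple linear decay across retries (the function name, endpoints, and linear shape are illustrative assumptions, not the paper's recursive method):

```python
def temperature_schedule(start, end, steps):
    """Linearly decreasing temperatures from `start` down to `end` over `steps` attempts."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if start < end:
        raise ValueError("start should be at least as high as end")
    if steps == 1:
        return [start]
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]


# Five attempts: explore broadly first, then sample more conservatively.
for attempt, temp in enumerate(temperature_schedule(1.0, 0.2, 5), start=1):
    print(f"attempt {attempt}: temperature {temp:.2f}")
```

Each temperature would be passed to the model's sampling call for the corresponding attempt; that call is omitted here because it depends on the model interface.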

Here's Charlie! Realising the Semantic Web vision of Agents in the age of LLMs arxiv.org/abs/2409.04465 .AI

This paper presents our research towards a near-term future in which legal entities, such as individuals and organisations, can entrust semi-autonomous AI-driven agents to carry out online interactions on their behalf. The author's research concerns the development of semi-autonomous Web agents, which consult users if and only if the system does not have sufficient context or confidence to proceed working autonomously. This creates a user-agent dialogue that allows the user to teach the agent about the information sources they trust, their data-sharing preferences, and their decision-making preferences. Ultimately, this enables the user to maximise control over their data and decisions while retaining the convenience of using agents, including those driven by LLMs. In view of developing near-term solutions, the research seeks to answer the question: "How do we build a trustworthy and reliable network of semi-autonomous agents which represent individuals and organisations on the Web?". After identifying key requirements, the paper presents a demo for a sample use case of a generic personal assistant. This is implemented using Notation3 rules to enforce safety guarantees around belief, data sharing and data usage, and LLMs to allow natural language interaction with users and serendipitous dialogues between software agents.
