arXiv Computer Science @arxiv_cs@qoto.org

Bot

I toot the arXiv feed for topics in Computer Science.

#ComputerScience #CS #Programming #SoftwareEngineering #Software #SoftwareDevelopment #Computers #Science #arXiv #News #PeerReview

Joined Jul 2018

2 Following 1.12K Followers

arXiv Computer Science @arxiv_cs@qoto.org

Enhanced Generative Adversarial Networks for Unseen Word Generation from EEG Signals. (arXiv:2311.17923v1 [eess.AS]) http://arxiv.org/abs/2311.17923

Enhanced Generative Adversarial Networks for Unseen Word Generation from EEG Signals

Recent advances in brain-computer interface (BCI) technology, particularly based on generative adversarial networks (GAN), have shown great promise for improving decoding performance for BCI. Within the realm of Brain-Computer Interfaces (BCI), GANs find application in addressing many areas. They serve as a valuable tool for data augmentation, which can solve the challenge of limited data availability, and synthesis, effectively expanding the dataset and creating novel data formats, thus enhancing the robustness and adaptability of BCI systems. Research in speech-related paradigms has significantly expanded, with a critical impact on the advancement of assistive technologies and communication support for individuals with speech impairments. In this study, GANs were investigated, particularly for the BCI field, and applied to generate text from EEG signals. The GANs could generalize across all subjects and decode unseen words, indicating their ability to capture underlying speech patterns consistent across different individuals. The method has practical applications in neural signal-based speech recognition systems and communication aids for individuals with speech difficulties.

arXiv Computer Science @arxiv_cs@qoto.org

Unrolling Virtual Worlds for Immersive Experiences. (arXiv:2311.17924v1 [cs.GR]) http://arxiv.org/abs/2311.17924

Unrolling Virtual Worlds for Immersive Experiences

This research pioneers a method for generating immersive worlds, drawing inspiration from elements of vintage adventure games like Myst and employing modern text-to-image models. We explore the intricate conversion of 2D panoramas into 3D scenes using equirectangular projections, addressing the distortions in perception that occur as observers navigate within the encompassing sphere. Our approach employs a technique similar to "inpainting" to rectify distorted projections, enabling the smooth construction of locally coherent worlds. This provides extensive insight into the interrelation of technology, perception, and experiential reality within human-computer interaction.
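The equirectangular mapping mentioned in this abstract can be illustrated with a minimal sketch. The function name `pixel_to_direction` and the axis convention (y up, z forward at the image center) are assumptions chosen for illustration; the paper may use a different convention. The sketch maps a panorama pixel to a unit viewing direction on the sphere, which shows why rows near the top and bottom are heavily stretched: every pixel in the top row lands on almost the same direction.

```python
import math

def pixel_to_direction(u, v, width, height):
    """Map pixel (u, v) of an equirectangular panorama to a unit 3D direction.

    Assumed convention (illustrative): longitude spans -pi..pi across the
    width, latitude spans +pi/2 (top row) to -pi/2 (bottom row).
    """
    lon = (u / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - v / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)

# The image center looks straight ahead; the top row collapses toward "up".
center = pixel_to_direction(1024, 512, 2048, 1024)
top_left = pixel_to_direction(0, 0, 2048, 1024)
top_mid = pixel_to_direction(1024, 0, 2048, 1024)
```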

arXiv Computer Science @arxiv_cs@qoto.org

Grid-Forming Control of Power Converters: Equivalence Proof through Simplified Models. (arXiv:2311.17926v1 [eess.SY]) http://arxiv.org/abs/2311.17926

Grid-Forming Control of Power Converters: Equivalence Proof through Simplified Models

This work establishes the equivalence of selected grid-forming control algorithms within the context of simplified theoretical models. Considered algorithms are droop control, Virtual Synchronous Machine (VSM) and matching control. It is shown that nodal and network dynamics under those regulators boil down to the same equations near the selected (trivial) nominal operating point. Finally, some practical insights on each regulator dynamics and an outlook are provided.

arXiv Computer Science @arxiv_cs@qoto.org

Closed-Loop Ramp-Comparison Current Regulator for an Induction Machine with a PWM Voltage-Source Inverter. (arXiv:2311.17927v1 [eess.SY]) http://arxiv.org/abs/2311.17927

Closed-Loop Ramp-Comparison Current Regulator for an Induction Machine with a PWM Voltage-Source Inverter

This paper addresses the closed-loop ramp comparison current regulation in an induction machine fed by a pulse width modulated voltage source inverter. The regulator is implemented in a synchronous frame, serving as a foundation for an overarching vector control of the induction machine. First, the effect of PI regulator gains on the controller performance is analyzed both theoretically and numerically using the developed Simulink model of the system. Next, the paper deals with high speed and/or low-voltage operating conditions of the machine, introducing the concept of overmodulation and analyzing its impact on the regulator performance. Obtained simulation results coincide with model-based theoretical predictions and literature findings. Finally, the work proposes an outlook for the high-speed system enhancements in terms of power electronics topology, control and modulation.

arXiv Computer Science @arxiv_cs@qoto.org

New Online Communities: Graph Deep Learning on Anonymous Voting Networks to Identify Sybils in Polycentric Governance. (arXiv:2311.17929v1 [cs.LG]) http://arxiv.org/abs/2311.17929

New Online Communities: Graph Deep Learning on Anonymous Voting Networks to Identify Sybils in Polycentric Governance

This research examines the polycentric governance of digital assets in Decentralized Autonomous Organizations (DAOs). It offers a theoretical framework and addresses a critical challenge facing decentralized governance by developing a method to identify sybils, or spurious identities. The method uses graph deep learning techniques to identify sybil activity in a DAO governance dataset (snapshot.org). Specifically, a Graph Convolutional Neural Network (GCNN) learned voting behaviours and a fast k-means vector clustering algorithm (FAISS) used the high dimensional embeddings to identify similar nodes in a graph. The results reveal that deep learning can effectively identify sybils, reducing the voting graph by 2-5%. This research underscores the importance of sybil resistance in DAOs and offers a novel perspective on decentralized governance, informing future policy, regulation, and governance practices.

arXiv Computer Science @arxiv_cs@qoto.org

Model Theory of Ultrafinitism II: Deconstructing the Term Model (First Draft). (arXiv:2311.17931v1 [math.LO]) http://arxiv.org/abs/2311.17931

Model Theory of Ultrafinitism II: Deconstructing the Term Model (First Draft)

This paper presents a novel possible worlds semantics, designed to elucidate the underpinnings of ultrafinitism. By constructing a careful modification of the well-known Kripke models for intuitionistic logic, we seek to extend our comprehension of the ultra-finite mindset. As it turns out, the passage from standard constructivist mathematics to the ultrafinite is in a sense an operation of deconstruction of familiar mathematical entities, most notably clear when it comes to N.

arXiv Computer Science @arxiv_cs@qoto.org

Generating Molecular Conformer Fields. (arXiv:2311.17932v1 [physics.chem-ph]) http://arxiv.org/abs/2311.17932

Generating Molecular Conformer Fields

In this paper we tackle the problem of generating conformers of a molecule in 3D space given its molecular graph. We parameterize these conformers as continuous functions that map elements from the molecular graph to points in 3D space. We then formulate the problem of learning to generate conformers as learning a distribution over these functions using a diffusion generative model, called Molecular Conformer Fields (MCF). Our approach is simple and scalable, and achieves state-of-the-art performance on challenging molecular conformer generation benchmarks while making no assumptions about the explicit structure of molecules (e.g. modeling torsional angles). MCF represents an advance in extending diffusion models to handle complex scientific problems in a conceptually simple, scalable and effective manner.

arXiv Computer Science @arxiv_cs@qoto.org

Strategic Workforce Planning in Crowdsourced Delivery with Hybrid Driver Fleets. (arXiv:2311.17935v1 [eess.SY]) http://arxiv.org/abs/2311.17935

Strategic Workforce Planning in Crowdsourced Delivery with Hybrid Driver Fleets

Nowadays, logistics service providers (LSPs) increasingly consider using a crowdsourced workforce on the last mile to fulfill customers' expectations regarding same-day or on-demand delivery at reduced costs. The crowdsourced workforce's availability is, however, uncertain. Therefore, LSPs often hire additional fixed employees to perform deliveries when the availability of crowdsourced drivers is low. In this context, the reliability versus flexibility trade-off which LSPs face over a longer period, e.g., a year, remains unstudied. Against this background, we jointly study a workforce planning problem that considers fixed drivers (FDs) and the temporal development of the crowdsourced driver (CD) fleet over a long-term time horizon. We consider two types of CDs, gigworkers (GWs) and occasional drivers (ODs). While GWs are not sensitive to the request's destination and typically exhibit high availability, ODs only serve requests whose origin and destination coincide with their own private route's origin and destination. Moreover, to account for time horizon-specific dynamics, we consider stochastic turnover for both FDs and CDs as well as stochastic CD fleet growth. We formulate the resulting workforce planning problem as a Markov decision process (MDP) whose reward function reflects total costs, i.e., wages and operational costs arising from serving demand with FDs and CDs, and solve it via approximate dynamic programming (ADP). Applying our approach to an environment based on real-world demand data from GrubHub, we find that in fleets consisting of FDs and CDs, ADP-based hiring policies can outperform myopic hiring policies by up to 19% in total costs. In the studied setting, we observed that GWs reduce the LSP's total costs more than ODs. When we account for CDs' increased resignation probability when not being matched with enough requests, the amount of required FDs increases.

arXiv Computer Science @arxiv_cs@qoto.org

Diagnostics Algorithms in Nuclear Plant Cyber Attack Analysis Toolkit. (arXiv:2311.17936v1 [eess.SY]) http://arxiv.org/abs/2311.17936

Diagnostics Algorithms in Nuclear Plant Cyber Attack Analysis Toolkit

A Python interface is developed for the GPWR Simulator to automatically simulate cyber-spoofing of different steam generator parameters and plant operation. Specifically, steam generator water level, feedwater flowrate, steam flowrate, valve position, and steam generator controller parameters, including controller gain and time constant, can be directly attacked using command inject, denial of service, and man-in-the-middle type attacks. Plant operation can be initialized to any of the initial conditions provided by the GPWR simulator. Several different diagnostics algorithms have been implemented for anomaly detection, including physics-based diagnostics with Kalman filtering, data-driven diagnostics, noise profiling, and online sensor validation. Industry-standard safety analysis code RELAP5 is also available as a part of the toolkit. Diagnostics algorithms are analyzed based on accuracy and efficiency. Our observations indicate that physics-based diagnostics with Kalman filtering are the most robust. An experimental quantum kernel has been added to the framework for preliminary testing. Our first impressions suggest that while quantum kernels can be accurate, just like any other kernels, their applicability is problem/data dependent, and can be prone to overfitting.
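As a rough illustration of Kalman-filter-based anomaly detection, the sketch below runs a scalar filter and flags measurements whose normalized innovation is implausibly large. It is not the toolkit's implementation: the random-walk state model, the noise variances `q` and `r`, the threshold, and the water-level readings are all assumptions made for this example, whereas the toolkit uses physics-based models of the plant.

```python
def detect_anomalies(measurements, q=1e-4, r=1e-2, threshold=9.0):
    """Flag indices whose squared innovation, divided by its variance, exceeds threshold.

    Assumed model (illustrative): x_k = x_{k-1} + w, z_k = x_k + v,
    with process variance q and measurement variance r.
    """
    x = measurements[0]   # initial state estimate
    p = r                 # initial estimate variance
    flagged = []
    for k, z in enumerate(measurements[1:], start=1):
        p_pred = p + q               # predict step (state is unchanged)
        innovation = z - x           # measurement residual
        s = p_pred + r               # innovation variance
        if innovation * innovation / s > threshold:
            flagged.append(k)
            p = p_pred               # do not absorb a suspect measurement
            continue
        gain = p_pred / s            # Kalman gain
        x = x + gain * innovation    # update step
        p = (1.0 - gain) * p_pred
    return flagged

# Hypothetical steam generator level readings with one spoofed sample.
levels = [50.0] * 20 + [60.0] + [50.0] * 5
suspect = detect_anomalies(levels)
```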

arXiv Computer Science @arxiv_cs@qoto.org

Unlocking Spatial Comprehension in Text-to-Image Diffusion Models. (arXiv:2311.17937v1 [cs.CV]) http://arxiv.org/abs/2311.17937

Unlocking Spatial Comprehension in Text-to-Image Diffusion Models

We propose CompFuser, an image generation pipeline that enhances spatial comprehension and attribute assignment in text-to-image generative models. Our pipeline enables the interpretation of instructions defining spatial relationships between objects in a scene, such as "An image of a gray cat on the left of an orange dog", and the generation of corresponding images. This is especially important in order to provide more control to the user. CompFuser overcomes the limitation of existing text-to-image diffusion models by decoding the generation of multiple objects into iterative steps: first generating a single object and then editing the image by placing additional objects in their designated positions. To create training data for spatial comprehension and attribute assignment we introduce a synthetic data generation process that leverages a frozen large language model and a frozen layout-based diffusion model for object placement. We compare our approach to strong baselines and show that our model outperforms state-of-the-art image generation models in spatial comprehension and attribute assignment, despite being 3x to 5x smaller in parameters.

arXiv Computer Science @arxiv_cs@qoto.org

Efficient Deep Speech Understanding at the Edge. (arXiv:2311.17065v1 [eess.AS]) http://arxiv.org/abs/2311.17065

Efficient Deep Speech Understanding at the Edge

Contemporary Speech Understanding (SU) involves a sophisticated pipeline: after capturing real-time voice input, it runs a deep neural network with an encoder-decoder architecture enhanced by beam search. This network periodically assesses attention and Connectionist Temporal Classification (CTC) scores in its autoregressive output. This paper aims to enhance SU performance on edge devices with limited resources. It pursues two intertwined goals: accelerating on-device execution and efficiently handling inputs that surpass the on-device model's capacity. While these objectives are well-established, we introduce innovative solutions that specifically address SU's distinctive challenges: 1. Late contextualization: Enables the parallel execution of a model's attentive encoder during input ingestion. 2. Pilot decoding: Alleviates temporal load imbalances. 3. Autoregression offramps: Facilitate offloading decisions based on partial output sequences. Our techniques seamlessly integrate with existing SU models, pipelines, and frameworks, allowing for independent or combined application. Together, they constitute a hybrid solution for edge SU, exemplified by our prototype, XYZ. Evaluated on platforms equipped with 6-8 Arm cores, our system achieves State-of-the-Art (SOTA) accuracy, reducing end-to-end latency by 2x and halving offloading requirements.

arXiv Computer Science @arxiv_cs@qoto.org

Cluster trajectory of SOFA score in predicting mortality in sepsis. (arXiv:2311.17066v1 [q-bio.QM]) http://arxiv.org/abs/2311.17066

Cluster trajectory of SOFA score in predicting mortality in sepsis

Objective: Sepsis is a life-threatening condition. Sequential Organ Failure Assessment (SOFA) score is commonly used to assess organ dysfunction and predict ICU mortality, but it is taken as a static measurement and fails to capture dynamic changes. This study aims to investigate the relationship between dynamic changes in SOFA scores over the first 72 hours of ICU admission and patient outcomes. Design, setting, and participants: 3,253 patients in the Medical Information Mart for Intensive Care IV database who met the sepsis-3 criteria and were admitted from the emergency department with at least 72 hours of ICU admission and full-active resuscitation status were analysed. Group-based trajectory modelling with dynamic time warping and k-means clustering identified distinct trajectory patterns in dynamic SOFA scores. They were subsequently compared using Python. Main outcome measures: Outcomes including hospital and ICU mortality, length of stay in hospital and ICU, and readmission during hospital stay, were collected. Discharge time from ICU to wards and cut-offs at 7-day and 14-day were taken. Results: Four clusters were identified: A (consistently low SOFA scores), B (rapid increase followed by a decline in SOFA scores), C (higher baseline scores with gradual improvement), and D (persistently elevated scores). Cluster D had the longest ICU and hospital stays and the highest ICU and hospital mortality. Discharge rates from ICU were similar for Clusters A and B, while Cluster C had initially comparable rates but a slower transition to ward. Conclusion: Monitoring dynamic changes in SOFA score is valuable for assessing sepsis severity and treatment responsiveness.
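The dynamic time warping step named in this abstract can be sketched as follows. This is the textbook DTW recurrence, not the study's code, and the SOFA trajectories are made-up values for illustration only. A distance like this could then feed a clustering step such as k-means, which is omitted here.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# Hypothetical SOFA trajectories (illustrative values, not data from the study).
low = [2, 2, 1, 1]
low_stretched = [2, 2, 2, 1, 1]   # same shape, different timing
high = [10, 11, 11, 12]
```

Because DTW allows one sequence to stretch in time, `low` and `low_stretched` align at zero cost even though their lengths differ, while `low` and `high` stay far apart.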

arXiv Computer Science @arxiv_cs@qoto.org

Deep convolutional encoder-decoder hierarchical neural networks for conjugate heat transfer surrogate modeling. (arXiv:2311.17068v1 [cs.CE]) http://arxiv.org/abs/2311.17068

Deep convolutional encoder-decoder hierarchical neural networks for conjugate heat transfer surrogate modeling

Conjugate heat transfer (CHT) models are vital for the design of many engineering systems. However, high-fidelity CHT models are computationally intensive, which limits their use in applications such as design optimization, where hundreds to thousands of model evaluations are required. In this work, we develop a modular deep convolutional encoder-decoder hierarchical (DeepEDH) neural network, a novel deep-learning-based surrogate modeling methodology for computationally intensive CHT models. Leveraging convective temperature dependencies, we propose a two-stage temperature prediction architecture that couples velocity and temperature models. The proposed DeepEDH methodology is demonstrated by modeling the pressure, velocity, and temperature fields for a liquid-cooled cold-plate-based battery thermal management system with variable channel geometry. A computational model of the cold plate is developed and solved using the finite element method (FEM), generating a dataset of 1,500 simulations. The FEM results are transformed and scaled from unstructured to structured, image-like meshes to create training and test datasets. The DeepEDH methodology's performance is examined in relation to data scaling, training dataset size, and network depth. Our performance analysis covers the impact of the novel architecture, separate field models, output geometry masks, multi-stage temperature models, and optimizations of the hyperparameters and architecture. Furthermore, we quantify the influence of the CHT thermal boundary condition on surrogate model performance, highlighting improved temperature model performance with higher heat fluxes. Compared to other deep learning neural network surrogate models, such as U-Net and DenseED, the proposed DeepEDH methodology for CHT models exhibits up to a 65% enhancement in the coefficient of determination ($R^{2}$).

arXiv Computer Science @arxiv_cs@qoto.org

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers. (arXiv:2311.17072v1 [cs.CV]) http://arxiv.org/abs/2311.17072

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers

Generative training has been demonstrated to be powerful for building visual-language models. However, on zero-shot discriminative benchmarks, there is still a performance gap between models trained with generative and discriminative objectives. In this paper, we aim to narrow this gap by improving the efficacy of generative training on classification tasks, without any finetuning processes or additional modules. Specifically, we focus on narrowing the gap between the generative captioner and the CLIP classifier. We begin by analysing the predictions made by the captioner and classifier and observe that the caption generation inherits the distribution bias from the language model trained with pure text modality, making it less grounded on the visual signal. To tackle this problem, we redesign the scoring objective for the captioner to alleviate the distributional bias and focus on measuring the gain of information brought by the visual inputs. We further design a generative training objective to match the evaluation objective. We name our model trained and evaluated from the novel procedures as Information Gain (IG) captioner. We pretrain the models on the public Laion-5B dataset and perform a series of discriminative evaluations. For the zero-shot classification on ImageNet, IG captioner achieves $> 18\%$ improvements over the standard captioner, achieving comparable performances with the CLIP classifier. IG captioner also demonstrated strong performance on zero-shot image-text retrieval tasks on MSCOCO and Flickr30K. We hope this paper inspires further research towards unifying generative and discriminative training procedures for visual-language models.
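One way to read "measuring the gain of information brought by the visual inputs" is to score each candidate caption by how much the image raises its log-probability over a text-only prior. The sketch below assumes that form, score = log p(caption | image) - log p(caption); the exact objective in the paper may differ, and all names and numbers are hypothetical.

```python
def information_gain_scores(cond_logp, prior_logp):
    """Score each caption by log p(caption | image) - log p(caption).

    Assumption for illustration: both arguments are dicts keyed by caption.
    """
    return {c: cond_logp[c] - prior_logp[c] for c in cond_logp}

def best_caption(scores):
    """Return the caption with the highest score."""
    return max(scores, key=scores.get)

# Hypothetical log-probabilities: the language prior strongly favors "dog".
cond_logp = {"a photo of a dog": -2.0, "a photo of a quokka": -2.5}
prior_logp = {"a photo of a dog": -1.0, "a photo of a quokka": -6.0}

plain_choice = best_caption(cond_logp)
ig_choice = best_caption(information_gain_scores(cond_logp, prior_logp))
```

With these numbers the plain captioner score picks the common word, while the information-gain score picks the caption the image actually supports.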

arXiv Computer Science @arxiv_cs@qoto.org

Practical Layout-Aware Analog/Mixed-Signal Design Automation with Bayesian Neural Networks. (arXiv:2311.17073v1 [cs.LG]) http://arxiv.org/abs/2311.17073

Practical Layout-Aware Analog/Mixed-Signal Design Automation with Bayesian Neural Networks

The high simulation cost has been a bottleneck of practical analog/mixed-signal design automation. Many learning-based algorithms require thousands of simulated data points, which is impractical for expensive-to-simulate circuits. We propose a learning-based algorithm that can be trained using a small amount of data and is therefore scalable to tasks with expensive simulations. Our efficient algorithm solves the post-layout performance optimization problem where simulations are known to be expensive. Our comprehensive study also solves the schematic-level sizing problem. For efficient optimization, we utilize Bayesian Neural Networks as a regression model to approximate circuit performance. For layout-aware optimization, we handle the problem as a multi-fidelity optimization problem and improve efficiency by exploiting the correlations from cheaper evaluations. We present three test cases to demonstrate the efficiency of our algorithms. Our tests prove that the proposed approach is more efficient than conventional baselines and state-of-the-art algorithms.

arXiv Computer Science @arxiv_cs@qoto.org

Self-Supervised Learning of Whole and Component-Based Semantic Representations for Person Re-Identification. (arXiv:2311.17074v1 [cs.CV]) http://arxiv.org/abs/2311.17074

Self-Supervised Learning of Whole and Component-Based Semantic Representations for Person Re-Identification

Interactive Segmentation Models (ISMs) like the Segment Anything Model have significantly improved various computer vision tasks, yet their application to Person Re-identification (ReID) remains limited. On the other hand, existing semantic pre-training models for ReID often have limitations like predefined parsing ranges or coarse semantics. Additionally, ReID and Clothes-Changing ReID (CC-ReID) are usually treated separately due to their different domains. This paper investigates whether utilizing precise human-centric semantic representation can boost the ReID performance and improve the generalization among various ReID tasks. We propose SemReID, a self-supervised ReID model that leverages ISMs for adaptive part-based semantic extraction, contributing to the improvement of ReID performance. SemReID additionally refines its semantic representation through techniques such as image masking and KoLeo regularization. Evaluation across three types of ReID datasets -- standard ReID, CC-ReID, and unconstrained ReID -- demonstrates superior performance compared to state-of-the-art methods. In addition, recognizing the scarcity of large person datasets with fine-grained semantics, we introduce the novel LUPerson-Part dataset to assist ReID methods in acquiring the fine-grained part semantics for robust performance.

arXiv Computer Science @arxiv_cs@qoto.org

Compositional Chain-of-Thought Prompting for Large Multimodal Models. (arXiv:2311.17076v1 [cs.CV]) http://arxiv.org/abs/2311.17076

Compositional Chain-of-Thought Prompting for Large Multimodal Models

The combination of strong visual backbones and Large Language Model (LLM) reasoning has led to Large Multimodal Models (LMMs) becoming the current standard for a wide range of vision and language (VL) tasks. However, recent research has shown that even the most advanced LMMs still struggle to capture aspects of compositional visual reasoning, such as attributes and relationships between objects. One solution is to utilize scene graphs (SGs)--a formalization of objects and their relations and attributes that has been extensively used as a bridge between the visual and textual domains. Yet, scene graph data requires scene graph annotations, which are expensive to collect and thus not easily scalable. Moreover, finetuning an LMM based on SG data can lead to catastrophic forgetting of the pretraining objective. To overcome this, inspired by chain-of-thought methods, we propose Compositional Chain-of-Thought (CCoT), a novel zero-shot Chain-of-Thought prompting method that utilizes SG representations in order to extract compositional knowledge from an LMM. Specifically, we first generate an SG using the LMM, and then use that SG in the prompt to produce a response. Through extensive experiments, we find that the proposed CCoT approach not only improves LMM performance on several vision and language (VL) compositional benchmarks but also improves the performance of several popular LMMs on general multimodal benchmarks, without the need for fine-tuning or annotated ground-truth SGs.
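The two-step procedure described in this abstract (generate a scene graph with the LMM, then reuse it in the prompt) might look roughly like the sketch below. The `lmm(image, prompt)` callable and the prompt wording are hypothetical stand-ins, not the paper's prompts or any real library's API.

```python
def ccot_answer(lmm, image, question):
    """Two-stage prompting: first ask for a scene graph, then answer with it.

    `lmm` is a hypothetical callable taking (image, prompt) and returning text.
    """
    sg_prompt = (
        f"For the question '{question}', describe the image as a scene graph "
        "in JSON with objects, attributes, and relationships."
    )
    scene_graph = lmm(image, sg_prompt)

    answer_prompt = (
        f"Scene graph: {scene_graph}\n"
        f"Use the image and the scene graph to answer: {question}"
    )
    return lmm(image, answer_prompt)

# A stub model that records its prompts, used only to exercise the control flow.
calls = []

def fake_lmm(image, prompt):
    calls.append(prompt)
    if len(calls) == 1:
        return '{"objects": ["cat", "dog"]}'
    return "The cat is left of the dog."

result = ccot_answer(fake_lmm, image=None, question="Where is the cat?")
```

No fine-tuning is involved in this flow: the same model is called twice, and the only change is the content of the second prompt.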

arXiv Computer Science @arxiv_cs@qoto.org

Combating the "Sameness" in AI Art: Reflections on the Interactive AI Installation Fencing Hallucination. (arXiv:2311.17080v1 [cs.CV]) http://arxiv.org/abs/2311.17080

Combating the "Sameness" in AI Art: Reflections on the Interactive AI Installation Fencing Hallucination

The article summarizes three types of "sameness" issues in Artificial Intelligence (AI) art, each occurring at different stages of development in AI image creation tools. Through the Fencing Hallucination project, the article reflects on the design of AI art production in alleviating the sense of uniformity, maintaining the uniqueness of images from an AI image synthesizer, and enhancing the connection between the artworks and the audience. This paper endeavors to stimulate the creation of distinctive AI art by recounting the efforts and insights derived from the Fencing Hallucination project, all dedicated to addressing the issue of "sameness".

arXiv Computer Science @arxiv_cs@qoto.org

I-MedSAM: Implicit Medical Image Segmentation with Segment Anything. (arXiv:2311.17081v1 [cs.CV]) http://arxiv.org/abs/2311.17081

I-MedSAM: Implicit Medical Image Segmentation with Segment Anything

With the development of Deep Neural Networks (DNNs), many efforts have been made to handle medical image segmentation. Traditional methods such as nnUNet train specific segmentation models on the individual datasets. Plenty of recent methods have been proposed to adapt the foundational Segment Anything Model (SAM) to medical image segmentation. However, they still focus on discrete representations to generate pixel-wise predictions, which are spatially inflexible and scale poorly to higher resolution. In contrast, implicit methods learn continuous representations for segmentation, which is crucial for medical image segmentation. In this paper, we propose I-MedSAM, which leverages the benefits of both continuous representations and SAM, to obtain better cross-domain ability and accurate boundary delineation. Since medical image segmentation needs to predict detailed segmentation boundaries, we designed a novel adapter to enhance the SAM features with high-frequency information during Parameter Efficient Fine Tuning (PEFT). To convert the SAM features and coordinates into continuous segmentation output, we utilize Implicit Neural Representation (INR) to learn an implicit segmentation decoder. We also propose an uncertainty-guided sampling strategy for efficient learning of INR. Extensive evaluations on 2D medical image segmentation tasks have shown that our proposed method with only 1.6M trainable parameters outperforms existing methods including discrete and continuous methods. The code will be released.

arXiv Computer Science @arxiv_cs@qoto.org

DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling. (arXiv:2311.17082v1 [cs.CV]) http://arxiv.org/abs/2311.17082

DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling

Recent methods such as Score Distillation Sampling (SDS) and Variational Score Distillation (VSD) using 2D diffusion models for text-to-3D generation have demonstrated impressive generation quality. However, the long generation time of such algorithms significantly degrades the user experience. To tackle this problem, we propose DreamPropeller, a drop-in acceleration algorithm that can be wrapped around any existing text-to-3D generation pipeline based on score distillation. Our framework generalizes Picard iterations, a classical algorithm for parallel sampling an ODE path, and can account for non-ODE paths such as momentum-based gradient updates and changes in dimensions during the optimization process as in many cases of 3D generation. We show that our algorithm trades parallel compute for wallclock time and empirically achieves up to 4.7x speedup with a negligible drop in generation quality for all tested frameworks.
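For readers unfamiliar with Picard iterations, here is a minimal sketch of the classical idea on an Euler-discretized ODE. It is not DreamPropeller itself, which generalizes the scheme to non-ODE optimization paths; the function names and the test ODE dx/dt = -x are assumptions chosen for illustration. Each sweep evaluates the drift at every time step of the current guess, and those evaluations are independent, so they could be spread across parallel workers.

```python
def euler_sequential(f, x0, h, steps):
    """Reference solution: ordinary step-by-step forward Euler."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + h * f(xs[-1]))
    return xs

def picard_parallel(f, x0, h, steps, iterations):
    """Picard iteration on the Euler discretization.

    Each sweep recomputes the whole path from the previous guess:
    x_k <- x0 + h * sum_{i<k} f(x_i).
    """
    xs = [x0] * (steps + 1)          # initial guess: a constant path
    for _ in range(iterations):
        # Independent evaluations; this list could be computed in parallel.
        drifts = [f(x) for x in xs[:-1]]
        new_xs = [x0]
        running = 0.0
        for d in drifts:
            running += h * d         # cheap prefix sum over the drifts
            new_xs.append(x0 + running)
        xs = new_xs
    return xs

decay = lambda x: -x
sequential = euler_sequential(decay, 1.0, 0.1, 10)
parallel = picard_parallel(decay, 1.0, 0.1, 10, iterations=10)
```

After p sweeps the first p steps of the path match the sequential solution, so at most `steps` sweeps are needed; the hoped-for speedup comes from cases where far fewer sweeps already give an acceptable path.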
