Toward a Human-Centered AI-assisted Colonoscopy System in Australia arxiv.org/abs/2503.20790 .HC .AI

Semantic Web -- A Forgotten Wave of Artificial Intelligence? arxiv.org/abs/2503.20793 .SI .CY

The history of Artificial Intelligence is a narrative of waves - rising optimism followed by crashing disappointments. AI winters, such as that of the early 2000s, are often remembered as periods barren of innovation. This paper argues that such a perspective overlooks a crucial wave of AI that seems to be forgotten: the rise of the Semantic Web, which is based on knowledge representation, logic, and reasoning, and its interplay with intelligent Software Agents. Fast forward to today, and ChatGPT has reignited AI enthusiasm, built on deep learning and advanced neural models. However, before Large Language Models dominated the conversation, another ambitious vision emerged - one where AI-driven Software Agents autonomously served Web users based on a structured, machine-interpretable Web. The Semantic Web aimed to transform the World Wide Web into an ecosystem where AI could reason, understand, and act. Between 2000 and 2010, this vision sparked a significant research boom, only to fade into obscurity as AI's mainstream narrative shifted elsewhere. Today, as LLMs edge toward autonomous execution, we revisit this overlooked wave. By analyzing its academic impact through bibliometric data, we highlight the Semantic Web's role in AI history and its untapped potential for modern Software Agent development. Recognizing this forgotten chapter not only deepens our understanding of AI's cyclical evolution but also offers key insights for integrating emerging technologies.

Can Zero-Shot Commercial APIs Deliver Regulatory-Grade Clinical Text DeIdentification? arxiv.org/abs/2503.20794 .CL .CR .IR .LG

We systematically assess the performance of three leading API-based de-identification systems - Azure Health Data Services, AWS Comprehend Medical, and OpenAI GPT-4o - against our de-identification systems on a ground truth dataset of 48 clinical documents annotated by medical experts. Our analysis, conducted at both the entity and token levels, demonstrates that our solution, Healthcare NLP, achieves the highest accuracy, with a 96% F1-score in protected health information (PHI) detection, significantly outperforming Azure (91%), AWS (83%), and GPT-4o (79%). Beyond accuracy, Healthcare NLP is also the most cost-effective solution, reducing processing costs by over 80% compared to Azure and GPT-4o. Its fixed-cost local deployment model avoids the escalating per-request fees of cloud-based services, making it a scalable and economical choice. Our results underscore a critical limitation: zero-shot commercial APIs fail to meet the accuracy, adaptability, and cost-efficiency required for regulatory-grade clinical de-identification. Healthcare NLP's superior performance, customization capabilities, and economic advantages position it as the more viable solution for healthcare organizations seeking compliance and scalability in clinical NLP workflows.
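
As a concrete illustration of the entity-level scoring the abstract refers to, here is a minimal sketch of precision/recall/F1 over PHI entity spans. The span format and exact-match criterion are assumptions for illustration, not the paper's documented evaluation protocol:

```python
# Entity-level F1 for PHI de-identification (illustrative sketch).
def entity_f1(gold, pred):
    """gold/pred: sets of (start, end, phi_type) spans for one document."""
    tp = len(gold & pred)  # exact span-and-type matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical annotations for one clinical note.
gold = {(0, 8, "NAME"), (25, 35, "DATE"), (50, 62, "MRN")}
pred = {(0, 8, "NAME"), (25, 35, "DATE")}  # the MRN was missed
print(entity_f1(gold, pred))  # (1.0, 0.666..., 0.8)
```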

EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability arxiv.org/abs/2503.20796 .CR .AI

Sophisticated phishing attacks have emerged as a major cybersecurity threat, becoming more common and difficult to prevent. Though machine learning techniques have shown promise in detecting phishing attacks, they function mainly as "black boxes" without revealing their decision-making rationale. This lack of transparency erodes the trust of users and diminishes their effective threat response. We present EXPLICATE: a framework that enhances phishing detection through a three-component architecture: an ML-based classifier using domain-specific features, a dual-explanation layer combining LIME and SHAP for complementary feature-level insights, and an LLM enhancement using DeepSeek v3 to translate technical explanations into accessible natural language. Our experiments show that EXPLICATE attains 98.4% accuracy across all metrics, which is on par with existing deep learning techniques but with better explainability. The framework generates high-quality explanations with an accuracy of 94.2% and a consistency of 96.8% between the LLM output and the model prediction. We implement EXPLICATE as a fully usable GUI application and a lightweight Chrome extension, showing its applicability across many deployment situations. The research shows that high detection performance can go hand-in-hand with meaningful explainability in security applications. Most importantly, it addresses the critical divide between automated AI and user trust in phishing detection systems.
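
A rough sketch of how the three components could be wired together. Everything below is a stand-in: the feature set is invented, the attribution scores are placeholders for real LIME/SHAP values, and a template substitutes for the DeepSeek v3 verbalization step:

```python
# Three-stage flow inspired by EXPLICATE (component wiring assumed
# from the abstract; not the authors' implementation).
from sklearn.ensemble import RandomForestClassifier
import numpy as np

FEATURES = ["url_length", "num_subdomains", "has_ip_host"]  # illustrative

# Stage 1: ML classifier on domain-specific features (toy training data).
clf = RandomForestClassifier().fit(
    np.array([[60, 3, 1], [20, 1, 0], [75, 4, 1], [15, 0, 0]]),
    [1, 0, 1, 0],  # 1 = phishing
)

def explain(x):
    # Stage 2: stand-ins for LIME/SHAP attributions (both yield
    # per-feature importance scores that the paper combines).
    lime_scores = dict(zip(FEATURES, clf.feature_importances_))
    shap_scores = lime_scores  # placeholder: real SHAP values differ
    # Stage 3: an LLM would verbalize these; a template stands in here.
    top = max(lime_scores, key=lime_scores.get)
    verdict = "phishing" if clf.predict([x])[0] else "legitimate"
    return f"Flagged as {verdict}; most influential signal: {top}."

print(explain([68, 3, 1]))
```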

Payload-Aware Intrusion Detection with CMAE and Large Language Models arxiv.org/abs/2503.20798 .CR .AI

Intrusion Detection Systems (IDS) are crucial for identifying malicious traffic, yet traditional signature-based methods struggle with zero-day attacks and high false positive rates. AI-driven packet-capture analysis offers a promising alternative. However, existing approaches rely heavily on flow-based or statistical features, limiting their ability to detect fine-grained attack patterns. This study proposes Xavier-CMAE, an enhanced Convolutional Multi-Head Attention Ensemble (CMAE) model that improves detection accuracy while reducing computational overhead. By replacing Word2Vec embeddings with a Hex2Int tokenizer and Xavier initialization, Xavier-CMAE eliminates pre-training, accelerates training, and achieves 99.971% accuracy with a 0.018% false positive rate, outperforming Word2Vec-based methods. Additionally, we introduce LLM-CMAE, which integrates pre-trained Large Language Model (LLM) tokenizers into CMAE. While LLMs enhance feature extraction, their computational cost hinders real-time detection. LLM-CMAE balances efficiency and performance, reaching 99.969% accuracy with a 0.019% false positive rate. This work advances AI-powered IDS by (1) introducing a payload-based detection framework, (2) enhancing efficiency with Xavier-CMAE, and (3) integrating LLM tokenizers for improved real-time detection.
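
The Hex2Int idea lends itself to a very small sketch: mapping each payload byte's hex pair directly to an integer ID removes the need for Word2Vec pre-training, since the token space is fixed at 256 values plus padding. Byte granularity, sequence length, and the padding ID below are assumptions; the paper's exact tokenizer may differ:

```python
# "Hex2Int"-style tokenization of a packet payload (illustrative).
def hex2int_tokens(payload_hex: str, max_len: int = 16, pad_id: int = 256):
    # Each two-character hex pair becomes an integer in [0, 255].
    ids = [int(payload_hex[i:i + 2], 16) for i in range(0, len(payload_hex), 2)]
    # Truncate or pad to a fixed sequence length.
    return ids[:max_len] + [pad_id] * max(0, max_len - len(ids))

print(hex2int_tokens("474554202f20485454502f312e31"))  # "GET / HTTP/1.1"
```

An embedding layer over these IDs can then be Xavier-initialized and trained from scratch, which is presumably where the reported speedup over pre-trained Word2Vec embeddings comes from.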

Evidencing Unauthorized Training Data from AI Generated Content using Information Isotopes arxiv.org/abs/2503.20800 .CR .AI .LG

In light of scaling laws, many AI institutions are intensifying efforts to construct advanced AIs on extensive collections of high-quality human data. However, in a rush to stay competitive, some institutions may inadvertently or even deliberately include unauthorized data (like privacy- or intellectual property-sensitive content) for AI training, which infringes on the rights of data owners. Compounding this issue, these advanced AI services are typically built on opaque cloud platforms, which restricts access to internal information during AI training and inference, leaving only the generated outputs available for forensics. Thus, despite the introduction of legal frameworks by various countries to safeguard data rights, uncovering evidence of data misuse in modern opaque AI applications remains a significant challenge. In this paper, inspired by the ability of isotopes to trace elements within chemical reactions, we introduce the concept of information isotopes and elucidate their properties in tracing training data within opaque AI systems. Furthermore, we propose an information isotope tracing method designed to identify and provide evidence of unauthorized data usage by detecting the presence of target information isotopes in AI generations. We conduct experiments on ten AI models (including GPT-4o, Claude-3.5, and DeepSeek) and four benchmark datasets in critical domains (medical data, copyrighted books, and news). Results show that our method can distinguish training datasets from non-training datasets with 99% accuracy and significant evidence (p-value < 0.001) by examining a data entry equivalent in length to a research paper. The findings show the potential of our work as an inclusive tool for empowering individuals, including those without expertise in AI, to safeguard their data rights in the rapidly evolving era of AI advancements and applications.
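
As a hedged illustration of the kind of statistical evidence the abstract reports (p-value < 0.001), here is one way a detection count could be turned into a significance test. The probe counts and baseline rate are invented, and the paper's actual isotope statistic may differ:

```python
# Toy membership-evidence test inspired by the "information isotopes"
# idea: did the model reproduce embedded probes more often than chance?
from scipy.stats import binomtest

n_probes = 200        # isotope probes embedded in the suspect data (assumed)
k_detected = 57       # probes the model reproduces in its outputs (assumed)
baseline_rate = 0.10  # expected hit rate if the data was NOT trained on

result = binomtest(k_detected, n_probes, baseline_rate, alternative="greater")
print(f"p-value = {result.pvalue:.2e}")  # far below 0.001 -> evidence of use
```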

SE-GNN: Seed Expanded-Aware Graph Neural Network with Iterative Optimization for Semi-supervised Entity Alignment arxiv.org/abs/2503.20801 .CL

Entity alignment aims to use pre-aligned seed pairs to find other equivalent entities from different knowledge graphs (KGs) and is widely used in graph fusion-related fields. However, as the scale of KGs increases, manually annotating pre-aligned seed pairs becomes difficult. Existing research utilizes entity embeddings obtained by aggregating single structural information to identify potential seed pairs, thus reducing the reliance on pre-aligned seed pairs. However, due to the structural heterogeneity of KGs, the quality of potential seed pairs obtained using only a single structural information is not ideal. In addition, although existing research improves the quality of potential seed pairs through semi-supervised iteration, it underestimates the impact of embedding distortion produced by noisy seed pairs on the alignment effect. To solve the above problems, we propose a seed expanded-aware graph neural network with iterative optimization for semi-supervised entity alignment, named SE-GNN. First, we utilize the semantic attributes and structural features of entities, combined with a conditional filtering mechanism, to obtain high-quality initial potential seed pairs. Next, we design a local and global awareness mechanism. It introduces initial potential seed pairs and combines local and global information to obtain a more comprehensive entity embedding representation, which alleviates the impact of KGs' structural heterogeneity and lays the foundation for the optimization of initial potential seed pairs. Then, we design a threshold nearest neighbor embedding correction strategy. It combines a similarity threshold and the bidirectional nearest neighbor method as a filtering mechanism to select iterative potential seed pairs and also uses an embedding correction strategy to eliminate the embedding distortion.
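
The threshold-plus-bidirectional-nearest-neighbor filter is concrete enough to sketch: a candidate pair is kept only if the two entities are mutually nearest neighbors across the KGs and their similarity clears a threshold. The embeddings and threshold value below are toy placeholders:

```python
# Mutual-nearest-neighbor seed-pair filter with a similarity threshold
# (a sketch of the mechanism named in the abstract, not SE-GNN itself).
import numpy as np

def mutual_nn_pairs(emb1, emb2, threshold=0.9):
    """Return (i, j) index pairs that are each other's nearest neighbor
    AND exceed a cosine-similarity threshold."""
    n1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    n2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    sim = n1 @ n2.T
    best12 = sim.argmax(axis=1)  # KG1 -> KG2 nearest neighbor
    best21 = sim.argmax(axis=0)  # KG2 -> KG1 nearest neighbor
    return [(i, j) for i, j in enumerate(best12)
            if best21[j] == i and sim[i, j] >= threshold]

emb_kg1 = np.random.rand(5, 8)  # toy entity embeddings
emb_kg2 = np.random.rand(6, 8)
print(mutual_nn_pairs(emb_kg1, emb_kg2, threshold=0.5))
```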

A Study on the Matching Rate of Dance Movements Using 2D Skeleton Detection and 3D Pose Estimation: Why Is SEVENTEEN's Performance So Bita-Zoroi (Perfectly Synchronized)? arxiv.org/abs/2503.19917 .CV

SEVENTEEN is a K-pop group with a large number of members (13 in total) and one of the largest physical disparities between its tallest and shortest members among K-pop groups. However, despite their large numbers and physical differences, their dance performances exhibit unparalleled unity in the K-pop industry. According to one theory, their dance synchronization rate is said to be 90% or even 97%. However, there is little concrete data to substantiate this synchronization rate. In this study, we analyzed SEVENTEEN's dance performances using videos available on YouTube. We applied 2D skeleton detection and 3D pose estimation to evaluate joint angles, body part movements, and jumping and crouching motions to investigate the factors contributing to their performance unity. The analysis revealed exceptionally high consistency in the movement direction of body parts, as well as in the ankle and head positions during jumping movements and the head position during crouching movements. These findings suggest that SEVENTEEN's high synchronization rate can be attributed to the consistency of movement direction and the synchronization of ankle and head heights during jumping and crouching movements.
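
A minimal sketch of the kind of measurement involved: computing a joint angle from three 2D keypoints and checking how tightly it agrees across dancers in one frame. The keypoints below are toy coordinates, and the paper's actual matching-rate formula is not reproduced here:

```python
# Joint angle from 2D keypoints and a naive cross-dancer agreement check.
import numpy as np

def joint_angle(a, b, c):
    """Angle at b (degrees) formed by points a-b-c, e.g. shoulder-elbow-wrist."""
    v1, v2 = np.asarray(a) - b, np.asarray(c) - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Elbow angles for three dancers in one frame (toy coordinates).
angles = [joint_angle(s, e, w) for s, e, w in [
    ((0, 0), (1, 0), (1.8, 0.60)),
    ((0, 0), (1, 0), (1.8, 0.62)),
    ((0, 0), (1, 0), (1.8, 0.58)),
]]
print(f"std of joint angle across dancers: {np.std(angles):.2f} deg")
```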

Unifying Structural Proximity and Equivalence for Enhanced Dynamic Network Embedding arxiv.org/abs/2503.19926 .SI .LG

Dynamic network embedding methods transform nodes in a dynamic network into low-dimensional vectors while preserving network characteristics, facilitating tasks such as node classification and community detection. Several embedding methods have been proposed to capture structural proximity among nodes in a network, where densely connected communities are preserved, while others have been proposed to preserve structural equivalence among nodes, capturing their structural roles regardless of their relative distance in the network. However, most existing methods that aim to preserve both network characteristics mainly focus on static networks, and those designed for dynamic networks do not explicitly account for inter-snapshot structural properties. This paper proposes a novel unifying dynamic network embedding method that simultaneously preserves both structural proximity and equivalence while considering inter-snapshot structural relationships in a dynamic network. Specifically, to define structural equivalence in a dynamic network, we use temporal subgraphs, known as dynamic graphlets, to capture how a node's neighborhood structure evolves over time. We then introduce a temporal-structural random walk to flexibly sample time-respecting sequences of nodes, considering both their temporal proximity and similarity in evolving structures. The proposed method is evaluated on node classification using five real-world networks, where it outperforms benchmark methods, showing its effectiveness and flexibility in capturing various aspects of a network.
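
The temporal-structural random walk can be sketched in its simplest time-respecting form: each step may only follow an edge whose timestamp is not earlier than the current one. The structural-similarity bias the paper adds to the transition choice is omitted here, and the toy graph is illustrative:

```python
# Time-respecting random walk on a toy temporal graph (sketch only;
# the paper additionally biases transitions by structural similarity).
import random

# adjacency: node -> list of (neighbor, edge timestamp)
adj = {0: [(1, 1), (2, 3)], 1: [(2, 2), (3, 4)], 2: [(3, 5)], 3: []}

def temporal_walk(start, length=4):
    walk, t = [start], 0
    for _ in range(length):
        nexts = [(v, ts) for v, ts in adj[walk[-1]] if ts >= t]
        if not nexts:
            break
        v, t = random.choice(nexts)  # the paper biases this choice
        walk.append(v)
    return walk

print(temporal_walk(0))
```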

Unlocking Health Insights with SDoH Data: A Comprehensive Open-Access Database and SDoH-EHR Linkage Tool arxiv.org/abs/2503.19928 .SI

Background: Social determinants of health (SDoH) play a crucial role in influencing health outcomes, accounting for nearly 50% of modifiable health factors and bringing to light critical disparities among disadvantaged groups. Despite the significant impact of SDoH, existing data resources often fall short in terms of comprehensiveness, integration, and usability. Methods: To address these gaps, we developed an extensive Exposome database and a corresponding web application, aimed at enhancing data usability and integration with electronic health records (EHRs) to foster personalized and informed healthcare. We created a robust database consisting of a wide array of SDoH indicators and an automated linkage tool designed to facilitate effortless integration with EHRs. We emphasized a user-friendly interface to cater to researchers, clinicians, and public health professionals. Results: The resultant Exposome database and web application offer an extensive data catalog with enhanced usability features. The automated linkage tool has demonstrated efficiency in integrating SDoH data with EHRs, significantly improving data accessibility. Initial deployment has confirmed scalability and robust spatial data relationships, facilitating precise and contextually relevant healthcare insights. Conclusion: The development of an advanced Exposome database and linkage tool marks a significant step toward enhancing the accessibility and usability of SDoH data. By centralizing and integrating comprehensive SDoH indicators with EHRs, this tool empowers a wide range of users to access high-quality, standardized data. This resource will have a lasting impact on personalized healthcare and a more equitable health landscape.
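
The linkage step described here amounts to joining patient records to area-level indicators on a shared geographic key. A minimal sketch, assuming census tract as the key and invented column names (the tool's actual schema is not specified in the abstract):

```python
# EHR-to-SDoH linkage on a geographic key (illustrative; all values
# and column names are made up).
import pandas as pd

ehr = pd.DataFrame({
    "patient_id": [101, 102],
    "census_tract": ["12086001100", "12086001400"],
})
sdoh = pd.DataFrame({
    "census_tract": ["12086001100", "12086001400"],
    "median_income": [38000, 61000],
    "uninsured_pct": [21.5, 9.8],
})
linked = ehr.merge(sdoh, on="census_tract", how="left")
print(linked)
```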

Robust Object Detection of Underwater Robot based on Domain Generalization arxiv.org/abs/2503.19929 .CV .LG

Object detection aims to obtain the location and the category of specific objects in a given image, which includes two tasks: classification and localization. In recent years, researchers have tended to apply object detection to underwater robots equipped with vision systems to complete tasks including seafood fishing, fish farming, biodiversity monitoring, and so on. However, the diversity and complexity of underwater environments bring new challenges to object detection. First, aquatic organisms tend to live together, which leads to severe occlusion. Second, aquatic organisms are good at hiding themselves, often having colors similar to the background. Third, varying water quality and changeable, extreme lighting conditions lead to distorted, low-contrast, blue- or green-tinted images from the underwater camera, resulting in domain shift, and deep models are generally vulnerable to domain shift. Fourth, the movement of the underwater robot blurs the captured image and makes the water muddy, which results in low visibility. This paper investigates the problems brought by the underwater environment mentioned above and aims to design a high-performance and robust underwater object detector.

Reverse Prompt: Cracking the Recipe Inside Text-to-Image Generation arxiv.org/abs/2503.19937 .CV .AI

Text-to-image generation has become increasingly popular, but achieving the desired images often requires extensive prompt engineering. In this paper, we explore how to decode textual prompts from reference images, a process we refer to as image reverse prompt engineering. This technique enables us to gain insights from reference images, understand the creative processes of great artists, and generate impressive new images. To address this challenge, we propose a method known as automatic reverse prompt optimization (ARPO). Specifically, our method refines an initial prompt into a high-quality prompt through an iterative imitative gradient prompt optimization process: 1) generating a recreated image from the current prompt to instantiate its guidance capability; 2) producing textual gradients, which are candidate prompts intended to reduce the difference between the recreated image and the reference image; 3) updating the current prompt with textual gradients using a greedy search method to maximize the CLIP similarity between prompt and reference image. We compare ARPO with several baseline methods, including handcrafted techniques, gradient-based prompt tuning methods, image captioning, and data-driven selection methods. Both quantitative and qualitative results demonstrate that our ARPO converges quickly to generate high-quality reverse prompts. More importantly, we can easily create novel images with diverse styles and content by directly editing these reverse prompts. Code will be made publicly available.
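
The ARPO loop as described is easy to skeletonize: recreate an image from the current prompt, ask for candidate prompt edits (the "textual gradients"), and greedily keep whichever candidate's recreation is most CLIP-similar to the reference. All four callables below are placeholders for the models the paper uses; the toy instantiation only demonstrates the control flow:

```python
# Skeleton of the iterative loop described in the abstract. generate,
# propose_candidates, and clip_sim stand in for a text-to-image model,
# an LLM, and CLIP respectively; none are implemented here.
def arpo(reference_image, prompt, steps, generate, propose_candidates, clip_sim):
    best_score = clip_sim(generate(prompt), reference_image)
    for _ in range(steps):
        recreated = generate(prompt)  # step 1: recreate from current prompt
        for candidate in propose_candidates(prompt, recreated, reference_image):
            score = clip_sim(generate(candidate), reference_image)
            if score > best_score:  # step 3: greedy update
                best_score, prompt = score, candidate
    return prompt

# Toy instantiation: "images" are strings, similarity is shared words.
gen = lambda p: p.lower()
sim = lambda a, b: len(set(a.split()) & set(b.split()))
cands = lambda p, r, ref: [p + " oil painting", p + " at sunset"]
print(arpo("a cat at sunset", "a cat", steps=2,
           generate=gen, propose_candidates=cands, clip_sim=sim))
```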

Continual Learning With Quasi-Newton Methods arxiv.org/abs/2503.19939 .IV .LG

Catastrophic forgetting remains a major challenge when neural networks learn tasks sequentially. Elastic Weight Consolidation (EWC) attempts to address this problem by introducing a Bayesian-inspired regularization loss to preserve knowledge of previously learned tasks. However, EWC relies on a Laplace approximation where the Hessian is simplified to the diagonal of the Fisher information matrix, assuming uncorrelated model parameters. This overly simplistic assumption often leads to poor Hessian estimates, limiting its effectiveness. To overcome this limitation, we introduce Continual Learning with Sampled Quasi-Newton (CSQN), which leverages Quasi-Newton methods to compute more accurate Hessian approximations. CSQN captures parameter interactions beyond the diagonal without requiring architecture-specific modifications, making it applicable across diverse tasks and architectures. Experimental results across four benchmarks demonstrate that CSQN consistently outperforms EWC and other state-of-the-art baselines, including rehearsal-based methods. CSQN reduces EWC's forgetting by 50 percent and improves its performance by 8 percent on average. Notably, CSQN achieves superior results on three out of four benchmarks, including the most challenging scenarios, highlighting its potential as a robust solution for continual learning.
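
The contrast between EWC's diagonal penalty and a richer quasi-Newton one can be shown with a worked toy example. The low-rank form B = diag(F) + UU^T below is merely illustrative of capturing parameter correlations; it is not CSQN's exact update:

```python
# EWC-style quadratic penalty vs. a correlation-aware Hessian
# approximation (toy numbers; illustrative, not CSQN's algorithm).
import numpy as np

theta_star = np.array([1.0, -0.5, 0.3])  # weights after the old task
theta = np.array([1.2, -0.4, 0.1])       # current weights on the new task
d = theta - theta_star

# EWC: diagonal Fisher only, i.e. parameters assumed uncorrelated.
fisher_diag = np.array([2.0, 0.5, 1.0])
ewc_penalty = 0.5 * np.sum(fisher_diag * d**2)

# Quasi-Newton flavor: add a low-rank term capturing correlations.
U = np.array([[1.0, 0.2], [0.0, 1.0], [0.5, -0.3]])
B = np.diag(fisher_diag) + U @ U.T
qn_penalty = 0.5 * d @ B @ d

print(ewc_penalty, qn_penalty)
```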
