Show newer

AI-ANNE: (A) (N)eural (N)et for (E)xploration: Transferring Deep Learning Models onto Microcontrollers and Embedded Systems arxiv.org/abs/2501.03256 .LG .AI

AI-ANNE: (A) (N)eural (N)et for (E)xploration: Transferring Deep Learning Models onto Microcontrollers and Embedded Systems

This working paper explores the integration of neural networks onto resource-constrained embedded systems like a Raspberry Pi Pico / Raspberry Pi Pico 2. A TinyML aproach transfers neural networks directly on these microcontrollers, enabling real-time, low-latency, and energy-efficient inference while maintaining data privacy. Therefore, AI-ANNE: (A) (N)eural (N)et for (E)xploration will be presented, which facilitates the transfer of pre-trained models from high-performance platforms like TensorFlow and Keras onto microcontrollers, using a lightweight programming language like MicroPython. This approach demonstrates how neural network architectures, such as neurons, layers, density and activation functions can be implemented in MicroPython in order to deal with the computational limitations of embedded systems. Based on the Raspberry Pi Pico / Raspberry Pi Pico 2, two different neural networks on microcontrollers are presented for an example of data classification. As an further application example, such a microcontroller can be used for condition monitoring, where immediate corrective measures are triggered on the basis of sensor data. Overall, this working paper presents a very easy-to-implement way of using neural networks on energy-efficient devices such as microcontrollers. This makes AI-ANNE: (A) (N)eural (N)et for (E)xploration not only suited for practical use, but also as an educational tool with clear insights into how neural networks operate.

arXiv.org

Toward Inclusive Educational AI: Auditing Frontier LLMs through a Multiplexity Lens arxiv.org/abs/2501.03259 .CL .AI .CY .LG .MA

Toward Inclusive Educational AI: Auditing Frontier LLMs through a Multiplexity Lens

As large language models (LLMs) like GPT-4 and Llama 3 become integral to educational contexts, concerns are mounting over the cultural biases, power imbalances, and ethical limitations embedded within these technologies. Though generative AI tools aim to enhance learning experiences, they often reflect values rooted in Western, Educated, Industrialized, Rich, and Democratic (WEIRD) cultural paradigms, potentially sidelining diverse global perspectives. This paper proposes a framework to assess and mitigate cultural bias within LLMs through the lens of applied multiplexity. Multiplexity, inspired by Senturk et al. and rooted in Islamic and other wisdom traditions, emphasizes the coexistence of diverse cultural viewpoints, supporting a multi-layered epistemology that integrates both empirical sciences and normative values. Our analysis reveals that LLMs frequently exhibit cultural polarization, with biases appearing in both overt responses and subtle contextual cues. To address inherent biases and incorporate multiplexity in LLMs, we propose two strategies: \textit{Contextually-Implemented Multiplex LLMs}, which embed multiplex principles directly into the system prompt, influencing LLM outputs at a foundational level and independent of individual prompts, and \textit{Multi-Agent System (MAS)-Implemented Multiplex LLMs}, where multiple LLM agents, each representing distinct cultural viewpoints, collaboratively generate a balanced, synthesized response. Our findings demonstrate that as mitigation strategies evolve from contextual prompting to MAS-implementation, cultural inclusivity markedly improves, evidenced by a significant rise in the Perspectives Distribution Score (PDS) and a PDS Entropy increase from 3.25\% at baseline to 98\% with the MAS-Implemented Multiplex LLMs. Sentiment analysis further shows a shift towards positive sentiment across cultures,...

arXiv.org

Navigation Variable-based Multi-objective Particle Swarm Optimization for UAV Path Planning with Kinematic Constraints arxiv.org/abs/2501.03261 .RO .AI .NE

Navigation Variable-based Multi-objective Particle Swarm Optimization for UAV Path Planning with Kinematic Constraints

Path planning is essential for unmanned aerial vehicles (UAVs) as it determines the path that the UAV needs to follow to complete a task. This work addresses this problem by introducing a new algorithm called navigation variable-based multi-objective particle swarm optimization (NMOPSO). It first models path planning as an optimization problem via the definition of a set of objective functions that include optimality and safety requirements for UAV operation. The NMOPSO is then used to minimize those functions through Pareto optimal solutions. The algorithm features a new path representation based on navigation variables to include kinematic constraints and exploit the maneuverable characteristics of the UAV. It also includes an adaptive mutation mechanism to enhance the diversity of the swarm for better solutions. Comparisons with various algorithms have been carried out to benchmark the proposed approach. The results indicate that the NMOPSO performs better than not only other particle swarm optimization variants but also other state-of-the-art multi-objective and metaheuristic optimization algorithms. Experiments have also been conducted with real UAVs to confirm the validity of the approach for practical flights. The source code of the algorithm is available at https://github.com/ngandng/NMOPSO.

arXiv.org

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models arxiv.org/abs/2501.03262 .CL .LG

STEAM-EEG: Spatiotemporal EEG Analysis with Markov Transfer Fields and Attentive CNNs arxiv.org/abs/2501.01959 .CV .AI .CE

GAF-FusionNet: Multimodal ECG Analysis via Gramian Angular Fields and Split Attention arxiv.org/abs/2501.01960 .CV .AI .GR .LG

Statistical learning does not always entail knowledge arxiv.org/abs/2501.01963 .IT .PR .ST .ML .TH .LG .AI .IT

Optimal bounds for dissatisfaction in perpetual voting arxiv.org/abs/2501.01969 .GT .AI .LG

INFELM: In-depth Fairness Evaluation of Large Text-To-Image Models arxiv.org/abs/2501.01973 .CV .AI .CY

Hawkes based Representation Learning for Reasoning over Scale-free Community-structured Temporal Knowledge Graphs arxiv.org/abs/2501.01974 .SI .AI .LG

Optical Character Recognition using Convolutional Neural Networks for Ashokan Brahmi Inscriptions arxiv.org/abs/2501.01981 .IV .CV

Optical Character Recognition using Convolutional Neural Networks for Ashokan Brahmi Inscriptions

This research paper delves into the development of an Optical Character Recognition (OCR) system for the recognition of Ashokan Brahmi characters using Convolutional Neural Networks. It utilizes a comprehensive dataset of character images to train the models, along with data augmentation techniques to optimize the training process. Furthermore, the paper incorporates image preprocessing to remove noise, as well as image segmentation to facilitate line and character segmentation. The study mainly focuses on three pre-trained CNNs, namely LeNet, VGG-16, and MobileNet and compares their accuracy. Transfer learning was employed to adapt the pre-trained models to the Ashokan Brahmi character dataset. The findings reveal that MobileNet outperforms the other two models in terms of accuracy, achieving a validation accuracy of 95.94% and validation loss of 0.129. The paper provides an in-depth analysis of the implementation process using MobileNet and discusses the implications of the findings. The use of OCR for character recognition is of significant importance in the field of epigraphy, specifically for the preservation and digitization of ancient scripts. The results of this research paper demonstrate the effectiveness of using pre-trained CNNs for the recognition of Ashokan Brahmi characters.

arXiv.org

Is Your Image a Good Storyteller? arxiv.org/abs/2501.01982 .CV .AI .CL

Is Your Image a Good Storyteller?

Quantifying image complexity at the entity level is straightforward, but the assessment of semantic complexity has been largely overlooked. In fact, there are differences in semantic complexity across images. Images with richer semantics can tell vivid and engaging stories and offer a wide range of application scenarios. For example, the Cookie Theft picture is such a kind of image and is widely used to assess human language and cognitive abilities due to its higher semantic complexity. Additionally, semantically rich images can benefit the development of vision models, as images with limited semantics are becoming less challenging for them. However, such images are scarce, highlighting the need for a greater number of them. For instance, there is a need for more images like Cookie Theft to cater to people from different cultural backgrounds and eras. Assessing semantic complexity requires human experts and empirical evidence. Automatic evaluation of how semantically rich an image will be the first step of mining or generating more images with rich semantics, and benefit human cognitive assessment, Artificial Intelligence, and various other applications. In response, we propose the Image Semantic Assessment (ISA) task to address this problem. We introduce the first ISA dataset and a novel method that leverages language to solve this vision problem. Experiments on our dataset demonstrate the effectiveness of our approach. Our data and code are available at: https://github.com/xiujiesong/ISA.

arXiv.org

Item Association Factorization Mixed Markov Chains for Sequential Recommendation arxiv.org/abs/2501.01429 .IR

TERA: A Simulation Environment for Terrain Excavation Robot Autonomy arxiv.org/abs/2501.01430 .RO

TERA: A Simulation Environment for Terrain Excavation Robot Autonomy

Developing excavation autonomy is challenging given the environments where excavators operate, the complexity of physical interaction and the degrees of freedom of operation of the excavator itself. Simulation is a useful tool to build parts of the autonomy without the complexity of experimentation. Traditional excavator simulators are geared towards high fidelity interactions between the joints or between the terrain but do not incorporate other challenges such as perception required for end to end autonomy. A complete simulator should be capable of supporting real time operation while providing high fidelity simulation of the excavator(s), the environment, and their interaction. In this paper we present TERA (Terrain Excavation Robot Autonomy), a simulator geared towards autonomous excavator applications based on Unity3D and AGX that provides the extensibility and scalability required to study full autonomy. It provides the ability to configure the excavator and the environment per the user requirements. We also demonstrate realistic dynamics by incorporating a time-varying model that introduces variations in the system's responses. The simulator is then evaluated with different scenarios such as track deformation, velocities on different terrains, similarity of the system with the real excavator and the overall path error to show the capabilities of the simulation.

arXiv.org

Mathematical Definition and Systematization of Puzzle Rules arxiv.org/abs/2501.01433 .HO .AI

Mathematical Definition and Systematization of Puzzle Rules

While logic puzzles have engaged individuals through problem-solving and critical thinking, the creation of new puzzle rules has largely relied on ad-hoc processes. Pencil puzzles, such as Slitherlink and Sudoku, represent a prominent subset of these games, celebrated for their intellectual challenges rooted in combinatorial logic and spatial reasoning. Despite extensive research into solving techniques and automated problem generation, a unified framework for systematic and scalable rule design has been lacking. Here, we introduce a mathematical framework for defining and systematizing pencil puzzle rules. This framework formalizes grid elements, their positional relationships, and iterative composition operations, allowing for the incremental construction of structures that form the basis of puzzle rules. Furthermore, we establish a formal method to describe constraints and domains for each structure, ensuring solvability and coherence. Applying this framework, we successfully formalized the rules of well-known Nikoli puzzles, including Slitherlink and Sudoku, demonstrating the formal representation of a significant portion (approximately one-fourth) of existing puzzles. These results validate the potential of the framework to systematize and innovate puzzle rule design, establishing a pathway to automated rule generation. By providing a mathematical foundation for puzzle rule creation, this framework opens avenues for computers, potentially enhanced by AI, to design novel puzzle rules tailored to player preferences, expanding the scope of puzzle diversity. Beyond its direct application to pencil puzzles, this work illustrates how mathematical frameworks can bridge recreational mathematics and algorithmic design, offering tools for broader exploration in logic-based systems, with potential applications in educational game design, personalized learning, and computational creativity.

arXiv.org
Show older
Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.