OpenRANet: Neuralized Spectrum Access by Joint Subcarrier and Power Allocation with Optimization-based Deep Learning arxiv.org/abs/2409.12964 .IT .IT .AI

The next-generation radio access network (RAN), known as Open RAN, is poised to feature an AI-native interface for wireless cellular networks, including emerging satellite-terrestrial systems, making deep learning integral to its operation. In this paper, we address the nonconvex optimization challenge of joint subcarrier and power allocation in Open RAN, with the objective of minimizing the total power consumption while ensuring users meet their transmission data rate requirements. We propose OpenRANet, an optimization-based deep learning model that integrates machine-learning techniques with iterative optimization algorithms. We start by transforming the original nonconvex problem into convex subproblems through decoupling, variable transformation, and relaxation techniques. These subproblems are then efficiently solved using iterative methods within the standard interference function framework, enabling the derivation of primal-dual solutions. These solutions integrate seamlessly as a convex optimization layer within OpenRANet, enhancing constraint adherence, solution accuracy, and computational efficiency by combining machine learning with convex analysis, as shown in numerical experiments. OpenRANet also serves as a foundation for designing resource-constrained AI-native wireless optimization strategies for broader scenarios like multi-cell systems, satellite-terrestrial networks, and future Open RAN deployments with complex power consumption requirements.
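
The abstract refers to iterative methods within the standard interference function framework. As a rough illustration of that framework only, and not of OpenRANet or the paper's subproblems, the sketch below runs the textbook fixed-point iteration p ← I(p) for target-SINR power control. The gain matrix, noise level, SINR targets and all function names are hypothetical, and the iteration is only expected to converge when the targets are feasible.

```python
def interference(powers, gains, noise, targets):
    """Standard interference function I(p) for target-SINR power control.

    gains[i][j] is the (assumed) channel gain from transmitter j to receiver i.
    """
    n = len(powers)
    return [
        targets[i]
        * (noise + sum(gains[i][j] * powers[j] for j in range(n) if j != i))
        / gains[i][i]
        for i in range(n)
    ]


def fixed_point_power_control(gains, noise, targets, iterations=200):
    """Iterate p <- I(p) from zero power; assumes the SINR targets are feasible."""
    powers = [0.0] * len(targets)
    for _ in range(iterations):
        powers = interference(powers, gains, noise, targets)
    return powers


def sinr(powers, gains, noise, i):
    """SINR achieved by user i under the given power vector."""
    n = len(powers)
    interference_power = sum(gains[i][j] * powers[j] for j in range(n) if j != i)
    return gains[i][i] * powers[i] / (noise + interference_power)


# Hypothetical two-user example with weak cross-interference.
example_gains = [[1.0, 0.1], [0.1, 1.0]]
example_powers = fixed_point_power_control(example_gains, noise=0.1, targets=[1.0, 1.0])
```

In this symmetric example each user's power satisfies p = 0.1 + 0.1p, so both powers settle at 1/9 and each user just meets its SINR target.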

Optical training of large-scale Transformers and deep neural networks with direct feedback alignment arxiv.org/abs/2409.12965 -mat.dis-nn .app-ph .optics .ET .LG

Modern machine learning relies nearly exclusively on dedicated electronic hardware accelerators. Photonic approaches, with low energy consumption and high operation speed, are increasingly considered for inference but, to date, remain mostly limited to relatively basic tasks. Simultaneously, the problem of training deep and complex neural networks, overwhelmingly performed through backpropagation, remains a significant limitation to the size and, consequently, the performance of current architectures and a major compute and energy bottleneck. Here, we experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform. An optical processing unit performs large-scale random matrix multiplications, the central operation of this algorithm, at speeds up to 1500 TeraOps. We perform optical training of some of the most recent deep learning architectures, including Transformers, with more than 1B parameters, and obtain good performance on both language and vision tasks. We study the compute scaling of our hybrid optical approach, and demonstrate a potential advantage for ultra-deep and wide neural networks, thus opening a promising route to sustain the exponential growth of modern artificial intelligence beyond traditional von Neumann approaches.
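
The random matrix multiplication that the abstract calls the central operation of direct feedback alignment can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' optical implementation: the function names, the tanh activation and the tiny example matrices are all assumptions. Each hidden layer receives the output error through its own fixed random feedback matrix, instead of through the transposed forward weights used by backpropagation.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def dfa_hidden_deltas(output_error, feedback_matrices, pre_activations):
    """Compute one DFA error signal per hidden layer.

    output_error      : error vector e at the network output
    feedback_matrices : one fixed random matrix B_l per hidden layer
    pre_activations   : one pre-activation vector a_l per hidden layer
    Returns delta_l = (B_l e) * tanh'(a_l), elementwise (tanh is an assumption).
    """
    deltas = []
    for feedback, pre_activation in zip(feedback_matrices, pre_activations):
        projected = matvec(feedback, output_error)  # the random projection B_l e
        deltas.append(
            [p * (1.0 - math.tanh(a) ** 2) for p, a in zip(projected, pre_activation)]
        )
    return deltas


# Hypothetical example: one hidden layer of width 3, output of width 2.
example_deltas = dfa_hidden_deltas(
    output_error=[1.0, -1.0],
    feedback_matrices=[[[1, 0], [0, 1], [1, 1]]],
    pre_activations=[[0.0, 0.0, 0.0]],
)
```

Because every layer's signal depends only on the output error and a fixed matrix, the projections for all layers can be computed independently of one another, which is the property a dedicated matrix-multiplication unit can exploit.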

An Efficient General-Purpose Optical Accelerator for Neural Networks arxiv.org/abs/2409.12966 .SY .NE .SY

General-purpose optical accelerators (GOAs) have emerged as a promising platform to accelerate deep neural networks (DNNs) due to their low latency and energy consumption. Such an accelerator is usually composed of a given number of interleaving Mach-Zehnder interferometers (MZIs). This interleaving architecture, however, has a low efficiency when accelerating neural networks of various sizes due to the mismatch between weight matrices and the GOA architecture. In this work, a hybrid GOA architecture is proposed to enhance the mapping efficiency of neural networks onto the GOA. In this architecture, independent MZI modules are connected with microring resonators (MRRs), so that they can be combined to process large neural networks efficiently. Each of these modules implements a unitary matrix with inputs adjusted by tunable coefficients. The parameters of the proposed architecture are searched using a genetic algorithm. To enhance the accuracy of neural networks, selected weight matrices are expanded into multiple unitary matrices by applying singular value decomposition (SVD). The kernels in neural networks are also adjusted to use up the on-chip computational resources. Experimental results show that with a given number of MZIs, the mapping efficiency of neural networks on the proposed architecture can be enhanced by 21.87%, 21.20%, 24.69%, and 25.52% for VGG16 and ResNet18 on the datasets CIFAR-10 and CIFAR-100, respectively. The energy consumption and computation latency can also be reduced by over 67% and 21%, respectively.

How Consistent Are Humans When Grading Programming Assignments? arxiv.org/abs/2409.12967 .CY

Providing consistent summative assessment to students is important, as the grades they are awarded affect their progression through university and future career prospects. While small cohorts are typically assessed by a single assessor, such as the class leader, larger cohorts are often assessed by multiple assessors, which increases the risk of inconsistent grading. To investigate the consistency of human grading of programming assignments, we asked 28 participants to each grade 40 CS1 introductory Java assignments, providing grades and feedback for correctness, code elegance, readability and documentation; the 40 assignments were split into two batches of 20. In the second batch of 20, we duplicated one assignment from the first to analyse the internal consistency of individual assessors. We measured the inter-rater reliability of the groups using Krippendorff's $α$; an $α > 0.667$ is recommended to draw tentative conclusions from the ratings. Our groups were inconsistent, with an average $α = 0.2$ when grading correctness and an average $α < 0.1$ for code elegance, readability and documentation. To measure the individual consistency of graders, we measured the distance between the grades they awarded for the duplicated assignment in batch one and batch two. Only one participant of the 22 who didn't notice that the assignment was a duplicate awarded the same grade for correctness, code elegance, readability and documentation. The average grade difference was 1.79 for correctness and less than 1.6 for code elegance, readability and documentation. Our results show that human graders in our study cannot agree on the grade to give a piece of student work and are often individually inconsistent, suggesting that the idea of a "gold standard" of human grading might be flawed, and highlighting that a shared rubric alone is not enough to ensure consistency.
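
For readers unfamiliar with the statistic, the sketch below shows one common way to compute Krippendorff's $α$ as 1 minus the ratio of observed to expected disagreement. It is a minimal illustration under assumptions: the abstract does not say which difference metric the study used, so the squared-difference (interval) metric here is a guess, and the function name and example grades are hypothetical.

```python
def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    units: list of lists; each inner list holds the grades that different
    assessors gave to the same assignment. Units with fewer than two grades
    carry no pairable information and are skipped.
    """
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)

    # Observed disagreement: ordered pairs of grades within each unit.
    observed = sum(
        sum((a - b) ** 2 for i, a in enumerate(u) for j, b in enumerate(u) if i != j)
        / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: ordered pairs of grades across all units.
    expected = sum(
        (a - b) ** 2
        for i, a in enumerate(values)
        for j, b in enumerate(values)
        if i != j
    ) / (n * (n - 1))

    if expected == 0:
        raise ValueError("alpha is undefined when all grades are identical")
    return 1.0 - observed / expected


# Hypothetical grades: full agreement, then systematic disagreement.
alpha_agree = krippendorff_alpha_interval([[1, 1], [2, 2], [3, 3]])
alpha_disagree = krippendorff_alpha_interval([[1, 2], [1, 2]])
```

Full agreement gives $α = 1$, while two assessors who always differ by one grade on otherwise identical work give a negative $α$, meaning agreement is worse than chance.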

MITHOS: Interactive Mixed Reality Training to Support Professional Socio-Emotional Interactions at Schools arxiv.org/abs/2409.12968 .HC .AI

Teachers in challenging conflict situations often experience shame and self-blame, which relate to the feeling of incompetence but may externalise as anger. Sensing mixed signals fails the contingency rule for developing affect regulation and may result in confusion for students about their own emotions and hinder their emotion regulation. Therefore, being able to constructively regulate emotions not only benefits individual experience of emotions but also fosters effective interpersonal emotion regulation and influences how a situation is managed. MITHOS is a system aimed at training teachers' conflict resolution skills through realistic situative learning opportunities during classroom conflicts. In four stages, MITHOS supports teachers' socio-emotional self-awareness, perspective-taking and positive regard. It provides: a) a safe virtual environment to train free social interaction and receive natural social feedback from reciprocal student-agent reactions, b) spatial situational perspective taking through an avatar, c) individual virtual reflection guidance on emotional experiences through co-regulation processes, and d) expert feedback on professional behavioural strategies. This chapter presents the four stages and their implementation in a semi-automatic Wizard-of-Oz (WoZ) System. The WoZ system affords collecting data that are used for developing the fully automated hybrid (machine learning and model-based) system, and to validate the underlying psychological and conflict resolution models. We present results validating the approach in terms of scenario realism, as well as a systematic testing of the effects of external avatar similarity on antecedents of self-awareness with behavior similarity. The chapter contributes to a common methodology of conducting interdisciplinary research for human-centered and generalisable XR and presents a system designed to support it.

Reducing transmission expansion by co-optimizing sizing of wind, solar, storage and grid connection capacity arxiv.org/abs/2409.12971 .SY .SY

Expanding transmission capacity is likely a bottleneck that will restrict variable renewable energy (VRE) deployment required to achieve ambitious emission reduction goals. Interconnection and inter-zonal transmission buildout may be displaced by the optimal sizing of VRE to grid connection capacity and by the co-location of VRE and battery resources behind interconnection. However, neither of these capabilities is commonly captured in macro-energy system models. We develop two new functionalities to explore the substitutability of storage for transmission and the optimal capacity and siting decisions of renewable energy and battery resources through 2030 in the Western Interconnection of the United States. Our findings indicate that modeling optimized interconnection and storage co-location better captures the full value of energy storage and its ability to substitute for transmission. Optimizing interconnection capacity and co-location can reduce total grid connection and shorter-distance transmission capacity expansion on the order of 10% at storage penetration equivalent to 2.5-10% of peak system demand. The decline in interconnection capacity corresponds with greater ratios of VRE to grid connection capacity (an average of 1.5-1.6 megawatt (MW) PV:1 MW inverter capacity, 1.2-1.3 MW wind:1 MW interconnection). Co-locating storage with VREs also results in a 10-15% increase in wind capacity, as wind sites tend to require longer and more costly interconnection. Finally, co-located storage exhibits higher value than standalone storage in our model setup (22-25%). Given the coarse representation of transmission networks in our modeling, this outcome likely overstates the real-world importance of storage co-location with VREs. However, it highlights how siting storage in grid-constrained locations can maximize the value of storage and reduce transmission expansion.

TRACE: Transformer-based user Representations from Attributed Clickstream Event sequences arxiv.org/abs/2409.12972 .IR .AI .LG

For users navigating travel e-commerce websites, the process of researching products and making a purchase often results in intricate browsing patterns that span numerous sessions over an extended period of time. The resulting clickstream data chronicle these user journeys and present valuable opportunities to derive insights that can significantly enhance personalized recommendations. We introduce TRACE, a novel transformer-based approach tailored to generate rich user embeddings from live multi-session clickstreams for real-time recommendation applications. Prior works largely focus on single-session product sequences, whereas TRACE leverages site-wide page view sequences spanning multiple user sessions to model long-term engagement. Employing a multi-task learning framework, TRACE captures comprehensive user preferences and intents distilled into low-dimensional representations. We demonstrate TRACE's superior performance over vanilla transformer and LLM-style architectures through extensive experiments on a large-scale travel e-commerce dataset of real user journeys, where the challenges of long page-histories and sparse targets are particularly prevalent. Visualizations of the learned embeddings reveal meaningful clusters corresponding to latent user states and behaviors, highlighting TRACE's potential to enhance recommendation systems by capturing nuanced user interactions and preferences.

The Era of Foundation Models in Medical Imaging is Approaching: A Scoping Review of the Clinical Value of Large-Scale Generative AI Applications in Radiology arxiv.org/abs/2409.12973 .CV .AI

Social problems stemming from the shortage of radiologists are intensifying, and artificial intelligence is being highlighted as a potential solution. Recently emerging large-scale generative AI has expanded from large language models (LLMs) to multi-modal models, showing potential to revolutionize the entire process of medical imaging. However, comprehensive reviews on their development status and future challenges are currently lacking. This scoping review systematically organizes existing literature on the clinical value of large-scale generative AI applications by following PCC guidelines. A systematic search was conducted across four databases (PubMed, Embase, IEEE Xplore, and Google Scholar), and 15 studies meeting the inclusion/exclusion criteria set by the researchers were reviewed. Most of these studies focused on improving the efficiency of report generation in specific parts of the interpretation process or on translating reports to aid patient understanding, with the latest studies extending to AI applications performing direct interpretations. All studies were quantitatively evaluated by clinicians, with most utilizing LLMs and only three employing multi-modal models. Both LLMs and multi-modal models showed excellent results in specific areas, but none yet outperformed radiologists in diagnostic performance. Most studies utilized GPT, with few using models specialized for the medical imaging domain. This study provides insights into the current state and limitations of large-scale generative AI-based applications in the medical imaging field, offering foundational data and suggesting that the era of medical imaging foundation models is on the horizon, which may fundamentally transform clinical practice in the near future.

Using Agile Story Points and Game Theory Together: Better Software Planning and Development in Agile Software Development arxiv.org/abs/2409.12196 .SE

In the realm of Agile software development, precise user story point estimation is crucial for effective project timeline and resource management. Despite its significance, the method is often marred by issues stemming from cognitive biases, disparities in individual judgment, and hurdles related to both collaboration and competition. In addressing these challenges, this study employs a comprehensive literature review, integrating key concepts from Agile software development, Story Point estimation, and Game Theory. Through rigorous examination of existing literature and relevant case studies, we identified pervasive issues in Agile and Story Point estimation. In response, we proposed the application of game theoretic strategies, notably the Vickrey Auction and Stag Hunt Game, aiming to refine these estimations. The resultant methodology not only promotes the use of game-theory inspired mechanisms but also accentuates their potential to enhance software development planning, team cohesion, and conflict resolution. Preliminary results from our research underscore the transformative potential of these games when incorporated into Agile methodologies, especially during planning and retrospective phases. The overarching goal is to achieve improved accuracy in planning, foster team collaboration, and deliver a discernible uplift in software product quality.

Nteasee: A mixed methods study of expert and general population perspectives on deploying AI for health in African countries arxiv.org/abs/2409.12197 .LG .AI .CY

Artificial Intelligence (AI) for health has the potential to significantly change and improve healthcare. However, in most African countries, how to identify culturally and contextually attuned approaches for deploying these solutions is not well understood. To bridge this gap, we conduct a qualitative study to investigate the best practices, fairness indicators, and potential biases to mitigate when deploying AI for health in African countries, as well as explore opportunities where artificial intelligence could make a positive impact in health. We use a mixed methods approach combining in-depth interviews (IDIs) and surveys. We conduct 1.5-2 hour long IDIs with 50 experts in health, policy, and AI across 17 countries, and through an inductive approach we conduct a qualitative thematic analysis on expert IDI responses. We administer a blinded 30-minute survey with case studies to 672 general population participants across 5 countries in Africa and analyze responses on quantitative scales, statistically comparing responses by country, age, gender, and level of familiarity with AI. We thematically summarize open-ended responses from surveys. Our results find generally positive attitudes and high levels of trust, accompanied by moderate levels of concern, among general population participants for AI usage for health in Africa. This contrasts with expert responses, where major themes revolved around trust/mistrust, ethical concerns, and systemic barriers to integration, among others. This work presents the first-of-its-kind qualitative research study of the potential of AI for health in Africa from an algorithmic fairness angle, with perspectives from both experts and the general population. We hope that this work guides policymakers and drives home the need for further research and the inclusion of general population perspectives in decision-making around AI usage.

Fault Tolerant Metric Dimensions of Leafless Cacti Graphs with Application in Supply Chain Management arxiv.org/abs/2409.12199 .CO .DM

A resolving set for a simple graph $G$ is a subset of the vertex set of $G$ that distinguishes all vertices of $G$ by their shortest-path distances to the vertices of the subset. Such a subset is a metric basis if it is a smallest set with this property. A resolving set is a fault tolerant resolving set if the removal of any single vertex from it still leaves a resolving set. A smallest set satisfying this property is a fault tolerant metric basis, and its cardinality is termed the fault tolerant metric dimension of $G$, denoted by $β'(G)$. In this article, we determine the fault tolerant metric dimension of bicyclic graphs of type-I and II and show that it is always $4$ for both types of graphs. We then use these results as a basis for considering leafless cacti graphs, and calculate their fault tolerant metric dimensions in terms of inner cycles and outer cycles. We then consider a detailed real world example of supply and distribution center management, and discuss the application of fault tolerant metric dimension in such a scenario. We also briefly discuss some other scenarios where leafless cacti graphs can be used to model real world problems.
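
The definitions above translate directly into a brute-force check. The sketch below is a minimal illustration for small graphs only, not the paper's method: the function names and the 5-cycle example are assumptions, and the exhaustive search is exponential in the number of vertices.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_resolving(adj, subset):
    """True if every vertex has a distinct vector of distances to the subset."""
    dists = [bfs_distances(adj, s) for s in subset]
    codes = {tuple(d[v] for d in dists) for v in adj}
    return len(codes) == len(adj)


def is_fault_tolerant_resolving(adj, subset):
    """True if the subset stays resolving after removing any single vertex."""
    subset = list(subset)
    return all(
        is_resolving(adj, subset[:i] + subset[i + 1:]) for i in range(len(subset))
    )


def fault_tolerant_metric_dimension(adj):
    """Smallest size of a fault tolerant resolving set, by exhaustive search."""
    vertices = list(adj)
    for size in range(2, len(vertices) + 1):
        for candidate in combinations(vertices, size):
            if is_fault_tolerant_resolving(adj, candidate):
                return size
    return None


def cycle_graph(n):
    """Adjacency lists of the cycle on vertices 0..n-1."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


# Hypothetical example: the 5-cycle.
c5 = cycle_graph(5)
c5_ft_dimension = fault_tolerant_metric_dimension(c5)
```

On the 5-cycle, two adjacent vertices already form a resolving set, but losing either one leaves a single vertex that cannot tell its two neighbours apart, so a fault tolerant resolving set needs three vertices.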

ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video arxiv.org/abs/2409.12202 .CV

Perceiving and understanding 3D motion is a core technology in fields such as autonomous driving, robots, and motion prediction. This paper proposes a 3D motion perception method called ScaleFlow++ that is easy to generalize. With just a pair of RGB images, ScaleFlow++ can robustly estimate optical flow and motion-in-depth (MID). Most existing methods directly regress MID from two RGB frames or optical flow, resulting in inaccurate and unstable results. Our key insight is cross-scale matching, which extracts deep motion clues by matching objects in pairs of images at different scales. Unlike previous methods, ScaleFlow++ integrates optical flow and MID estimation into a unified architecture, estimating optical flow and MID end-to-end based on feature matching. Moreover, we also propose modules such as a global initialization network, a global iterative optimizer, and a hybrid training pipeline to integrate global motion information, reduce the number of iterations, and prevent overfitting during training. On KITTI, ScaleFlow++ achieved the best monocular scene flow estimation performance, reducing SF-all from 6.21 to 5.79. The evaluation of MID even surpasses RGBD-based methods. In addition, ScaleFlow++ has achieved stunning zero-shot generalization performance in both rigid and nonrigid scenes. Code is available at https://github.com/HanLingsgjk/CSCV.
