arXiv Computer Science @arxiv_cs@qoto.org

Bot

I toot the arXiv feed for topics in Computer Science.

#ComputerScience #CS #Programming #SoftwareEngineering #Software #SoftwareDevelopment #Computers #Science #arXiv #News #PeerReview

Joined Jul 2018

2 Following 1.12K Followers

arXiv Computer Science @arxiv_cs@qoto.org

HistogramTools for Efficient Data Analysis and Distribution Representation in Large Data Sets https://arxiv.org/abs/2504.00001 #cs.DB #cs.PF

Histograms provide a powerful means of summarizing large data sets by representing their distribution in a compact, binned form. The HistogramTools R package enhances R's built-in histogram functionality, offering advanced methods for manipulating and analyzing histograms, especially in large-scale data environments. Key features include the ability to serialize histograms using Protocol Buffers for distributed computing tasks, tools for merging and modifying histograms, and techniques for measuring and visualizing information loss in histogram representations. The package is particularly suited for environments utilizing MapReduce, where efficient storage and data sharing are critical. This paper presents various methods of histogram bin manipulation, distance measures, quantile approximation, and error estimation in cumulative distribution functions (CDFs) derived from histograms. Visualization techniques and efficient storage representations are also discussed alongside applications for large data processing and distributed computing tasks.
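
As a rough illustration of two of the operations this abstract mentions (merging histograms and approximating quantiles from binned counts), here is a minimal Python sketch. It is not the HistogramTools API, which is an R package. The function names `merge_histograms` and `approx_quantile` are hypothetical, and the sketch assumes that both histograms share identical bin edges and that values are spread uniformly within each bin.

```python
def merge_histograms(counts_a, counts_b):
    """Add per-bin counts of two histograms that share the same bin edges."""
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must have the same number of bins")
    return [a + b for a, b in zip(counts_a, counts_b)]


def approx_quantile(edges, counts, q):
    """Approximate the q-quantile (0 <= q <= 1) from binned counts.

    Assumption: values are uniformly distributed inside each bin, so the
    quantile is found by linear interpolation within the bin where the
    cumulative count first reaches q * total.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    target = q * total
    cumulative = 0
    for i, count in enumerate(counts):
        if count > 0 and cumulative + count >= target:
            fraction = (target - cumulative) / count
            return edges[i] + fraction * (edges[i + 1] - edges[i])
        cumulative += count
    return edges[-1]


# Two partial histograms over the same bins, e.g. from two workers.
edges = [0, 10, 20, 30]
merged = merge_histograms([1, 2, 1], [3, 2, 1])
median = approx_quantile(edges, merged, 0.5)
```

Because only bin counts are kept, the approximated quantile can differ from the true one by up to one bin width, which is the kind of information loss the abstract refers to.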

arXiv Computer Science @arxiv_cs@qoto.org

Are We There Yet? A Measurement Study of Efficiency for LLM Applications on Mobile Devices https://arxiv.org/abs/2504.00002 #cs.PF #cs.AI #cs.HC #cs.NI

Recent advancements in large language models (LLMs) have prompted interest in deploying these models on mobile devices to enable new applications without relying on cloud connectivity. However, the efficiency constraints of deploying LLMs on resource-limited devices present significant challenges. In this paper, we conduct a comprehensive measurement study to evaluate the efficiency tradeoffs between mobile-based, edge-based, and cloud-based deployments for LLM applications. We implement AutoLife-Lite, a simplified LLM-based application that analyzes smartphone sensor data to infer user location and activity contexts. Our experiments reveal that: (1) Only small-size LLMs (<4B parameters) can run successfully on powerful mobile devices, though they exhibit quality limitations compared to larger models; (2) Model compression is effective in lowering hardware requirements, but may lead to significant performance degradation; (3) The latency to run LLMs on mobile devices with meaningful output is significant (>30 seconds), while cloud services demonstrate better time efficiency (<10 seconds); (4) Edge deployments offer intermediate tradeoffs between latency and model capabilities, with different results on CPU-based and GPU-based settings. These findings provide valuable insights for system designers on the current limitations and future directions for on-device LLM applications.

arXiv Computer Science @arxiv_cs@qoto.org

Tensor Generalized Approximate Message Passing https://arxiv.org/abs/2504.00008 #math.IT #cs.LG #cs.AI #cs.IT

We propose a tensor generalized approximate message passing (TeG-AMP) algorithm for low-rank tensor inference, which can be used to solve tensor completion and decomposition problems. We derive the TeG-AMP algorithm as an approximation of the sum-product belief propagation algorithm in high dimensions where the central limit theorem and Taylor series approximations are applicable. As TeG-AMP is developed based on a general TR decomposition model, it can be directly applied to many low-rank tensor types. Moreover, our TeG-AMP can be simplified based on the CP decomposition model and a tensor simplified AMP is proposed for low CP-rank tensor inference problems. Experimental results demonstrate that the proposed methods significantly improve recovery performance since they take full advantage of tensor structures.

arXiv Computer Science @arxiv_cs@qoto.org

LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration https://arxiv.org/abs/2504.00010 #cs.LG #cs.GR #cs.MA

Text-to-image generation (T2I) has become a key area of research with broad applications. However, existing methods often struggle with complex spatial relationships and fine-grained control over multiple concepts. Many existing approaches require significant architectural modifications, extensive training, or expert-level prompt engineering. To address these challenges, we introduce LayerCraft, an automated framework that leverages large language models (LLMs) as autonomous agents for structured procedural generation. LayerCraft enables users to customize objects within an image and supports narrative-driven creation with minimal effort. At its core, the system includes a coordinator agent that directs the process, along with two specialized agents: ChainArchitect, which employs chain-of-thought (CoT) reasoning to generate a dependency-aware 3D layout for precise instance-level control, and the Object-Integration Network (OIN), which utilizes LoRA fine-tuning on pre-trained T2I models to seamlessly blend objects into specified regions of an image based on textual prompts without requiring architectural changes. Extensive evaluations demonstrate LayerCraft's versatility in applications ranging from multi-concept customization to storytelling. By providing non-experts with intuitive, precise control over T2I generation, our framework democratizes creative image creation. Our code will be released upon acceptance at github.com/PeterYYZhang/LayerCraft

arXiv Computer Science @arxiv_cs@qoto.org

I'm Sorry Dave: How the old world of personnel security can inform the new world of AI insider risk https://arxiv.org/abs/2504.00012 #cs.CR #cs.CY #cs.LG

Organisations are rapidly adopting artificial intelligence (AI) tools to perform tasks previously undertaken by people. The potential benefits are enormous. Separately, some organisations deploy personnel security measures to mitigate the security risks arising from trusted human insiders. Unfortunately, there is no meaningful interplay between the rapidly evolving domain of AI and the traditional world of personnel security. This is a problem. The complex risks from human insiders are hard enough to understand and manage, despite many decades of effort. The emerging security risks from AI insiders are even more opaque. Both sides need all the help they can get. Some of the concepts and approaches that have proved useful in dealing with human insiders are also applicable to the emerging risks from AI insiders. Furthermore, AI can be used defensively to protect against both human and AI insiders.

arXiv Computer Science @arxiv_cs@qoto.org

Towards Industrial-scale Product Configuration https://arxiv.org/abs/2504.00013 #cs.SE #cs.PL

We address the challenge of product configuration in the context of increasing customer demand for diverse and complex products. We propose a solution through a curated selection of product model benchmarks formulated in the COOM language, divided into three fragments of increasing complexity. Each fragment is accompanied by a corresponding bike model example, and additional scalable product models are included in the COOM suite, along with relevant resources. We outline an ASP-based workflow for solving COOM-based configuration problems, highlighting its adaptability to different paradigms and alternative ASP solutions. The COOM Suite aims to provide a comprehensive, accessible, and representative set of examples that can serve as a common ground for stakeholders in the field of product configuration.

arXiv Computer Science @arxiv_cs@qoto.org

Medical Reasoning in LLMs: An In-Depth Analysis of DeepSeek R1 https://arxiv.org/abs/2504.00016 #cs.CL

Integrating large language models (LLMs) like DeepSeek R1 into healthcare requires rigorous evaluation of their reasoning alignment with clinical expertise. This study assesses DeepSeek R1's medical reasoning against expert patterns using 100 MedQA clinical cases. The model achieved 93% diagnostic accuracy, demonstrating systematic clinical judgment through differential diagnosis, guideline-based treatment selection, and integration of patient-specific factors. However, error analysis of seven incorrect cases revealed persistent limitations: anchoring bias, challenges reconciling conflicting data, insufficient exploration of alternatives, overthinking, knowledge gaps, and premature prioritization of definitive treatment over intermediate care. Crucially, reasoning length correlated with accuracy - shorter responses (<5,000 characters) were more reliable, suggesting extended explanations may signal uncertainty or rationalization of errors. While DeepSeek R1 exhibits foundational clinical reasoning capabilities, recurring flaws highlight critical areas for refinement, including bias mitigation, knowledge updates, and structured reasoning frameworks. These findings underscore LLMs' potential to augment medical decision-making through artificial reasoning but emphasize the need for domain-specific validation, interpretability safeguards, and confidence metrics (e.g., response length thresholds) to ensure reliability in real-world applications.

arXiv Computer Science @arxiv_cs@qoto.org

Enhance Vision-based Tactile Sensors via Dynamic Illumination and Image Fusion https://arxiv.org/abs/2504.00017 #cs.CV #cs.AI #cs.LG #cs.RO

Vision-based tactile sensors use structured light to measure deformation in their elastomeric interface. Until now, vision-based tactile sensors such as DIGIT and GelSight have been using a single, static pattern of structured light tuned to the specific form factor of the sensor. In this work, we investigate the effectiveness of dynamic illumination patterns, in conjunction with image fusion techniques, to improve the quality of sensing of vision-based tactile sensors. Specifically, we propose to capture multiple measurements, each with a different illumination pattern, and then fuse them together to obtain a single, higher-quality measurement. Experimental results demonstrate that this type of dynamic illumination yields significant improvements in image contrast, sharpness, and background difference. This discovery opens the possibility of retroactively improving the sensing quality of existing vision-based tactile sensors with a simple software update, and for new hardware designs capable of fully exploiting dynamic illumination.

arXiv Computer Science @arxiv_cs@qoto.org

SandboxEval: Towards Securing Test Environment for Untrusted Code https://arxiv.org/abs/2504.00018 #cs.CR #cs.LG

While large language models (LLMs) are powerful assistants in programming tasks, they may also produce malicious code. Testing LLM-generated code therefore poses significant risks to assessment infrastructure tasked with executing untrusted code. To address these risks, this work focuses on evaluating the security and confidentiality properties of test environments, reducing the risk that LLM-generated code may compromise the assessment infrastructure. We introduce SandboxEval, a test suite featuring manually crafted test cases that simulate real-world safety scenarios for LLM assessment environments in the context of untrusted code execution. The suite evaluates vulnerabilities to sensitive information exposure, filesystem manipulation, external communication, and other potentially dangerous operations in the course of assessment activity. We demonstrate the utility of SandboxEval by deploying it on an open-source implementation of Dyff, an established AI assessment framework used to evaluate the safety of LLMs at scale. We show, first, that the test suite accurately describes limitations placed on an LLM operating under instructions to generate malicious code. Second, we show that the test results provide valuable insights for developers seeking to harden assessment infrastructure and identify risks associated with LLM execution activities.

arXiv Computer Science @arxiv_cs@qoto.org

ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding https://arxiv.org/abs/2504.00019 #cs.CL #cs.AI #cs.SE

Language models (LMs) have become a staple of the code-writing toolbox. Their pre-training recipe has, however, remained stagnant over recent years, barring the occasional changes in data sourcing and filtering strategies. In particular, research exploring modifications to Code-LMs' pre-training objectives, geared towards improving data efficiency and better disentangling between syntax and semantics, has been noticeably sparse, especially compared with corresponding efforts in natural language LMs. In this work, we examine grounding on obfuscated code as a means of helping Code-LMs look beyond the surface-form syntax and enhance their pre-training sample efficiency. To this end, we compile ObscuraX, a dataset of approximately 55M source and obfuscated code pairs in seven languages. Subsequently, we pre-train ObscuraCoder models, ranging in size from 255M to 2.8B parameters, on a 272B-token corpus that includes ObscuraX and demonstrate that our obfuscation-based pre-training recipe leads to consistent improvements in Code-LMs' abilities compared to both vanilla autoregressive pre-training as well as existing de-obfuscation (DOBF) objectives. ObscuraCoder demonstrates sizeable gains across multiple tests of syntactic and semantic code understanding, along with improved capabilities in multilingual code completion, multilingual code commit summarization, and multi-purpose library-oriented code generation.

arXiv Computer Science @arxiv_cs@qoto.org

detectGNN: Harnessing Graph Neural Networks for Enhanced Fraud Detection in Credit Card Transactions https://arxiv.org/abs/2503.22681 #cs.CR

Credit card fraud is a major problem, causing large financial losses and eroding trust in financial systems. Traditional fraud detection methods often fail to detect advanced and evolving fraud techniques. This study focuses on using Graph Neural Networks (GNNs) to improve fraud detection by analyzing transactions as a network of connected data points, such as accounts, traders, and devices. The proposed "detectGNN" model uses advanced features like time-based patterns and dynamic updates to expose hidden fraud and improve detection accuracy. Tests show that GNNs perform better than traditional methods in finding complex and multi-layered fraud. The model also addresses real-time processing, data imbalance, and privacy concerns, making it practical for real-world use. This research shows that GNNs can provide a powerful, accurate, and scalable solution for detecting fraud. Future work will focus on making the models easier to understand, privacy-friendly, and adaptable to new types of fraud, ensuring safer financial transactions in the digital world.

arXiv Computer Science @arxiv_cs@qoto.org

A Novel Chaos-Based Cryptographic Scrambling Technique to Secure Medical Images https://arxiv.org/abs/2503.22683 #cs.CR

These days, a tremendous quantity of digital visual data is sent over many networks and stored in many different formats. This visual information is often highly confidential and financially valuable. Maintaining safe transmission of data is crucial, as is the use of approaches to offer security features like privacy, integrity, or authentication that are tailored to certain types of data. Protecting sensitive medical images stored in electronic health records is the focus of this article, which proposes a technique of encryption and decryption. To safeguard image-based programs, encryption methods are applied. Privacy, integrity, and authenticity are only a few of the security elements investigated by the proposed system, which encrypts medical pictures using chaos maps. In all stages of the protocol, the suggested chaos-based data scrambling method is employed to mitigate the shortcomings of traditional confusion and diffusion designs. Bifurcation charts, Lyapunov exponents, tests for mean squared error and peak-to-average signal-to-noise ratio, and histogram analysis are only some of the tools we use to investigate the suggested system's chaotic behavior.

arXiv Computer Science @arxiv_cs@qoto.org

Binary and Multi-Class Intrusion Detection in IoT Using Standalone and Hybrid Machine and Deep Learning Models https://arxiv.org/abs/2503.22684 #cs.CR #cs.AI

Maintaining security in IoT systems depends on intrusion detection, since these networks are increasingly susceptible to cyber-attacks. Based on the IoT23 dataset, this study explores the use of several Machine Learning (ML) and Deep Learning (DL) models, along with hybrid models, for binary and multi-class intrusion detection. The standalone ML and DL models used were Random Forest (RF), Extreme Gradient Boosting (XGBoost), Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Convolutional Neural Network (CNN). Furthermore, two voting-based hybrid classifiers were created by combining machine learning techniques (RF, XGBoost, AdaBoost, KNN, and SVM): one for binary classification and the other for multi-class classification. These models were evaluated using precision, recall, accuracy, and F1-score, and their performance was compared. This work explains how hybrid and standalone ML and DL techniques could improve Intrusion Detection Systems (IDS) in terms of accuracy and scalability in the Internet of Things (IoT).
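
The abstract describes its hybrid models only as voting-based classifiers. One common reading is hard (majority) voting over the labels predicted by each base model, sketched below in plain Python. The function name `majority_vote`, the example labels, and the tie-breaking rule (the label seen first among the tied ones wins) are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label predictions by hard (majority) voting.

    predictions: one list of labels per base model, all of equal length.
    Ties go to the tied label that appears first among the models' votes,
    because Counter.most_common keeps first-seen order for equal counts.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    voted = []
    for sample_labels in zip(*predictions):
        label, _count = Counter(sample_labels).most_common(1)[0]
        voted.append(label)
    return voted


# Hypothetical outputs of three base models on three traffic records.
model_outputs = [
    ["benign", "attack", "attack"],
    ["benign", "benign", "attack"],
    ["attack", "attack", "attack"],
]
combined = majority_vote(model_outputs)
```

The same function covers the binary and the multi-class case, since it only counts labels and makes no assumption about how many classes exist.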

arXiv Computer Science @arxiv_cs@qoto.org

Provenance of Adaptation in Scientific and Business Workflows -- Literature Review https://arxiv.org/abs/2503.22685 #cs.SE #cs.CR

In science, new technologies have opened up the possibility of relying on advanced computational methods and models to conduct and produce scientific research. An important aspect of scientific and business workflows is provenance, which refers to the information describing the production, history, or lineage of an end product, which can also be data, digitalized processes, and other intangible artifacts. While there are already systems, tools, and standards to capture the provenance of data and workflows, the provenance of adaptations or changes in workflows has not been addressed yet. In this paper we carry out a literature review to establish the state of the art on this topic and present our methodology and findings. Our findings confirm that provenance of adaptation has not been addressed adequately in the fields of business and scientific workflows. The two fields also have different motivations for recording the lineage of data or processes. While scientific workflows are interested in reproducibility and visualization, business workflow solutions are indirectly connected to compliance, exception handling, and analysis. The adaptive nature of workflows in both fields is not reflected in the research on process provenance yet, as our results show. The use of provenance standards is also not widespread.

arXiv Computer Science @arxiv_cs@qoto.org

Truth in Text: A Meta-Analysis of ML-Based Cyber Information Influence Detection Approaches https://arxiv.org/abs/2503.22686 #cs.CR #cs.LG

Cyber information influence, or disinformation in general terms, is widely regarded as one of the biggest threats to social progress and government stability. From US presidential elections to European Union referendums and down to regional news reporting of wildfires, lies and post-truths have normalized radical decision-making. Accordingly, there has been an explosion in research seeking to detect disinformation in online media. The frontier of disinformation detection research is leveraging a variety of ML techniques such as traditional ML algorithms like Support Vector Machines, Random Forest, and Naïve Bayes. Other research has applied deep learning models including Convolutional Neural Networks, Long Short-Term Memory networks, and transformer-based architectures. Despite the overall success of such techniques, the literature demonstrates inconsistencies when viewed holistically which limits our understanding of the true effectiveness. Accordingly, this work employed a two-stage meta-analysis to (a) demonstrate an overall meta statistic for ML model effectiveness in detecting disinformation and (b) investigate the same by subgroups of ML model types. The study found the majority of the 81 ML detection techniques sampled have greater than 80% accuracy, with a mean sample effectiveness of 79.18% accuracy. Meanwhile, subgroups demonstrated no statistically significant difference between approaches but revealed high within-group variance. Based on the results, this work recommends future work in replication and development of detection methods operating at the ML model level.

arXiv Computer Science @arxiv_cs@qoto.org

CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation https://arxiv.org/abs/2503.22688 #cs.SE #cs.PL

Large Language Models (LLMs) have demonstrated exceptional performance in code generation tasks and have become indispensable programming assistants for developers. However, existing code generation benchmarks primarily assess the functional correctness of code generated by LLMs in single-turn interactions, offering limited insight into their capabilities to generate code that strictly follows users' instructions, especially in multi-turn interaction scenarios. In this paper, we introduce CodeIF-Bench, a benchmark for evaluating LLMs' instruction-following capabilities in interactive code generation. Specifically, CodeIF-Bench incorporates nine types of verifiable instructions aligned with the real-world software development requirements, which can be independently and objectively validated through specified test cases, facilitating the evaluation of instruction-following capability in multi-turn interactions. We evaluate nine prominent LLMs using CodeIF-Bench, and the experimental results reveal a significant disparity between their basic programming capability and instruction-following capability, particularly as task complexity, context length, and the number of dialogue rounds increase.

arXiv Computer Science @arxiv_cs@qoto.org

From Occurrence to Consequence: A Comprehensive Data-driven Analysis of Building Fire Risk https://arxiv.org/abs/2503.22689 #physics.data-an #stat.AP #cs.LG

Building fires pose a persistent threat to life, property, and infrastructure, emphasizing the need for advanced risk mitigation strategies. This study presents a data-driven framework analyzing U.S. fire risks by integrating over one million fire incident reports with diverse fire-relevant datasets, including social determinants, building inventories, weather conditions, and incident-specific factors. By adapting machine learning models, we identify key risk factors influencing fire occurrence and consequences. Our findings show that vulnerable communities, characterized by socioeconomic disparities or the prevalence of outdated or vacant buildings, face higher fire risks. Incident-specific factors, such as fire origins and safety features, strongly influence fire consequences. Buildings equipped with fire detectors and automatic extinguishing systems experience significantly lower fire spread and injury risks. By pinpointing high-risk areas and populations, this research supports targeted interventions, including mandating fire safety systems and providing subsidies for disadvantaged communities. These measures can enhance fire prevention, protect vulnerable groups, and promote safer, more equitable communities.

arXiv Computer Science @arxiv_cs@qoto.org

Fragile Mastery: Are Domain-Specific Trade-Offs Undermining On-Device Language Models? https://arxiv.org/abs/2503.22698 #cs.CL

The application of on-device language models (ODLMs) on resource-constrained edge devices is a multi-dimensional problem that strikes a fine balance between computational effectiveness, memory, power usage, and linguistic capacity across heterogeneous tasks. This holistic study conducts a thorough investigation of the trade-offs between domain-specific optimization and cross-domain robustness, culminating in the proposal of the Generalized Edge Model (GEM), a new architecture that aims to balance specialization and generalization in a harmonious manner. With a rigorous experimental approach testing 47 well-chosen benchmarks in eight domains--healthcare, law, finance, STEM, commonsense, conversational AI, multilingual, and domain-adaptive tasks--we show that conventional optimization techniques decrease target task perplexity by 18-25% but result in a precipitous decline in general-task performance with F1 scores decreasing by 12-29%, as reported by Liu et al. GEM employs a Sparse Cross-Attention Router (SCAR) to dynamically allocate computation to a variable number of computing resources, with a cross-domain F1 accuracy of 0.89 at less than 100 ms latency across Raspberry Pi 4, Pixel 6, iPhone 13, and custom neural processing units (NPUs). Compared to GPT-4 Lite, GEM improves general-task performance by 7% while maintaining parity in domain-specific performance. We propose three new measurement tools--Domain Specialization Index (DSI), Generalization Gap (GG), and Cross-Domain Transfer Ratio (CDTR)--which show strong correlation between model compression intensity and brittleness.

arXiv Computer Science @arxiv_cs@qoto.org

Validating Emergency Department Admission Predictions Based on Local Data Through MIMIC-IV https://arxiv.org/abs/2503.22706 #cs.LG #cs.AI

The effective management of Emergency Department (ED) overcrowding is essential for improving patient outcomes and optimizing healthcare resource allocation. This study validates hospital admission prediction models initially developed using a small local dataset from a Greek hospital by leveraging the comprehensive MIMIC-IV dataset. After preprocessing the MIMIC-IV data, five algorithms were evaluated: Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Random Forest (RF), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM Radial). Among these, RF demonstrated superior performance, achieving an Area Under the Receiver Operating Characteristic Curve (AUC-ROC) of 0.9999, sensitivity of 0.9997, and specificity of 0.9999 when applied to the MIMIC-IV data. These findings highlight the robustness of RF in handling complex datasets for admission prediction, establish MIMIC-IV as a valuable benchmark for validating models based on smaller local datasets, and provide actionable insights for improving ED management strategies.
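
For readers unfamiliar with the metrics reported here, sensitivity is the fraction of actual admissions the model flags, TP / (TP + FN), and specificity is the fraction of non-admissions it correctly clears, TN / (TN + FP). The Python sketch below computes both from binary labels. The function name and the toy labels are hypothetical and unrelated to the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = admitted."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # admitted, predicted admitted
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # admitted, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # not admitted, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not admitted, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 4 admitted and 4 non-admitted patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Both values depend on the decision threshold applied to the model's scores, whereas AUC-ROC summarizes performance across all thresholds.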

arXiv Computer Science @arxiv_cs@qoto.org

CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation https://arxiv.org/abs/2503.22708 #cs.AI #cs.CL

Despite the surge of interest in autonomous scientific discovery (ASD) of software artifacts (e.g., improved ML algorithms), current ASD systems face two key limitations: (1) they largely explore variants of existing codebases or similarly constrained design spaces, and (2) they produce large volumes of research artifacts (such as automatically generated papers and code) that are typically evaluated using conference-style paper review with limited evaluation of code. In this work we introduce CodeScientist, a novel ASD system that frames ideation and experiment construction as a form of genetic search jointly over combinations of research articles and codeblocks defining common actions in a domain (like prompting a language model). We use this paradigm to conduct hundreds of automated experiments on machine-generated ideas broadly in the domain of agents and virtual environments, with the system returning 19 discoveries, 6 of which were judged as being both at least minimally sound and incrementally novel after a multi-faceted evaluation beyond that typically conducted in prior work, including external (conference-style) review, code review, and replication attempts. Moreover, the discoveries span new tasks, agents, metrics, and data, suggesting a qualitative shift from benchmark optimization to broader discoveries.
