Emergent dynamical phases and collective motion in termites arxiv.org/abs/2505.08790

Termites which are able to forage in the open can often be seen, in the field or in the lab: (i) wandering around, forming no observable pattern, or (ii) clustering themselves in a dense and almost immobile pack, or (iii) milling about in a circular movement. Despite being well-reported patterns, they are normally regarded as independent phenomena whose specific traits have never been properly quantified. Evidence, however, favours the hypothesis that these are interdependent patterns, arising from self-organised interactions and movement among workers. After all, termites are a form of active matter where blind cooperative individuals are self-propelled and lack the possibility of visual cues to spatially orientate and align. It follows that their non-trivial close-contact patterns could generate motion-collision induced phase separations. This would then trigger the emergence of these three patterns (disorder, clustering, milling) as parts of the same continuum. By inspecting termite groups confined in arenas, we could quantitatively describe each one of these patterns in detail. We identified disorder, clustering and milling spatial patterns. These phases and their transitions are characterised with the aim of refining the understanding of these aspects of self-propelled particles in active matter where close-range contacts and collisions are important.
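The three patterns named in the abstract (disorder, clustering, milling) are often told apart in collective-motion studies with simple order parameters. The sketch below is an illustration under assumptions, not the paper's analysis: the function name `order_parameters` and the choice of a polarization and a rotation (milling) measure are hypothetical and are not taken from the abstract.

```python
import math


def order_parameters(positions, velocities):
    """Return (polarization, milling) for agents moving in 2D.

    polarization: length of the mean heading vector (1 = all aligned).
    milling: absolute mean rotation about the group centroid
             (1 = all agents circling in the same direction).
    """
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n

    sum_ux = sum_uy = sum_rot = 0.0
    for (x, y), (vx, vy) in zip(positions, velocities):
        speed = math.hypot(vx, vy)
        if speed == 0.0:
            continue  # immobile agents add nothing to either measure
        ux, uy = vx / speed, vy / speed
        sum_ux += ux
        sum_uy += uy

        rx, ry = x - cx, y - cy
        dist = math.hypot(rx, ry)
        if dist > 0.0:
            # z-component of (unit radius) x (unit heading)
            sum_rot += (rx * uy - ry * ux) / dist

    polarization = math.hypot(sum_ux, sum_uy) / n
    milling = abs(sum_rot) / n
    return polarization, milling
```

In this toy picture, a milling group gives low polarization and high rotation, while a dense, almost immobile cluster gives low values for both, since stationary agents contribute nothing.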

arXiv.org

A Comparative Study of Transformer-Based Models for Multi-Horizon Blood Glucose Prediction arxiv.org/abs/2505.08821

Accurate blood glucose prediction can enable novel interventions for type 1 diabetes treatment, including personalized insulin and dietary adjustments. Although recent advances in transformer-based architectures have demonstrated the power of attention mechanisms in complex multivariate time series prediction, their potential for blood glucose (BG) prediction remains underexplored. We present a comparative analysis of transformer models for multi-horizon BG prediction, examining forecasts up to 4 hours and input history up to 1 week. The publicly available DCLP3 dataset (n=112) was split (80%-10%-10%) for training, validation, and testing, and the OhioT1DM dataset (n=12) served as an external test set. We trained networks with point-wise, patch-wise, series-wise, and hybrid embeddings, using CGM, insulin, and meal data. For short-term blood glucose prediction, Crossformer, a patch-wise transformer architecture, achieved a superior 30-minute prediction (RMSE of 15.6 mg/dL on OhioT1DM). For longer-term predictions (1h, 2h, and 4h), PatchTST, another patch-wise transformer, prevailed with the lowest RMSE (24.6 mg/dL, 36.1 mg/dL, and 46.5 mg/dL on OhioT1DM). In general, models that used tokenization through patches demonstrated improved accuracy with larger input sizes, with the best results obtained with a one-week history. These findings highlight the promise of transformer-based architectures for BG prediction by capturing and leveraging seasonal patterns in multivariate time-series data to improve accuracy.
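To make "patch-wise" tokenization and the RMSE metric concrete, here is a minimal sketch. It only shows how a glucose history could be cut into patches and how RMSE is computed; it is not the Crossformer or PatchTST implementation, and the names `make_patches` and `rmse`, the patch length, and the stride are illustrative assumptions.

```python
import math


def make_patches(series, patch_len, stride):
    """Split a 1D series into (possibly overlapping) fixed-length patches.

    Each patch would become one input token of a patch-wise transformer.
    """
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]


def rmse(predicted, observed):
    """Root mean squared error, in the units of the inputs (e.g. mg/dL)."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical CGM readings in mg/dL (made-up values for illustration only).
history = [110, 112, 115, 121, 128, 133, 137, 140, 139, 135]
tokens = make_patches(history, patch_len=4, stride=2)
```

With patches, the number of tokens grows with history length divided by the stride rather than with the raw number of samples, which is one reason long input windows stay tractable.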

High-throughput Screening of the Mechanical Properties of Peptide Assemblies arxiv.org/abs/2505.08850

Peptides are recognized for their varied self-assembly behaviors, forming a wide array of structures and geometries, such as spheres, fibers, and hydrogels, each presenting a unique set of material properties. The functionalities of these materials hold exceptional interest for applications in biology, medicine, photonics, nanotechnology and the food industry. Specifically, the ability to exploit peptides as viable and sustainable mechanical materials requires sequence design that enables superior performance, notably a high Young's modulus. The peptide sequence space is vast, however: even a slight increase in sequence length leads to an exponential increase in the number of potential peptide sequences to be characterized. Here, we combine coarse-grained molecular dynamics simulations, atomic force microscopy experiments and machine learning models to correlate the sequence length and composition with the mechanical properties of self-assembled peptides. We calculate the Young's modulus for all possible amino acid sequences of di- and tripeptides using high-throughput coarse-grained methods, and validate these calculations through in-situ mechanical characterization. For pentapeptides, we select and calculate properties for a subset of sequences to train a machine learning model, which allows us to predict the modulus for other sequences. The combined workflow not only identifies promising peptide candidates with exceptional mechanical performance, but also extends current understanding of the sequence-to-function relationships for peptide materials in specific applications.
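The exponential growth of the sequence space can be checked with a few lines of arithmetic. This sketch assumes the 20 canonical amino acids as the alphabet, which the abstract does not state explicitly; the names `AMINO_ACIDS`, `sequence_space_size`, and `enumerate_peptides` are illustrative.

```python
from itertools import product

# One-letter codes of the 20 canonical amino acids (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_space_size(length, alphabet=AMINO_ACIDS):
    """Number of distinct linear sequences of the given length."""
    return len(alphabet) ** length


def enumerate_peptides(length, alphabet=AMINO_ACIDS):
    """Lazily yield every sequence of the given length."""
    return ("".join(residues) for residues in product(alphabet, repeat=length))


for n in (2, 3, 5):
    print(n, sequence_space_size(n))
```

Under this assumption there are 400 dipeptides and 8,000 tripeptides, which is small enough to enumerate exhaustively, while the 3,200,000 pentapeptides are consistent with the abstract's choice to simulate a subset and predict the rest with a trained model.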

When repeats drive the vocabulary: a Byte-Pair Encoding analysis of T2T primate genomes arxiv.org/abs/2505.08918

The emergence of telomere-to-telomere (T2T) genome assemblies has opened new avenues for comparative genomics, yet effective tokenization strategies for genomic sequences remain underexplored. In this pilot study, we apply Byte Pair Encoding (BPE) to nine T2T primate genomes, including three human assemblies, by training independent BPE tokenizers with a fixed vocabulary of 512,000 tokens using our custom tool, dnaBPE. Our analysis reveals that only 11,569 tokens are shared across all assemblies, while 991,854 tokens are unique to a single genome, indicating a rapid decline in shared vocabulary with increasing assembly comparisons. Moreover, phylogenetic trees derived from token overlap failed to recapitulate established primate relationships, a discrepancy attributed to the disproportionate influence of species-specific high-copy repetitive elements. These findings underscore the dual nature of BPE tokenization: while it effectively compresses repetitive sequences, its sensitivity to high-copy elements limits its utility as a universal tool for comparative genomics. We discuss potential hybrid strategies and repeat-masking approaches to refine genomic tokenization, emphasizing the need for domain-specific adaptations in the development of large-scale genomic language models. The dnaBPE tool used in this study is open-source and available at https://github.com/aglabx/dnaBPE.
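The effect described here, where high-copy repeats dominate a BPE vocabulary, shows up even in a toy setting. The following is a generic textbook-style BPE trainer written for illustration; it is not the dnaBPE tool and does not use its interface. The function names and the tiny repeat strings are assumptions.

```python
from collections import Counter


def train_bpe(sequence, num_merges):
    """Learn a BPE vocabulary by repeatedly merging the most frequent pair."""
    tokens = list(sequence)
    vocab = set(tokens)
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats any more
        merged = left + right
        vocab.add(merged)
        # Rewrite the token stream with the new merged symbol.
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                out.append(merged)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return vocab


def shared_and_unique(vocabs):
    """Return (tokens shared by all vocabularies, tokens found in exactly one)."""
    shared = set.intersection(*vocabs)
    counts = Counter(token for vocab in vocabs for token in vocab)
    unique = {token for token, c in counts.items() if c == 1}
    return shared, unique


# Toy "genomes", each dominated by a different tandem repeat.
vocab_a = train_bpe("GGAAT" * 4, num_merges=4)
vocab_b = train_bpe("TTAGGG" * 4, num_merges=4)
shared, unique = shared_and_unique([vocab_a, vocab_b])
```

Because every merge follows the most frequent pair, each toy vocabulary fills up with fragments of its own repeat unit, so the longer tokens are specific to one sequence while the tokens shared by both are short.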

Multimodal Modeling of Ultradian Rhythms Using the Hankel Alternative View of Koopman (HAVOK) Analysis arxiv.org/abs/2505.08953

Ultradian rhythms - quasi-rhythmic fluctuations in behavior and physiology with periods shorter than 24 hours - are observed across various organisms, including humans. Despite their role in key biological processes such as sleep architecture and hormone regulation, their underlying mechanisms remain poorly understood. Here, we leveraged wearable sensor technology for continuous monitoring of physiological signals in 16 healthy participants over two weeks. By systematically removing circadian and longer-scale rhythms, we isolated ultradian dynamics and modeled them using the Hankel Alternative View of Koopman (HAVOK) framework, a data-driven approach based on Takens' embedding theorem and Koopman operator theory. This allowed us to characterize ultradian rhythms as an intermittently forced linear system and distinguish between regular oscillatory behavior and more complex dynamics. Across participants, ultradian fluctuations were well-described by the HAVOK model, with intermittent forcing consistently observed. The model demonstrated strong forecasting accuracy, with root mean squared error (RMSE) of $0.0315 \pm 0.02$, $0.0306 \pm 0.02$, and $0.0218 \pm 0.02$ in the leading time-delay coordinates. Notably, a significant sex difference in model rank (z = -2.06, p = 0.0396) suggests that sex hormones may play a key role in ultradian dynamics. These findings provide evidence for intermittently forced linear systems as a useful framework for understanding ultradian rhythms and their regulation.
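HAVOK starts from a time-delay embedding: the measured signal is stacked into a Hankel matrix whose rows are shifted copies of the series. The sketch below shows only that first step, in plain Python. The later steps of the method (a singular value decomposition of this matrix and a linear regression on the leading delay coordinates, with the last retained coordinate treated as forcing) are omitted. The function name `hankel_matrix` and the number of delays are illustrative assumptions, not values from the study.

```python
def hankel_matrix(series, num_delays):
    """Build a Hankel matrix H with H[i][j] = series[i + j].

    Row i is the series shifted by i samples, so each column is one
    delay-embedded snapshot of length num_delays.
    """
    num_cols = len(series) - num_delays + 1
    if num_cols < 1:
        raise ValueError("series is shorter than the number of delays")
    return [list(series[i:i + num_cols]) for i in range(num_delays)]


# Hypothetical detrended signal samples (illustration only).
signal = [0.0, 0.8, 1.0, 0.4, -0.5, -1.0, -0.7, 0.1]
H = hankel_matrix(signal, num_delays=3)
```

Each anti-diagonal of `H` holds a single repeated sample, which is the defining property of a Hankel matrix and a quick way to check the construction.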

An Exact Moment-Based Approach for Chemical Reaction-Diffusion Networks: From Mass Action to Hill Functions arxiv.org/abs/2505.09053

Biochemical systems are inherently stochastic, particularly those with small-molecule populations. The spatial distribution of molecules plays a critical role and requires the inclusion of spatial coordinates in their analysis. Stochastic models such as the chemical master equation are commonly used to study these systems. However, analytical solutions are limited to specific cases, and stochastic simulations require significant computational resources. To mitigate these challenges, approximation methods, such as the moment approach, reduce the system to a set of ordinary differential equations, thereby lowering the computational requirements. This study investigates the conditions under which the second-moment approach yields exact results during the dynamic evolution of chemical reaction-diffusion networks. The analysis encompasses second-order or higher-order reactions and Hill functions without relying on higher-order moment estimations or closure approximations. Starting with stationary states, we extended the analysis to the dynamic evolution. An enzymatic process and an antithetic feedback system were examined for purely reactive systems, demonstrating the approach's accuracy in capturing system behavior and quantifying errors. The study was further extended to genetic regulatory networks governed by Hill functions, including both purely reactive and reaction-diffusion systems, validating the method in spatially distributed contexts. This framework enables precise characterization of biochemical systems, avoiding information loss typically associated with approximations and allowing for stability analysis under fluctuations. These findings optimize computational strategies while providing insights into intracellular signaling and regulatory processes, paving the way for efficient and accurate stochastic modeling in biochemical systems.
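As a reminder of what a moment approach looks like, consider the simplest case where the moment equations close exactly: a linear birth-death process with production rate k and degradation rate g per molecule. From the chemical master equation, the mean m and variance V obey dm/dt = k - g*m and dV/dt = k + g*m - 2*g*V. The sketch below integrates these two ordinary differential equations with a forward Euler step. It is a textbook illustration only and does not reproduce the paper's treatment of higher-order reactions, Hill functions, or diffusion; the function name and rate values are assumptions.

```python
def birth_death_moments(k, g, mean0, var0, t_end, dt=0.001):
    """Integrate the exact mean and variance equations of the process
    0 -> X (rate k), X -> 0 (rate g per molecule) with forward Euler.
    """
    mean, var = float(mean0), float(var0)
    for _ in range(int(round(t_end / dt))):
        d_mean = k - g * mean
        d_var = k + g * mean - 2.0 * g * var
        mean += d_mean * dt
        var += d_var * dt
    return mean, var


# Illustrative rates: start with no molecules and relax to steady state.
mean, var = birth_death_moments(k=10.0, g=1.0, mean0=0.0, var0=0.0, t_end=20.0)
```

At steady state both equations give m = V = k/g, the Poisson result, so two ODEs capture the first two moments of the full stochastic model without any simulation. For nonlinear reactions the moment hierarchy does not close this way in general, which is the situation the abstract addresses.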

Linear to Neural Networks Regression: QSPR of Drugs via Degree-Distance Indices arxiv.org/abs/2505.07821

This study conducts a Quantitative Structure-Property Relationship (QSPR) analysis to explore the correlation between the physical properties of drug molecules and their topological indices using machine learning techniques. While prior studies in drug design have focused on degree-based topological indices, this work analyzes a dataset of 166 drug molecules by computing degree-distance-based topological indices, incorporating vertex-edge weightings with respect to six different atomic properties (atomic number, atomic radius, atomic mass, density, electronegativity, ionization). Both linear models (Linear Regression, Lasso, and Ridge Regression) and nonlinear approaches (Random Forest, XGBoost, and Neural Networks) were employed to predict molecular properties. The results demonstrate the effectiveness of these indices in predicting specific physicochemical properties and underscore the practical relevance of computational methods in molecular property estimation. The study provides an innovative perspective on integrating topological indices with machine learning to enhance predictive accuracy, highlighting their potential application in drug discovery and development processes. The predictive approach also suggests that establishing a reliable relationship between topological indices and physical properties enables chemists to gain preliminary insights into molecular behavior before conducting experimental analyses, thereby optimizing resource utilization in cheminformatics research.
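For readers unfamiliar with degree-distance indices, the classical unweighted version sums (deg(u) + deg(v)) * d(u, v) over all unordered vertex pairs of the molecular graph, where d is the shortest-path distance. The sketch below computes that index with a breadth-first search. It does not include the atomic-property vertex-edge weightings used in the study, and the function names and the example graph are illustrative.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def degree_distance_index(adjacency):
    """Sum of (deg(u) + deg(v)) * d(u, v) over unordered pairs {u, v}.

    Assumes a connected graph given as {vertex: [neighbours]}.
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    total = 0
    for u, v in combinations(sorted(adjacency), 2):
        d_uv = shortest_path_lengths(adjacency, u)[v]
        total += (degree[u] + degree[v]) * d_uv
    return total


# Carbon skeleton of propane as a path graph C0 - C1 - C2.
propane = {0: [1], 1: [0, 2], 2: [1]}
print(degree_distance_index(propane))  # 3*1 + 3*1 + 2*2 = 10
```

An index like this turns a molecular graph into a single number that can then be fed to a regression model as a feature.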

CellVerse: Do Large Language Models Really Understand Cell Biology? arxiv.org/abs/2505.07865

Recent studies have demonstrated the feasibility of modeling single-cell data as natural languages and the potential of leveraging powerful large language models (LLMs) for understanding cell biology. However, a comprehensive evaluation of LLMs' performance on language-driven single-cell analysis tasks is still lacking. Motivated by this challenge, we introduce CellVerse, a unified language-centric question-answering benchmark that integrates four types of single-cell multi-omics data and encompasses three hierarchical levels of single-cell analysis tasks: cell type annotation (cell-level), drug response prediction (drug-level), and perturbation analysis (gene-level). Going beyond this, we systematically evaluate the performance of 14 open-source and closed-source LLMs ranging from 160M to 671B parameters on CellVerse. Remarkably, the experimental results reveal: (1) Existing specialist models (C2S-Pythia) fail to make reasonable decisions across all sub-tasks within CellVerse, while generalist models such as Qwen, Llama, GPT, and DeepSeek family models exhibit preliminary understanding capabilities within the realm of cell biology. (2) The performance of current LLMs falls short of expectations and has substantial room for improvement. Notably, in the widely studied drug response prediction task, none of the evaluated LLMs demonstrate significant performance improvement over random guessing. CellVerse offers the first large-scale empirical demonstration that significant challenges remain in applying LLMs to cell biology. By introducing CellVerse, we lay the foundation for advancing cell biology through natural languages and hope this paradigm could facilitate next-generation single-cell analysis.

Preventing SARS-CoV-2 superspreading events with antiviral intranasal sprays arxiv.org/abs/2505.08053

Superspreading events are known to disproportionately contribute to onward transmission of epidemic and pandemic viruses. Preventing infections at a small number of high-transmission settings is therefore an attractive public health goal. Here, we use deterministic and stochastic mathematical modelling to quantify the impact of intranasal sprays in containing outbreaks at a known superspreading event (the 2020 SARS-CoV-2 outbreak on the Diamond Princess cruise ship) and a conference event that led to extensive transmission. We find that in the Diamond Princess cruise ship case study, there exists a 7-14-day window of opportunity for widespread prophylactic spray usage to significantly impact the number of infections averted. Given an immediate response to a known SARS-CoV-2 outbreak, alongside testing and social distancing measures, prophylactic efficacy and coverage greater than 65% could reduce the average number of infections by over 90%. In the conference case study, in the absence of additional public health interventions, analyses suggest much higher prophylactic efficacies and coverages are required to achieve a similar outcome. However, prophylactic use can halve an individual's probability of being infected, and significantly reduce the probability of developing a severe infection. These results suggest that at a known potential superspreading event, early use of intranasal sprays can complement quarantining measures and significantly suppress a SARS-CoV-2 outbreak, even at suboptimal coverage. At a potential superspreading event of short duration, intranasal sprays can reduce individuals' risk of infection, but cannot prevent all infections or onward community transmission.
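The role of efficacy and coverage can be illustrated with a deliberately simple deterministic SIR model in which the transmission rate is scaled by (1 - efficacy * coverage). This is a mean-field sketch under strong assumptions, not the model used in the study: the function name, the way the spray enters the equations, and all parameter values are hypothetical and chosen only for illustration.

```python
def cumulative_infections(beta, gamma, population, initial_infected,
                          efficacy, coverage, days, dt=0.01):
    """Forward-Euler SIR model with prophylaxis reducing transmission.

    beta: transmission rate per day without intervention
    gamma: recovery (removal) rate per day
    efficacy, coverage: fractions in [0, 1]; assumed to act multiplicatively
    Returns the number of people ever infected, including initial cases.
    """
    beta_eff = beta * (1.0 - efficacy * coverage)
    s = float(population - initial_infected)
    i = float(initial_infected)
    for _ in range(int(round(days / dt))):
        new_infections = beta_eff * s * i / population * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
    return population - s


# Illustrative comparison over a 30-day outbreak in a closed group of 1000.
baseline = cumulative_infections(0.6, 0.2, 1000, 1, 0.0, 0.0, days=30)
with_spray = cumulative_infections(0.6, 0.2, 1000, 1, 0.7, 0.7, days=30)
```

In this sketch the intervention lowers the effective reproduction number from beta/gamma to beta*(1 - efficacy*coverage)/gamma, so whether an outbreak is suppressed or merely slowed depends on whether that product drops below one.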
