All-atom inverse protein folding through discrete flow matching arxiv.org/abs/2507.14156

The recent breakthrough of AlphaFold3 in modeling complex biomolecular interactions, including those between proteins and ligands, nucleotides, or metal ions, creates new opportunities for protein design. In so-called inverse protein folding, the objective is to find a sequence of amino acids that adopts a target protein structure. Many inverse folding methods struggle to predict sequences for complexes that contain non-protein components, and perform poorly with complexes that adopt multiple structural states. To address these challenges, we present ADFLIP (All-atom Discrete FLow matching Inverse Protein folding), a generative model based on discrete flow-matching for designing protein sequences conditioned on all-atom structural contexts. ADFLIP progressively incorporates predicted amino acid side chains as structural context during sequence generation and enables the design of dynamic protein complexes through ensemble sampling across multiple structural states. Furthermore, ADFLIP implements training-free classifier guidance sampling, which allows the incorporation of arbitrary pre-trained models to optimise the designed sequence for desired protein properties. We evaluated the performance of ADFLIP on protein complexes with small-molecule ligands, nucleotides, or metal ions, including dynamic complexes for which structure ensembles were determined by nuclear magnetic resonance (NMR). Our model achieves state-of-the-art performance in single-structure and multi-structure inverse folding tasks, demonstrating excellent potential for all-atom protein design. The code is available at https://github.com/ykiiiiii/ADFLIP.
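The progressive, context-accumulating generation described above can be caricatured with a toy masked-sequence sampler: start fully masked and commit the most confident positions first, feeding them back as context. Everything below, including the stand-in "model", is an illustrative assumption; ADFLIP's actual network is conditioned on all-atom structure and predicted side chains.

```python
import numpy as np

rng = np.random.default_rng(0)
AA = list("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
MASK = "-"

def toy_model(seq):
    """Stand-in denoiser: per-position categorical distribution over AAs.
    (A real model would condition on the structure and the committed residues.)"""
    logits = rng.normal(size=(len(seq), len(AA)))
    return np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

def sample(length=12, steps=6):
    """Iteratively unmask: at each step, commit the most confident positions."""
    seq = [MASK] * length
    for step in range(steps):
        probs = toy_model(seq)
        masked = [i for i, a in enumerate(seq) if a == MASK]
        k = max(1, len(masked) // (steps - step))   # fraction to commit now
        most_confident = sorted(masked, key=lambda i: -probs[i].max())[:k]
        for i in most_confident:
            seq[i] = AA[int(probs[i].argmax())]
    return "".join(seq)
```

A classifier-guidance variant would reweight `probs` at each step with a pre-trained property predictor before committing positions.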

Predicting Perceptual Boundaries in Auditory Streaming using Delay Differential Equations arxiv.org/abs/2507.14157

Auditory streaming enables the brain to organize sequences of sounds into perceptually distinct sources, such as following a conversation in a noisy environment. A typical experiment for investigating perceptual boundaries and bistability is to present a subject with a stream of two alternating tone stimuli. We investigate a model for the processing of such a stream consisting of two identical neural populations of excitatory and inhibitory neurons. The populations are coupled via delayed cross-inhibition and periodically forced with sharp step-type signals (the two-tone stream). We track how the perception boundary depends on threshold selection and establish how the boundaries between three different auditory perceptions (single tone versus two tones versus bistability between both perceptions) relate to bifurcations. We demonstrate that these transitions are governed by symmetry-breaking bifurcations and that perceptual classification based on neural thresholds is highly sensitive to the threshold choice. Our analysis reveals that a fixed threshold is insufficient to capture the true perceptual boundaries, so we propose a variable-threshold criterion informed by the amplitude dynamics of the neural responses. Finally, we illustrate how key stimulus parameters such as tone duration, delay, and internal time scale shape the boundaries of auditory perceptual organization in the plane of the two most commonly varied experimental parameters: the presentation rate and the difference in tone frequency. These findings offer mechanistic insight into auditory perception dynamics and provide a refined framework for linking neural activity to perceptual organization.
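A minimal numerical caricature of such a periodically forced, delay-coupled two-population model can be integrated with an explicit Euler scheme and a history buffer for the delay term. The rate equations and every parameter value below are illustrative assumptions, not the paper's model.

```python
import numpy as np

def simulate(T=2.0, dt=1e-3, delay=0.03, w_inh=2.0, tau=0.01):
    """Two identical rate units with delayed cross-inhibition, driven by
    alternating step ("two-tone") inputs. Returns rates, shape (2, steps)."""
    n = int(T / dt)
    d = int(delay / dt)                       # delay in time steps
    r = np.zeros((2, n))                      # firing rates of the two units
    f = lambda x: 1.0 / (1.0 + np.exp(-4 * (x - 0.5)))  # sigmoid gain
    period = 0.25                             # tone-repetition period (s)
    for t in range(1, n):
        time = t * dt
        tone_a = 1.0 if (time % period) < period / 2 else 0.0
        inputs = np.array([tone_a, 1.0 - tone_a])       # alternating tones
        # delayed state (zero history before t = 0)
        r_del = r[:, t - 1 - d] if t - 1 - d >= 0 else np.zeros(2)
        drive = inputs - w_inh * r_del[::-1]            # delayed cross-inhibition
        r[:, t] = r[:, t - 1] + dt / tau * (-r[:, t - 1] + f(drive))
    return r
```

A threshold criterion for "two streams" versus "one stream" would then be applied to the resulting amplitude traces, which is exactly where the fixed-versus-variable threshold question above arises.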

Modeling Language Evolution Using a Spin Glass Approach arxiv.org/abs/2507.14375

The evolution of natural languages poses a riddle to any theoretical perspective based on efficiency considerations. If languages are already optimally effective means of organization and communication of thought, why do they change? And if they are driven to become optimally effective in the future, why do they change so slowly, and why do they diversify, rather than converge towards an optimum? We look here at the hypothesis that disorder, rather than efficiency, may play a dominant role. Most traditional approaches to studying diachronic language dynamics emphasize lexical data, but a crucial contribution to the effectiveness of a thought-coding device is given by its core structure, its syntax. Based on the reduction of syntax to a set of binary parameters, we introduce here a model of natural language change in which diachronic dynamics are mediated by disordered interactions between parameters, even in the idealized limit of identical external inputs. We show in which region of 'phase space' such dynamics show the glassy features observed in natural languages. In particular, syntactic vectors remain trapped in glassy metastable (tendentially stable) states when the degree of asymmetry in the disordered interactions is below a critical value, consistent with studies of spin glasses with asymmetric interactions. We further show that an added Hopfield-type memory term would, if strong enough, stabilize syntactic configurations even above the critical value, but at the cost of losing the multiplicity of stable states. Finally, using a notion of linguistic distance in syntactic space, we show that a phylogenetic signal may remain among related languages despite their gradually divergent syntax, exactly as recently pointed out for real-world languages. These statistical results appear to generalize beyond the dataset used in this study, of 94 syntactic parameters across 58 languages.
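The trapping behaviour can be illustrated with a generic toy dynamics on binary parameters with partially asymmetric random couplings; this is a standard asymmetric spin-glass sketch, not the paper's model, and `eta` (interpolating between symmetric and fully antisymmetric disorder) plays the role of the asymmetry degree discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)

def asymmetric_couplings(n, eta):
    """Random couplings J = symmetric part + eta * antisymmetric part."""
    sym = rng.normal(size=(n, n)); sym = (sym + sym.T) / 2
    antisym = rng.normal(size=(n, n)); antisym = (antisym - antisym.T) / 2
    J = sym + eta * antisym
    np.fill_diagonal(J, 0.0)
    return J

def run(n=94, eta=0.2, steps=6000):
    """Zero-temperature asynchronous updates of n binary (+1/-1) parameters.
    Returns True if the final state is a fixed point (a metastable 'language')."""
    J = asymmetric_couplings(n, eta)
    s = rng.choice([-1, 1], size=n)           # binary syntactic parameters
    for _ in range(steps):
        i = rng.integers(n)                   # pick one parameter at random
        s[i] = 1 if J[i] @ s >= 0 else -1     # align with its local field
    field = J @ s
    return bool(np.all(s * field >= 0))       # every spin agrees with its field
```

At `eta = 0` such dynamics settle into one of many local minima (glassy trapping); at large `eta` fixed points become rare and the state keeps wandering, mirroring the critical-asymmetry picture above.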

Computations Meet Experiments to Advance the Enzymatic Depolymerization of Plastics One Atom at a Time arxiv.org/abs/2507.14413

Plastics are essential to modern life, yet poor disposal practices contribute to low recycling rates and environmental accumulation; biological degradation and by-product reuse offer a path to mitigate this global threat. This report highlights key insights, future challenges, and research priorities identified during the CECAM workshop "Computations Meet Experiments to Advance the Enzymatic Depolymerization of Plastics One Atom at a Time", held in Trieste from May 6-8, 2025. The workshop brought together an interdisciplinary community of scientists focused on advancing the sustainable use of plastics through enzyme-based degradation. A key point from the discussions is that many bottlenecks in enzymatic recycling arise not only from process engineering challenges, but also from a limited understanding of the underlying molecular mechanisms. We argue that constraints on economic viability and sustainability (e.g., harsh solvents, high temperatures, substrate crystallinity, pretreatments) can, and should, be addressed directly through enzyme design, provided these factors are understood at the molecular level, in synergy with process optimization. To this end, it is essential to integrate experimental and computational approaches to uncover the molecular and mechanistic basis of enzymatic plastic degradation. We highlight how the small-format structure of the workshop, in line with the usual CECAM format, fostered a collaborative, friendly, and relaxed atmosphere. We hope this report encourages future initiatives and the formation of shared consortia to support an open, collaborative, and bio-based plastic recycling community.

Knowing when to stop: insights from ecology for building catalogues, collections, and corpora arxiv.org/abs/2507.14614

A major locus of musicological activity, increasingly in the digital domain, is the cataloguing of sources, which requires large-scale and long-lasting research collaborations. Yet the databases aiming to cover and represent musical repertoires are never quite complete, and scholars must contend with the question: how much are we still missing? This question structurally resembles the 'unseen species' problem in ecology, where the true number of species must be estimated from limited observations. In this case study, we apply the common Chao1 estimator to music for the first time, specifically to Gregorian chant. We find that, overall, upper bounds for repertoire coverage of the major chant genres range between 50% and 80%. As expected, we find that Mass Propers are covered better than the Divine Office, though not overwhelmingly so. However, the accumulation curve suggests that these bounds are not tight: a stable ~5% of chants in sources indexed between 1993 and 2020 was new, so diminishing returns in terms of repertoire diversity are not yet to be expected. Our study demonstrates that such questions can be addressed empirically to inform musicological data-gathering, showing the potential of unseen species models in musicology.
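For concreteness, the Chao1 estimator mentioned above needs only the number of items seen exactly once (f1) and exactly twice (f2) across sources. A minimal sketch with made-up counts (not data from the study):

```python
from collections import Counter

def chao1(abundances):
    """Bias-corrected Chao1 lower bound on true richness:
    S_obs + f1*(f1-1) / (2*(f2+1)), where f1/f2 count singletons/doubletons.
    (The classic form is S_obs + f1^2 / (2*f2) when f2 > 0.)"""
    s_obs = sum(1 for n in abundances if n > 0)   # distinct items observed
    f1 = sum(1 for n in abundances if n == 1)     # seen exactly once
    f2 = sum(1 for n in abundances if n == 2)     # seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# toy data: each letter is a distinct chant, repeated per indexing source
observations = ["a", "a", "b", "c", "c", "c", "d", "e", "e", "f", "g"]
counts = list(Counter(observations).values())
est = chao1(counts)                               # estimated true repertoire size
coverage = len(set(observations)) / est           # fraction already catalogued
```

Here 7 chants are observed, 4 as singletons and 2 as doubletons, giving an estimated richness of 9 and hence an estimated coverage of about 78%, the same kind of figure as the 50-80% bounds reported above.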

KinForm: Kinetics Informed Feature Optimised Representation Models for Enzyme $k_{cat}$ and $K_{M}$ Prediction arxiv.org/abs/2507.14639

Kinetic parameters such as the turnover number ($k_{cat}$) and Michaelis constant ($K_{\mathrm{M}}$) are essential for modelling enzymatic activity, but experimental data remain limited in scale and diversity. Previous methods for predicting enzyme kinetics typically represent the protein by mean-pooled residue embeddings from a single protein language model. We present KinForm, a machine learning framework designed to improve predictive accuracy and generalisation for kinetic parameters by optimising protein feature representations. KinForm combines several residue-level embeddings (Evolutionary Scale Modeling Cambrian, Evolutionary Scale Modeling 2, and ProtT5-XL-UniRef50), taken from empirically selected intermediate transformer layers, and applies weighted pooling based on per-residue binding-site probability. To counter the resulting high dimensionality, we apply principal component analysis (PCA) to the concatenated protein features and rebalance the training data via a similarity-based oversampling strategy. KinForm outperforms baseline methods on two benchmark datasets, with improvements most pronounced in low-sequence-similarity bins. We observe gains from binding-site probability pooling, intermediate-layer selection, PCA, and oversampling of low-identity proteins. We also find that removing sequence overlap between folds provides a more realistic evaluation of generalisation and should be the standard over random splitting when benchmarking kinetic prediction models.
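The two representation steps, binding-site-weighted pooling of residue embeddings followed by PCA on the concatenated features, can be sketched as below. Random arrays stand in for the language-model embeddings and binding-site probabilities, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def weighted_pool(residue_emb, binding_prob):
    """Pool (L, D) residue embeddings with per-residue weights -> (D,)."""
    w = binding_prob / binding_prob.sum()     # normalise to a weight vector
    return w @ residue_emb

def pca_reduce(X, k):
    """Project row vectors of X onto their top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# toy "protein": 120 residues, two embedding sources of different widths
emb_a = rng.normal(size=(120, 64))            # e.g. one PLM's layer output
emb_b = rng.normal(size=(120, 32))            # e.g. another PLM's layer output
p_bind = rng.uniform(size=120)                # per-residue binding-site prob.
pooled = np.concatenate([weighted_pool(emb_a, p_bind),
                         weighted_pool(emb_b, p_bind)])   # (96,)

X = rng.normal(size=(200, 96))                # 200 pooled protein vectors
X_red = pca_reduce(X, k=10)                   # (200, 10) inputs to the regressor
```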

Partitioning of Eddy Covariance Footprint Evapotranspiration Using Field Data, UAS Observations and GeoAI in the U.S. Chihuahuan Desert arxiv.org/abs/2507.14829

This study proposes a new method for computing transpiration across an eddy covariance footprint using field observations of plant sap flow, phytomorphology sampling, uncrewed aerial system (UAS) imagery, deep learning-based digital image processing, and eddy covariance micrometeorological measurements. The method is applied to the Jornada Experimental Range, New Mexico, where we address three key questions: (1) What are the daily summer transpiration rates of Mesquite (Prosopis glandulosa) and Creosote (Larrea tridentata) individuals, and how do these species contribute to footprint-scale evapotranspiration? (2) How can plant-level measurements be integrated into terrain-wide transpiration estimates? (3) What is the contribution of transpiration to total evapotranspiration within the eddy covariance footprint? Data collected from June to October 2022, during the North American Monsoon season, include hourly evapotranspiration and precipitation rates from the Ameriflux eddy covariance system (US Jo-1 Bajada site) and sap flux rates from heat-balance sensors. We used plant biometric measurements and supervised classification of multispectral imagery to upscale from patch- to footprint-scale estimates. A proportional relationship between a plant's horizontally projected area and its estimated number of water flow conduits was extended to the eddy covariance footprint via UAS data. Our results show that Mesquite's average daily summer transpiration is 2.84 mm/d, while Creosote's is 1.78 mm/d (a ratio of 1.6:1). The summer footprint-integrated transpiration-to-evapotranspiration ratio (T/ET) was 0.50, decreasing to 0.44 during dry spells and increasing to 0.63 following significant precipitation. Further testing of this method is needed in different regions to validate its applicability; with appropriate adjustments, it could be relevant for other areas with similar ecological conditions.
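The final partitioning step reduces to area-weighted arithmetic over the classified footprint. In this sketch, only the 2.84 and 1.78 mm/d mean rates come from the abstract; the cover fractions and footprint ET are invented for illustration.

```python
# Reported mean daily summer transpiration rates (mm/d per unit canopy area)
mesquite_t = 2.84
creosote_t = 1.78

# Hypothetical footprint cover fractions from UAS image classification
f_mesquite, f_creosote = 0.10, 0.12

# Area-weighted footprint transpiration (mm/d)
footprint_t = f_mesquite * mesquite_t + f_creosote * creosote_t

# Hypothetical footprint evapotranspiration from the eddy covariance tower (mm/d)
et = 0.99
t_over_et = footprint_t / et                  # the T/ET partitioning ratio
```

With these invented cover fractions the sketch yields a T/ET near 0.5, the order of the seasonal value reported above; the real method additionally propagates sap-flow-to-canopy-area scaling per plant.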

The Generalist Brain Module: Module Repetition in Neural Networks in Light of the Minicolumn Hypothesis arxiv.org/abs/2507.12473

While modern AI continues to advance, the biological brain remains the pinnacle of neural networks in its robustness, adaptability, and efficiency. This review explores an AI architectural path inspired by the brain's structure, particularly the minicolumn hypothesis, which views the neocortex as a distributed system of repeated modules, a structure we connect to collective intelligence (CI). Despite existing work, there is a lack of comprehensive reviews connecting the cortical column to architectures of repeated neural modules. This review aims to fill that gap by synthesizing historical, theoretical, and methodological perspectives on neural module repetition. We distinguish between architectural repetition (reusing structure) and parameter-shared module repetition, where the same functional unit is repeated across a network. The latter exhibits key CI properties such as robustness, adaptability, and generalization. Evidence suggests that the repeated module tends to converge toward a generalist module: a simple, flexible problem solver capable of handling many roles in the ensemble. This generalist tendency may offer solutions to longstanding challenges in modern AI: improved energy efficiency during training through simplicity and scalability, and robust embodied control via generalization. While empirical results suggest such systems can generalize to out-of-distribution problems, theoretical results are still lacking. Overall, architectures featuring module repetition remain an emerging and underexplored strategy, with significant untapped potential for efficiency, robustness, and adaptiveness. We believe that a system that adopts the benefits of CI, while adhering to the architectural and functional principles of minicolumns, could address the modern AI problems of scalability, energy consumption, and democratization.
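The distinction drawn above, one parameter set reused at every position versus merely repeating the architecture, can be made concrete in a few lines. This is a generic illustration (a single shared weight matrix applied at every "slot" with simple neighbour coupling), not any specific model from the review.

```python
import numpy as np

rng = np.random.default_rng(4)

# The one shared "generalist" module: a single weight matrix and bias,
# used identically at every position in the network.
W = rng.normal(scale=0.5, size=(8, 8))
b = np.zeros(8)

def module(x):
    """The same functional unit, wherever it is placed."""
    return np.tanh(x @ W + b)

def network(inputs, rounds=3):
    """Apply the shared module at every slot, with neighbour message passing.
    Parameter-shared repetition: all slots use the same W and b."""
    state = inputs.copy()
    for _ in range(rounds):
        neighbours = (np.roll(state, 1, axis=0) + np.roll(state, -1, axis=0)) / 2
        state = module(state + neighbours)
    return state

x = rng.normal(size=(16, 8))    # 16 slots (columns/positions), 8 features each
y = network(x)
```

Architectural repetition, by contrast, would give each slot its own independently trained `W`, multiplying the parameter count by the number of slots.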

GLOMIA-Pro: A Generalizable Longitudinal Medical Image Analysis Framework for Disease Progression Prediction arxiv.org/abs/2507.12500

Longitudinal medical images are essential for monitoring disease progression by capturing spatiotemporal changes associated with dynamic biological processes. While current methods have made progress in modeling spatiotemporal patterns, they face three key limitations: (1) the lack of a generalizable framework applicable to diverse disease progression prediction tasks; (2) frequent neglect of the ordinal nature inherent in disease staging; (3) susceptibility to representation collapse due to structural similarities between adjacent time points, which can obscure subtle but discriminative progression biomarkers. To address these limitations, we propose a Generalizable LOngitudinal Medical Image Analysis framework for disease Progression prediction (GLOMIA-Pro). GLOMIA-Pro consists of two core components: progression representation extraction and progression-aware fusion. The progression representation extraction module introduces a piecewise orthogonal attention mechanism and employs a novel ordinal progression constraint to disentangle fine-grained temporal imaging variations relevant to disease progression. The progression-aware fusion module incorporates a redesigned skip-connection architecture which integrates the learned progression representation with the current imaging representation, effectively mitigating representation collapse during cross-temporal fusion. Validated on two distinct clinical applications, knee osteoarthritis severity prediction and esophageal cancer treatment response assessment, GLOMIA-Pro consistently outperforms seven state-of-the-art longitudinal analysis methods. Ablation studies further confirm the contribution of individual components, demonstrating the robustness and generalizability of GLOMIA-Pro across diverse clinical scenarios.

Cognitive Modelling Aspects of Neurodevelopmental Disorders Using Standard and Oscillating Neighbourhood SOM Neural Networks arxiv.org/abs/2507.12567

Background/Introduction: In this paper, the neural network class of Self-Organising Maps (SOMs) is investigated in terms of its theoretical and applied validity for cognitive modelling, particularly of neurodevelopmental disorders. Methods: A modified SOM network type, with increased biological plausibility, incorporating a type of cortical columnar oscillation in the form of an oscillating Topological Neighbourhood (TN), is introduced and applied alongside the standard SOM. Aspects of two neurodevelopmental disorders, autism and schizophrenia, are modelled using SOM networks, based on existing neurocomputational theories. Both standard and oscillating-TN SOM training is employed with targeted modifications in the TN width function. Computer simulations are conducted using revised versions of a previously introduced model (IPSOM) based on a new modelling hypothesis. Results/Conclusions: The results demonstrate that there is strong similarity between standard and oscillating-TN SOM modelling in terms of map formation behaviour, output and structure, while the oscillating version offers a more realistic computational analogue of brain function. Neuroscientific and computational arguments are presented to validate the proposed SOM modification within a cognitive modelling framework.
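A minimal SOM with an oscillating topological-neighbourhood width can be sketched as below. The oscillation form (a sinusoidal modulation of the usual decaying width) and all constants are illustrative assumptions, not IPSOM's actual schedule.

```python
import numpy as np

rng = np.random.default_rng(2)

def train_som(data, grid=10, iters=3000, lr0=0.5):
    """Standard SOM training, except that the topological-neighbourhood (TN)
    width sigma oscillates around its usual decaying schedule."""
    W = rng.uniform(size=(grid, grid, data.shape[1]))   # map weights
    ii, jj = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    for t in range(iters):
        x = data[rng.integers(len(data))]               # random training sample
        d2 = ((W - x) ** 2).sum(axis=2)                 # distances to all units
        bi, bj = np.unravel_index(d2.argmin(), d2.shape)  # best-matching unit
        frac = t / iters
        # decaying TN width modulated by an oscillation (the "oscillating TN")
        sigma = max(0.5, grid / 2 * (1 - frac)) * (1 + 0.3 * np.sin(2 * np.pi * t / 50))
        h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
        W += lr0 * (1 - frac) * h[:, :, None] * (x - W)  # neighbourhood update
    return W
```

Setting the oscillation amplitude to 0 recovers the standard SOM, which is the comparison made in the paper.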

Mapping Emotions in the Brain: A Bi-Hemispheric Neural Model with Explainable Deep Learning arxiv.org/abs/2507.12625

Recent advances have shown promise in emotion recognition from electroencephalogram (EEG) signals by employing bi-hemispheric neural architectures that incorporate neuroscientific priors into deep learning models. However, interpretability remains a significant limitation for their application in sensitive fields such as affective computing and cognitive modeling. In this work, we introduce a post-hoc interpretability framework tailored to dual-stream EEG classifiers, extending the Local Interpretable Model-Agnostic Explanations (LIME) approach to accommodate structured, bi-hemispheric inputs. Our method adapts LIME to handle structured two-branch inputs corresponding to left- and right-hemisphere EEG channel groups. It decomposes prediction relevance into per-channel contributions across hemispheres and emotional classes. We apply this framework to a previously validated dual-branch recurrent neural network trained on EmoNeuroDB, a dataset of EEG recordings captured during a VR-based emotion elicitation task. The resulting explanations reveal emotion-specific hemispheric activation patterns consistent with known neurophysiological phenomena, such as frontal lateralization in joy and posterior asymmetry in sadness. Furthermore, we aggregate local explanations across samples to derive global channel importance profiles, enabling a neurophysiologically grounded interpretation of the model's decisions. Correlation analysis between symmetric electrodes further highlights the model's emotion-dependent lateralization behavior, supporting the functional asymmetries reported in affective neuroscience.
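The core mechanics of such a LIME-style extension, perturb channels in each hemisphere branch, query the model, and fit a linear surrogate whose weights become per-channel relevances, can be sketched as follows. The classifier here is a random linear stand-in, and the channel counts and sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
N_CH = 8                                      # channels per hemisphere

w_true = rng.normal(size=2 * N_CH)            # hidden stand-in model weights

def classifier(left, right):
    """Stand-in two-branch model: one score from both hemisphere inputs."""
    return float(np.concatenate([left, right]) @ w_true)

def channel_relevance(left, right, n_samples=500):
    """Fit a linear surrogate on keep/zero channel masks (LIME-style)."""
    X, y = [], []
    for _ in range(n_samples):
        mask = rng.integers(0, 2, size=2 * N_CH)          # perturbation mask
        y.append(classifier(left * mask[:N_CH], right * mask[N_CH:]))
        X.append(mask)
    X, y = np.array(X, float), np.array(y)
    # least-squares surrogate with intercept; its coefficients are relevances
    coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
    return coef[:N_CH], coef[N_CH:2 * N_CH]   # left-, right-hemisphere relevance

left = rng.normal(size=N_CH)
right = rng.normal(size=N_CH)
rel_L, rel_R = channel_relevance(left, right)
```

Aggregating `rel_L`/`rel_R` over many samples per emotion class gives the global channel-importance profiles described above.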

Emergence of Functionally Differentiated Structures via Mutual Information Optimization in Recurrent Neural Networks arxiv.org/abs/2507.12858

Functional differentiation in the brain emerges as distinct regions specialize and is key to understanding brain function as a complex system. Previous research has modeled this process using artificial neural networks with specific constraints. Here, we propose a novel approach that induces functional differentiation in recurrent neural networks by minimizing mutual information between neural subgroups via mutual information neural estimation. We apply our method to a 2-bit working memory task and a chaotic signal separation task involving Lorenz and Rössler time series. Analysis of network performance, correlation patterns, and weight matrices reveals that mutual information minimization yields high task performance alongside clear functional modularity and moderate structural modularity. Importantly, our results show that functional differentiation, which is measured through correlation structures, emerges earlier than structural modularity defined by synaptic weights. This suggests that functional specialization precedes and probably drives structural reorganization within developing neural networks. Our findings provide new insights into how information-theoretic principles may govern the emergence of specialized functions and modular structures during artificial and biological brain development.
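The mutual-information objective can be illustrated with a much simpler plug-in (histogram) estimator in place of mutual information neural estimation; MINE trains a neural critic instead, so this is a deliberate simplification of the quantity being minimised, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(6)

def mutual_info(x, y, bins=16):
    """Plug-in MI estimate (in nats) from a 2-D histogram of two signals."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()                # empirical joint distribution
    p_x = p_xy.sum(axis=1, keepdims=True)     # marginal of x
    p_y = p_xy.sum(axis=0, keepdims=True)     # marginal of y
    nz = p_xy > 0
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

# toy "subgroup activities": group B partially copies group A vs. independent
a = rng.normal(size=5000)
b_coupled = 0.8 * a + 0.2 * rng.normal(size=5000)
b_free = rng.normal(size=5000)
```

Minimising such an MI term between subgroup activities, alongside the task loss, pushes the two subgroups toward the statistically independent (`b_free`-like) regime, which is how functional differentiation is induced.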

Investigating Forecasting Models for Pandemic Infections Using Heterogeneous Data Sources: A 2-year Study with COVID-19 arxiv.org/abs/2507.12966

Emerging in December 2019, the COVID-19 pandemic caused widespread health, economic, and social disruptions. Rapid global transmission overwhelmed healthcare systems, resulting in high infection rates, hospitalisations, and fatalities. To minimise the spread, governments implemented several non-pharmaceutical interventions like lockdowns and travel restrictions. While effective in controlling transmission, these measures also posed significant economic and societal challenges. Although the WHO declared COVID-19 no longer a global health emergency in May 2023, its impact persists, shaping public health strategies. The vast amount of data collected during the pandemic offers valuable insights into disease dynamics, transmission, and intervention effectiveness. Leveraging these insights can improve forecasting models, enhancing preparedness and response to future outbreaks while mitigating their social and economic impact. This paper presents a large-scale case study on COVID-19 forecasting in Cyprus, utilising a two-year dataset that integrates epidemiological data, vaccination records, policy measures, and weather conditions. We analyse infection trends, assess forecasting performance, and examine the influence of external factors on disease dynamics. The insights gained contribute to improved pandemic preparedness and response strategies.

Life Finds A Way: Emergence of Cooperative Structures in Adaptive Threshold Networks arxiv.org/abs/2507.13253

There has been a long debate on how new levels of organization have evolved. It might seem unlikely, as cooperation must prevail over competition. One well-studied example is the emergence of autocatalytic sets, which seem to be a prerequisite for the evolution of life. Using a simple model, we investigate how varying bias toward cooperation versus antagonism shapes network dynamics, revealing that higher-order organization emerges even amid pervasive antagonistic interactions. In general, we observe that a quantitative increase in the number of elements in a system leads to a qualitative transition. We present a random threshold-directed network model that integrates node-specific traits with dynamic edge formation and node removal, simulating arbitrary levels of cooperation and competition. In our framework, intrinsic node values determine directed links through various threshold rules. Our model generates a multi-digraph with signed edges (reflecting support/antagonism, labeled "help"/"harm"), which ultimately yields two parallel yet interdependent threshold graphs. Incorporating temporal growth and node turnover in our approach allows exploration of the evolution, adaptation, and potential collapse of communities and reveals phase transitions in both connectivity and resilience. Our findings extend classical random threshold and Erdős-Rényi models, offering new insights into adaptive systems in biological and economic contexts, with emphasis on the application to Collective Affordance Sets. This framework should also be useful for making predictions that will be tested by ongoing experiments of microbial communities in soil.
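A toy instance of a random threshold-directed network with signed edges: each node gets an intrinsic value, and directed "help"/"harm" links form when pairwise comparisons cross thresholds. The specific threshold rules and constants below are illustrative inventions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(3)

def build(n=200, theta_help=0.8, theta_harm=0.3):
    """Signed adjacency matrix: +1 = help edge, -1 = harm edge, 0 = none."""
    v = rng.uniform(size=n)                    # intrinsic node traits
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if v[i] + v[j] > 2 * theta_help:       # mutual-benefit rule
                A[i, j] = 1
            elif abs(v[i] - v[j]) > theta_harm:    # antagonism rule
                A[i, j] = -1
    return v, A

v, A = build()
help_frac = (A == 1).mean()    # density of the "help" threshold graph
harm_frac = (A == -1).mean()   # density of the "harm" threshold graph
```

Separating `A == 1` and `A == -1` recovers the two parallel, interdependent threshold graphs mentioned above; adding node turnover (removing low-support nodes and inserting fresh ones each step) gives the temporal dynamics.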
