Hemodynamic Markers: CFD-Based Prediction of Cerebral Aneurysm Rupture Risk arxiv.org/abs/2504.10524

This study investigates the influence of aneurysm evolution on hemodynamic characteristics within the sac region. Using computational fluid dynamics (CFD), blood flow through the parent vessel and aneurysm sac was analyzed to assess the impact on wall shear stress (WSS), time-averaged wall shear stress (TAWSS), and the oscillatory shear index (OSI), key indicators of rupture risk. Additionally, Relative Residence Time (RRT) and Endothelial Cell Activation Potential (ECAP) were examined to provide a broader understanding of the aneurysm's hemodynamic environment. Six distinct cerebral aneurysm (CA) models, all from individuals of the same gender, were selected to minimize gender-related variability. Results showed that unruptured cases exhibited higher WSS and TAWSS, along with lower OSI and RRT values, patterns consistent with stable flow conditions supporting vascular integrity. In contrast, ruptured cases had lower WSS and TAWSS, coupled with elevated OSI and RRT, suggesting the disturbed, oscillatory flow commonly linked to aneurysm wall weakening. ECAP was also higher in ruptured cases, indicating increased endothelial activation under unstable flow. Notably, areas with the highest OSI and RRT often aligned with vortex centers, reinforcing the association between disturbed flow and aneurysm instability. These findings highlight the value of combining multiple hemodynamic parameters for rupture risk assessment. Including RRT and ECAP provides deeper insight into flow-endothelium interactions, offering a stronger basis for evaluating aneurysm stability and guiding treatment decisions.
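The four indices named above are standard derived quantities of the wall shear stress (WSS) vector. A minimal NumPy sketch of their usual definitions at a single wall point — the textbook formulas, not the authors' solver output:

```python
import numpy as np

def hemodynamic_indices(wss, dt):
    """Compute TAWSS, OSI, RRT, and ECAP from a time series of wall
    shear stress vectors at one wall point.

    wss : array of shape (T, 3), WSS vector over one cardiac cycle (Pa)
    dt  : time step (s)
    """
    period = wss.shape[0] * dt
    mag = np.linalg.norm(wss, axis=1)            # |tau_w| at each step
    tawss = np.sum(mag) * dt / period            # time-averaged WSS magnitude
    mean_vec = np.sum(wss, axis=0) * dt / period # time-averaged WSS vector
    osi = 0.5 * (1.0 - np.linalg.norm(mean_vec) / tawss)
    rrt = 1.0 / ((1.0 - 2.0 * osi) * tawss)      # relative residence time
    ecap = osi / tawss                           # endothelial activation potential
    return tawss, osi, rrt, ecap
```

For purely unidirectional flow OSI is 0 and RRT reduces to 1/TAWSS; oscillatory flow drives OSI toward 0.5 and inflates RRT and ECAP, which is the ruptured-case pattern the abstract describes.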

arXiv.org

BioChemInsight: An Open-Source Toolkit for Automated Identification and Recognition of Optical Chemical Structures and Activity Data in Scientific Publications arxiv.org/abs/2504.10525

Automated extraction of chemical structures and their bioactivity data is crucial for accelerating drug discovery and enabling data-driven pharmaceutical research. Existing optical chemical structure recognition (OCSR) tools fail to autonomously associate molecular structures with their bioactivity profiles, creating a critical bottleneck in structure-activity relationship (SAR) analysis. Here, we present BioChemInsight, an open-source pipeline that integrates: (1) DECIMER Segmentation and MolVec for chemical structure recognition, (2) Qwen2.5-VL-32B for compound identifier association, and (3) PaddleOCR with Gemini-2.0-flash for bioactivity extraction and unit normalization. We evaluated the performance of BioChemInsight on 25 patents and 17 articles. BioChemInsight achieved 95% accuracy for tabular patent data (structure/identifier recognition), with lower accuracy in non-tabular patents (~80% structures, ~75% identifiers), plus 92.2% bioactivity extraction accuracy. For articles, it attained >99% identifier accuracy and 78-80% structure accuracy in non-tabular formats, plus 97.4% bioactivity extraction accuracy. The system generates ready-to-use SAR datasets, reducing data preprocessing time from weeks to hours while enabling applications in high-throughput screening and ML-driven drug design (https://github.com/dahuilangda/BioChemInsight).

AI-guided Antibiotic Discovery Pipeline from Target Selection to Compound Identification arxiv.org/abs/2504.11091

Antibiotic resistance presents a growing global health crisis, demanding new therapeutic strategies that target novel bacterial mechanisms. Recent advances in protein structure prediction and machine learning-driven molecule generation offer a promising opportunity to accelerate drug discovery. However, practical guidance on selecting and integrating these models into real-world pipelines remains limited. In this study, we develop an end-to-end, artificial intelligence-guided antibiotic discovery pipeline that spans target identification to compound realization. We leverage structure-based clustering across predicted proteomes of multiple pathogens to identify conserved, essential, and non-human-homologous targets. We then systematically evaluate six leading 3D-structure-aware generative models, spanning diffusion, autoregressive, graph neural network, and language model architectures, on their usability, chemical validity, and biological relevance. Rigorous post-processing filters and commercial analogue searches reduce over 100,000 generated compounds to a focused, synthesizable set. Our results highlight DeepBlock and TamGen as top performers across diverse criteria, while also revealing critical trade-offs between model complexity, usability, and output quality. This work provides a comparative benchmark and blueprint for deploying artificial intelligence in early-stage antibiotic development.

Automatic Raman Measurements in a High-Throughput Bioprocess Development Lab arxiv.org/abs/2504.11234

This study presents a collection of physical devices and software services that fully automate Raman spectra measurements for liquid samples within a robotic facility. This method is applicable to various fields, with demonstrated efficacy in biotechnology, where Raman spectroscopy monitors substrates, metabolites, and product-related concentrations. Our system specifically measures 50 µL samples using a liquid handling robot capable of taking 8 samples simultaneously. We record multiple Raman spectra for 10 s each. Furthermore, our system takes around 20 s for sample handling, cleaning, and preparation of the next measurement. All spectra and metadata are stored in a database, and we use a machine learning model to estimate concentrations from the spectra. This automated approach enables gathering spectra for various applications under uniform conditions in high-throughput fermentation processes, calibration procedures, and offline evaluations. This allows data to be combined to train sophisticated machine learning models with improved generalization. Consequently, we can develop accurate models more quickly for new applications by reusing data from prior applications, thereby reducing the need for extensive calibration data.
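As a back-of-the-envelope check on the cadence quoted above (10 s per spectrum, ~20 s of handling), one can estimate the run time for a full plate. This sketch assumes serial measurement at the probe and uses `spectra_per_sample=3` as a stand-in for the abstract's "multiple"; neither assumption comes from the paper:

```python
def run_time_estimate(n_samples, spectra_per_sample=3,
                      t_spectrum=10.0, t_handling=20.0):
    """Rough total run time (seconds): each sample yields several 10 s
    Raman spectra plus ~20 s of handling/cleaning, measured serially.
    spectra_per_sample=3 is an assumed value, not from the abstract."""
    per_sample = spectra_per_sample * t_spectrum + t_handling
    return n_samples * per_sample
```

Under these assumptions a 96-well plate takes 96 x (3 x 10 + 20) = 4800 s, i.e. 80 minutes of unattended acquisition.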

Cryo-EM images are intrinsically low dimensional arxiv.org/abs/2504.11249

Simulation-based inference provides a powerful framework for cryo-electron microscopy, employing neural networks in methods like CryoSBI to infer biomolecular conformations via learned latent representations. This latent space represents a rich opportunity, encoding valuable information about the physical system and the inference process. Harnessing this potential hinges on understanding the underlying geometric structure of these representations. We investigate this structure by applying manifold learning techniques to CryoSBI representations of hemagglutinin (simulated and experimental). We reveal that these high-dimensional data inherently populate low-dimensional, smooth manifolds, with simulated data effectively covering the experimental counterpart. By characterizing the manifold's geometry using Diffusion Maps and identifying its principal axes of variation via coordinate interpretation methods, we establish a direct link between the latent structure and key physical parameters. Discovering this intrinsic low-dimensionality and interpretable geometric organization not only validates the CryoSBI approach but enables us to learn more from the data structure and provides opportunities for improving future inference strategies by exploiting this revealed manifold geometry.
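Diffusion Maps, the manifold-learning technique used here to characterize the latent geometry, can be sketched in a few lines of NumPy. This is the textbook Coifman-Lafon construction applied to generic point clouds, not the authors' CryoSBI pipeline:

```python
import numpy as np

def diffusion_maps(X, eps, n_components=2, alpha=1.0):
    """Minimal diffusion-map embedding of points X (n, d).
    eps: Gaussian kernel bandwidth; alpha=1 removes sampling-density
    effects so the embedding reflects geometry only."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    K = np.exp(-d2 / eps)
    q = K.sum(1)
    K = K / np.outer(q, q) ** alpha        # density normalization
    P = K / K.sum(1)[:, None]              # row-stochastic Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # skip the trivial constant eigenvector (eigenvalue 1)
    return vecs[:, 1:n_components + 1] * vals[1:n_components + 1]
```

The leading nontrivial eigenvectors serve as the "principal axes of variation" mentioned in the abstract; if the latent points truly live on a low-dimensional manifold, the eigenvalue spectrum decays quickly after a few components.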

Diagnostic Uncertainty Limits the Potential of Early Warning Signals to Identify Epidemic Emergence arxiv.org/abs/2504.11352

Methods to detect the emergence of infectious diseases, and their approach to the "critical transition" R_E = 1, have the potential to avert substantial disease burden by facilitating preemptive actions like vaccination campaigns. Early warning signals (EWS), summary statistics of infection case time series, show promise in providing such advanced warnings. Because EWS are computed on test-positive case data, the accuracy of this underlying data is integral to their predictive ability, but it will vary with changes in the diagnostic test accuracy and the incidence of the target disease relative to clinically-compatible background noise. We simulated emergent and null time series as the sum of an SEIR-generated measles time series and background noise generated by either independent draws from a Poisson distribution or an SEIR simulation with rubella-like parameters. We demonstrate that proactive outbreak detection with EWS metrics is resilient to decreasing diagnostic accuracy, so long as background infections remain proportionally low. Under situations with large, episodic noise, imperfect diagnostic tests cannot appropriately discriminate between emergent and null periods. Not all EWS metrics performed equally: we find that the mean was the least affected by changes to the noise structure and magnitude, given a moderately accurate diagnostic test (>= 95% sensitive and specific), and the autocovariance and variance were the most predictive when the noise incidence did not exhibit large temporal variations. In these situations, diagnostic test accuracy should not be a precursor to the implementation of an EWS metric-based alert system.
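The EWS metrics compared in the study are rolling summary statistics of the case time series. A minimal sketch of the three metrics singled out in the abstract (mean, variance, lag-1 autocovariance) over a trailing window — window length and any detrending choices are left out and would need to match the study design:

```python
import numpy as np

def ews_metrics(cases, window):
    """Trailing-window early-warning-signal statistics for a case
    time series. Entries before the first full window are NaN."""
    cases = np.asarray(cases, dtype=float)
    n = len(cases)
    out = {"mean": np.full(n, np.nan),
           "variance": np.full(n, np.nan),
           "autocovariance": np.full(n, np.nan)}
    for t in range(window - 1, n):
        w = cases[t - window + 1: t + 1]
        m = w.mean()
        out["mean"][t] = m
        out["variance"][t] = w.var()
        # lag-1 autocovariance within the window
        out["autocovariance"][t] = np.mean((w[:-1] - m) * (w[1:] - m))
    return out
```

An alert system would monitor these statistics for sustained increases (e.g. via Kendall's tau against time) as R_E approaches 1; the abstract's point is that test-positive counts, not true infections, feed this computation.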

Complex multiannual cycles of Mycoplasma pneumoniae: persistence and the role of stochasticity arxiv.org/abs/2504.11402

The epidemiological dynamics of Mycoplasma pneumoniae are characterized by complex and poorly understood multiannual cycles, posing challenges for forecasting. Using Bayesian methods to fit a seasonally forced transmission model to long-term surveillance data from Denmark (1958-1995, 2010-2025), we investigate the mechanisms driving recurrent outbreaks of M. pneumoniae. The period of the multiannual cycles (predominantly approx. 5 years in Denmark) is explained as a consequence of the interaction of two time-scales in the system, one intrinsic and one extrinsic (seasonal). While the deterministic model provides an excellent fit to shorter time series (a few decades), we find that it eventually settles into an annual cycle, failing to reproduce the observed 4-5-year periodicity long-term. Upon further analysis, the system is found to exhibit transient chaos and thus high sensitivity to stochasticity. We show that environmental (but not purely demographic) stochasticity can sustain the multi-year cycles via stochastic resonance. The disruptive effects of COVID-19 non-pharmaceutical interventions (NPIs) on M. pneumoniae circulation constitute a natural experiment on the effects of large perturbations. Consequently, the effects of NPIs are included in the model and medium-term predictions are explored. Our findings highlight the intrinsic sensitivity of M. pneumoniae dynamics to perturbations and interventions, underscoring the limitations of deterministic epidemic models for long-term prediction. More generally, our results emphasize the potential role of stochasticity as a driver of complex cycles across endemic and recurring pathogens.
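The model class described above — a seasonally forced SEIR compartment model with environmental (multiplicative) noise on transmission — can be sketched with a simple Euler-Maruyama integrator. All parameter values here are illustrative placeholders, not the fitted Danish values:

```python
import numpy as np

def seir_forced(days, beta0=0.5, amp=0.15, sigma=0.0,
                kappa=1/21, gamma=1/14, mu=1/(80*365), dt=1.0, seed=0):
    """Seasonally forced SEIR with optional environmental noise on the
    transmission rate. Returns the infectious fraction over time.
    kappa: 1/latent period, gamma: 1/infectious period, mu: birth/death."""
    rng = np.random.default_rng(seed)
    S, E, I, R = 0.9, 1e-3, 1e-3, 1.0 - 0.9 - 2e-3
    traj = []
    for t in range(int(days / dt)):
        beta = beta0 * (1 + amp * np.cos(2 * np.pi * t * dt / 365.0))
        # environmental stochasticity: log-normal perturbation of beta
        beta *= np.exp(sigma * np.sqrt(dt) * rng.standard_normal())
        new_inf = beta * S * I
        S += dt * (mu * (1 - S) - new_inf)
        E += dt * (new_inf - (kappa + mu) * E)
        I += dt * (kappa * E - (gamma + mu) * I)
        R += dt * (gamma * I - mu * R)
        traj.append(I)
    return np.array(traj)
```

With sigma=0 this is the deterministic skeleton that, per the abstract, eventually locks onto an annual cycle; raising sigma is the kind of environmental forcing the authors find can sustain the multiannual periodicity.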

RP-SAM2: Refining Point Prompts for Stable Surgical Instrument Segmentation arxiv.org/abs/2504.07117

Accurate surgical instrument segmentation is essential in cataract surgery for tasks such as skill assessment and workflow optimization. However, limited annotated data makes it difficult to develop fully automatic models. Prompt-based methods like SAM2 offer flexibility yet remain highly sensitive to the point prompt placement, often leading to inconsistent segmentations. We address this issue by introducing RP-SAM2, which incorporates a novel shift block and a compound loss function to stabilize point prompts. Our approach reduces annotator reliance on precise point positioning while maintaining robust segmentation capabilities. Experiments on the Cataract1k dataset demonstrate that RP-SAM2 improves segmentation accuracy, with a 2% mDSC gain, a 21.36% reduction in mHD95, and decreased variance across random single-point prompt results compared to SAM2. Additionally, on the CaDIS dataset, pseudo masks generated by RP-SAM2 for fine-tuning SAM2's mask decoder outperformed those generated by SAM2. These results highlight RP-SAM2 as a practical, stable and reliable solution for semi-automatic instrument segmentation in data-constrained medical settings. The code is available at https://github.com/BioMedIA-MBZUAI/RP-SAM2.
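The mDSC metric quoted above is the mean Dice similarity coefficient across masks; for reference, the per-mask Dice computation is standard and can be written in a few lines (this is the generic metric, not RP-SAM2 code):

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient between two binary masks:
    2|A intersect B| / (|A| + |B|). Returns 1.0 for two empty masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0
```

The paper's other headline metric, HD95, is the 95th-percentile Hausdorff distance between mask boundaries; it needs a distance transform and is usually taken from an evaluation library rather than reimplemented.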

PLM-eXplain: Divide and Conquer the Protein Embedding Space arxiv.org/abs/2504.07156

Protein language models (PLMs) have revolutionised computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and translation to actionable insights. We present an explainable adapter layer, PLM-eXplain (PLM-X), that bridges this gap by factoring PLM embeddings into two components: an interpretable subspace based on established biochemical features, and a residual subspace that preserves the model's predictive power. Using embeddings from ESM2, our adapter incorporates well-established properties, including secondary structure and hydropathy, while maintaining high performance. We demonstrate the effectiveness of our approach across three protein-level classification tasks: prediction of extracellular vesicle association, identification of transmembrane helices, and prediction of aggregation propensity. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalisable solution for enhancing PLM interpretability across various downstream applications. This work addresses a critical need in computational biology by providing a bridge between powerful deep learning models and actionable biological insights.
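The core idea of factoring an embedding into a feature-explained component plus a residual can be illustrated with ordinary least squares: project embeddings onto the span of known biochemical features and keep what is left over. This is a minimal linear sketch of the decomposition, not the PLM-X adapter architecture itself:

```python
import numpy as np

def split_embedding(Z, F):
    """Factor embeddings Z (n, d) into a component explained by
    interpretable features F (n, k) and an orthogonal residual.
    Returns (explained, residual) with explained + residual == Z."""
    W, *_ = np.linalg.lstsq(F, Z, rcond=None)  # (k, d) feature-to-embedding map
    explained = F @ W
    residual = Z - explained
    return explained, residual
```

By construction the residual is orthogonal to the feature columns, so downstream classifiers can be given both parts: the explained block carries the interpretable signal, the residual preserves whatever predictive information the features miss.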

Evaluation of optimal cut-offs and dichotomous combinations for two biomarkers to improve patient selection arxiv.org/abs/2504.07159

Background: Identifying the right cut-off for continuous biomarkers in clinical trials is important to identify subgroups of patients who are at greater risk of disease or more likely to benefit from a drug. The literature in this area tends to focus on finding cut-offs for a single biomarker, whereas clinical trials more often focus on multiple biomarkers. Methods: Our first objective was to compare three methods, the Youden index, the point closest to the (0,1) corner of the receiver operating characteristic curve (ER), and the concordance probability, to find the optimal cut-offs for two biomarkers, using empirical and non-empirical approaches. Our second and main objective was to use our proposed logic indicator approach to extend the Youden index and evaluate whether a combination of biomarkers is an improvement over a single biomarker. Results: The logic indicator approach created a condition in which either both biomarkers were positive or only one of the biomarkers was positive. A prostate cancer study and a simulated phase 2 lung cancer study were used to illustrate approaches to finding optimal cut-offs and comparing combined biomarkers with single biomarkers. Conclusion: Our results can aid in determining whether a single biomarker or a combination of biomarkers is superior in identifying patients who are more likely to respond to treatment. This work can be of great importance in the era of personalized medicine, where many treatments do not provide clinical benefit to average patients.
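The Youden index mentioned in the methods is J = sensitivity + specificity - 1, maximized over candidate cut-offs. A minimal single-biomarker version of that scan (the two-biomarker and logic-indicator extensions studied in the paper build on this same quantity):

```python
import numpy as np

def best_cutoff_youden(values, labels):
    """Return the cut-off maximizing the Youden index
    J = sensitivity + specificity - 1, scanning observed values.
    Assumes higher values indicate disease (labels: 1=case, 0=control)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_case = max((labels == 1).sum(), 1)
    n_ctrl = max((labels == 0).sum(), 1)
    best_c, best_j = None, -np.inf
    for c in np.unique(values):
        pred = values >= c                         # biomarker-positive call
        sens = (pred & (labels == 1)).sum() / n_case
        spec = (~pred & (labels == 0)).sum() / n_ctrl
        j = sens + spec - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j
```

A logic-indicator combination of two biomarkers would then replace `pred` with, e.g., `(x1 >= c1) & (x2 >= c2)` (both positive) or `|` (either positive) and maximize J over the cut-off pair.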

Emergent kinetics of in vitro transcription from interactions of T7 RNA polymerase and DNA arxiv.org/abs/2504.07212

The in vitro transcription reaction (IVT) is of growing importance for the manufacture of RNA vaccines and therapeutics. While the kinetics of the microscopic steps of this reaction (promoter binding, initiation, and elongation) are well studied, the rate law of overall RNA synthesis that emerges from this system is unclear. In this work, we show that a model that incorporates both initiation and elongation steps is essential for describing trends in IVT kinetics in conditions relevant to RNA manufacturing. In contrast to previous reports, we find that the IVT reaction can be either initiation- or elongation-limited depending on solution conditions. This initiation-elongation model is also essential for describing the effect of salts, which disrupt polymerase-promoter binding, on transcription rates. Polymerase-polymerase interactions during elongation are incorporated into our modeling framework and found to have nonzero but unidentifiable effects on macroscopic transcription rates. Finally, we develop an extension of our modeling approach to quantitatively describe and experimentally evaluate RNA- and DNA-templated mechanisms for the formation of double-stranded RNA (dsRNA) impurities. We show experimental results that indicate that an RNA-templated mechanism is not appropriate for describing macroscopic dsRNA formation in the context of RNA manufacturing.
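Why the reaction can be either initiation- or elongation-limited is easiest to see in a toy serial-step model: per promoter, each transcript costs one initiation wait plus one template traversal. This is an illustration of the limiting-regime logic only, not the paper's fitted kinetic model (which also handles salt effects and polymerase-polymerase interactions):

```python
def transcription_rate(k_init, k_el, length):
    """Toy steady-state RNA synthesis rate per promoter when initiation
    (k_init, 1/s) and elongation (k_el, nt/s over `length` nt) act in
    series: rate = 1 / (t_init + t_elong)."""
    t_init = 1.0 / k_init       # mean time to initiate one transcript
    t_elong = length / k_el     # time to traverse the template
    return 1.0 / (t_init + t_elong)
```

When elongation is fast the rate approaches k_init (initiation-limited); when initiation is fast it approaches k_el/length (elongation-limited). Salts that disrupt promoter binding lower the effective k_init and can switch the system between regimes.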

Representation Meets Optimization: Training PINNs and PIKANs for Gray-Box Discovery in Systems Pharmacology arxiv.org/abs/2504.07379

Physics-Informed Kolmogorov-Arnold Networks (PIKANs) are gaining attention as an effective counterpart to the original multilayer perceptron-based Physics-Informed Neural Networks (PINNs). Both representation models can address inverse problems and facilitate gray-box system identification. However, a comprehensive understanding of their performance in terms of accuracy and speed remains lacking. Here, we introduce a modified PIKAN architecture, tanh-cPIKAN, which is based on Chebyshev polynomials for the parametrization of the univariate functions, with an extra nonlinearity for enhanced performance. We then present a systematic investigation of how choices of the optimizer, representation, and training configuration influence the performance of PINNs and PIKANs in the context of systems pharmacology modeling. We benchmark a wide range of first-order, second-order, and hybrid optimizers, including various learning rate schedulers. We use the new Optax library to identify the most effective combinations for learning gray-boxes under ill-posed, non-unique, and data-sparse conditions. We examine the influence of model architecture (MLP vs. KAN), numerical precision (single vs. double), the need for warm-up phases for second-order methods, and sensitivity to the initial learning rate. We also assess the optimizer scalability for larger models and analyze the trade-offs introduced by JAX in terms of computational efficiency and numerical accuracy. Using two representative systems pharmacology case studies - a pharmacokinetics model and a chemotherapy drug-response model - we offer practical guidance on selecting optimizers and representation models/architectures for robust and efficient gray-box discovery. Our findings provide actionable insights for improving the training of physics-informed networks in biomedical applications and beyond.
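A Kolmogorov-Arnold layer with Chebyshev-parametrized univariate functions and an outer tanh can be sketched as below. This is one plausible reading of the "tanh-cPIKAN" name (tanh squashes inputs into the Chebyshev domain [-1, 1]); it is a NumPy illustration, not the authors' implementation:

```python
import numpy as np

def cheb_layer(x, coeffs):
    """KAN-style layer: each edge applies a learnable univariate
    Chebyshev expansion to a tanh-squashed input, and edge outputs
    are summed per output unit.

    x      : (n, d_in) inputs
    coeffs : (d_in, d_out, K) Chebyshev coefficients
    """
    z = np.tanh(x)                      # map into [-1, 1], the Chebyshev domain
    d_in, d_out, K = coeffs.shape
    # T_k(z) via the recurrence T_0 = 1, T_1 = z, T_k = 2 z T_{k-1} - T_{k-2}
    T = np.ones((K,) + z.shape)
    if K > 1:
        T[1] = z
    for k in range(2, K):
        T[k] = 2 * z * T[k - 1] - T[k - 2]
    # output[n, j] = sum_i sum_k coeffs[i, j, k] * T_k(z[n, i])
    return np.einsum('iok,kni->no', coeffs, T)
```

In a physics-informed setting the `coeffs` tensors are the trainable parameters, optimized against data and ODE-residual losses (in the paper, via Optax/JAX rather than NumPy).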

Convergence-divergence models: Generalizations of phylogenetic trees modeling gene flow over time arxiv.org/abs/2504.07384

Phylogenetic trees are simple models of evolutionary processes. They describe conditionally independent divergent evolution of taxa from common ancestors. Phylogenetic trees commonly do not have enough flexibility to adequately model all evolutionary processes, for example introgressive hybridization, where genes can flow from one taxon to another. Phylogenetic networks model evolution not fully described by a phylogenetic tree. However, many phylogenetic network models assume ancestral taxa merge instantaneously to form "hybrid" descendant taxa. In contrast, our convergence-divergence models retain a single underlying "principal" tree, but permit gene flow over arbitrary time frames. Alternatively, convergence-divergence models can describe other biological processes leading to taxa becoming more similar over a time frame, such as replicated evolution. Here we present novel maximum likelihood-based algorithms to infer most aspects of $N$-taxon convergence-divergence models, many consistently, using a quartet-based approach. The algorithms can be applied to multiple sequence alignments restricted to genes or genomic windows or to gene presence/absence datasets.

From empirical brain networks towards modeling music perception -- a perspective arxiv.org/abs/2504.07721

This perspective article investigates how auditory stimuli influence neural network dynamics using the FitzHugh-Nagumo (FHN) model and empirical brain connectivity data. Results show that synchronization is sensitive to both the frequency and amplitude of auditory input, with synchronization enhanced when input frequencies align with the system's intrinsic frequencies. Increased stimulus amplitude broadens the synchronization range, which is governed by a delicate interplay involving the network's topology, the spatial location of the input, and the frequency characteristics of the cortical input signals. This perspective article also reveals that brain activity alternates between synchronized and desynchronized states, reflecting critical dynamics and phase transitions in neural networks. Notably, gamma-band synchronization is crucial for processing music, with coherence peaking in this frequency range. The findings emphasize the role of structural connectivity and network topology in modulating synchronization, providing insights into how music perception engages brain networks. This perspective article offers a computational framework for understanding neural mechanisms in music perception, with potential implications for cognitive neuroscience and music psychology.
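The node model here, the FitzHugh-Nagumo oscillator, is compact enough to sketch for a single unit driven by a sinusoidal "auditory" input. Parameters are standard textbook FHN values; the coupling over empirical connectivity that the article studies is omitted:

```python
import numpy as np

def fhn_driven(t_max, dt=0.01, a=0.7, b=0.8, eps=0.08, amp=0.5, freq=1.0):
    """Euler integration of one FitzHugh-Nagumo unit with sinusoidal
    forcing I(t) = amp * sin(2*pi*freq*t). Returns the fast variable v.
    dv/dt = v - v^3/3 - w + I;  dw/dt = eps * (v + a - b*w)."""
    n = int(t_max / dt)
    v, w = -1.0, 1.0
    vs = np.empty(n)
    for i in range(n):
        I = amp * np.sin(2 * np.pi * freq * i * dt)
        v += dt * (v - v**3 / 3 - w + I)
        w += dt * eps * (v + a - b * w)
        vs[i] = v
    return vs
```

In the network setting, each brain region gets such a unit, regions are coupled through the empirical connectome, and synchronization is then read off from phase coherence between the v traces as the forcing frequency and amplitude are varied.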

Go Figure: Transparency in neuroscience images preserves context and clarifies interpretation arxiv.org/abs/2504.07824

Visualizations are vital for communicating scientific results. Historically, neuroimaging figures have only depicted regions that surpass a given statistical threshold. This practice substantially biases interpretation of the results and subsequent meta-analyses, particularly towards non-reproducibility. Here we advocate for a "transparent thresholding" approach that not only highlights statistically significant regions but also includes subthreshold locations, which provide key experimental context. This balances the dual needs of distilling modeling results and enabling informed interpretations for modern neuroimaging. We present four examples that demonstrate the many benefits of transparent thresholding, including: removing ambiguity, decreasing hypersensitivity to non-physiological features, catching potential artifacts, improving cross-study comparisons, reducing non-reproducibility biases, and clarifying interpretations. We also describe the many software packages that implement transparent thresholding, several of which were added or streamlined recently as part of this work. A point-counterpoint discussion addresses issues with thresholding raised in real conversations with researchers in the field. We hope that by showing how transparent thresholding can drastically improve the interpretation (and reproducibility) of neuroimaging findings, more researchers will adopt this method.
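Mechanically, transparent thresholding maps each voxel's statistic to an opacity instead of a binary keep/hide decision: suprathreshold voxels stay fully opaque, subthreshold voxels fade out smoothly. A minimal sketch of one such opacity map — the power-law fade is a common choice, not a formula prescribed by the article:

```python
import numpy as np

def transparent_alpha(stat, thresh, power=1.0):
    """Opacity for transparent thresholding: |stat| >= thresh renders
    fully opaque (alpha = 1); subthreshold voxels fade as
    (|stat|/thresh)**power instead of being hidden entirely."""
    alpha = (np.abs(stat) / thresh) ** power
    return np.clip(alpha, 0.0, 1.0)
```

The resulting alpha volume is composited over the anatomical underlay, so near-threshold structure (the "key experimental context" above) remains visible while still visually emphasizing significant regions.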
