An Integrated Genomics Workflow Tool: Simulating Reads, Evaluating Read Alignments, and Optimizing Variant Calling Algorithms arxiv.org/abs/2504.17860

Next-generation sequencing (NGS) is a pivotal genome-sequencing technique because of its high throughput, rapid turnaround, cost-effectiveness, and accuracy. It plays a crucial role in identifying genetic variation and exploring genomic complexity. Its applications in clinical, comparative, and functional genomics and in metagenomics have contributed substantially to advances in research and medicine. In genomics data science, read simulation, read mapping, and variant calling are central to obtaining precise and dependable results. Many tools exist for each of these steps, each with its own methodology and options, so optimizing a workflow requires a detailed understanding of how they differ. This research, at the intersection of data science and genomics, systematically assesses a range of tools and characterizes their individual strengths and weaknesses through experimentation and analysis. The evaluation enabled the researchers to identify the most accurate tools and to confirm that the established workflow is consistent with the demonstrated performance of specific tools in genomics data analysis. To meet these requirements, the researchers introduce "VarFind", an open-source, freely accessible pipeline tool that automates the entire process (VarFind GitHub repository: https://github.com/shanikawm/varfinder).
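
When reads are simulated from a reference with known, planted variants, a caller's output can be scored directly against that truth set. The sketch below is a minimal example of such scoring, not VarFind's own evaluation code. It assumes variants are reduced to hypothetical (chrom, pos, ref, alt) tuples and matched exactly. Real comparisons usually also normalize indel representations.

```python
from typing import Iterable, Tuple

# A variant reduced to (chromosome, 1-based position, REF allele, ALT allele).
Variant = Tuple[str, int, str, str]


def score_calls(truth: Iterable[Variant], calls: Iterable[Variant]) -> dict:
    """Compare called variants with the planted truth set by exact matching."""
    truth_set, call_set = set(truth), set(calls)
    tp = len(truth_set & call_set)  # planted variants that were recovered
    fp = len(call_set - truth_set)  # calls with no matching planted variant
    fn = len(truth_set - call_set)  # planted variants the caller missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


truth = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "GA")]
calls = [("chr1", 100, "A", "G"), ("chr2", 40, "G", "GA"), ("chr2", 90, "T", "C")]
result = score_calls(truth, calls)
print(result)
```

To compare several mappers or callers on the same simulated reads, call `score_calls` once per tool. Then rank the tools by F1 or by whichever metric matters for the application.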

Seizure duration is associated with multiple timescales in interictal iEEG band power arxiv.org/abs/2504.17888

Background: Seizure severity can change from one seizure to the next within individual people with epilepsy, but it is unclear whether and how seizure severity is modulated over longer timescales. Characterising seizure severity variability over time could lead to tailored treatments. In this study, we test whether continuously recorded interictal intracranial EEG (iEEG) features encapsulate signatures of such modulations.

Methods: We analysed 20 subjects with iEEG recordings of at least one day. We identified cycles on timescales of hours to days embedded in long-term iEEG band power and associated them with seizure severity, which we approximated using seizure duration. To quantify these associations, we created linear-circular statistical models of seizure duration that incorporated different band power cycles within each subject.

Findings: In most subjects, seizure duration was weakly to moderately correlated with individual band power cycles. Combinations of multiple band power cycles significantly explained most of the variability in seizure duration. Specifically, across all subjects, 70% of the models had an adjusted $R^2$ above 60%. Of these models, around 80% were above chance level (p < 0.05) based on permutation tests. Models included cycles of ultradian, circadian, and slower timescales in a subject-specific manner.

Interpretation: These results suggest that seizure severity, as measured by seizure duration, may be modulated over timescales of minutes to days by subject-specific cycles in interictal iEEG signal properties. These cycles likely serve as markers of seizure-modulating processes. Future work can investigate the biological drivers of these fluctuations and may inform novel treatment strategies that minimise seizure severity.
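
A linear-circular model can regress seizure duration on the cosine and sine of each cycle's phase at seizure onset. This is a standard way to include a circular predictor in a linear model. The sketch below is a minimal example under stated assumptions: two hypothetical cycles (24 h and 120 h), synthetic data, an ordinary-least-squares fit, and a simple duration-shuffling permutation test. The paper's exact model specification, cycle extraction, and permutation scheme may differ.

```python
import math
import random
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design(phases: Sequence[Sequence[float]]) -> List[List[float]]:
    """One row per seizure: intercept, then cos and sin of each cycle's phase (radians)."""
    rows = []
    for seizure_phases in phases:
        row = [1.0]
        for phi in seizure_phases:
            row += [math.cos(phi), math.sin(phi)]
        rows.append(row)
    return rows


def adjusted_r2(x: List[List[float]], y: Sequence[float]) -> float:
    """Least-squares fit via the normal equations; returns the adjusted R^2."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    n, k = len(y), p - 1  # k = number of predictors besides the intercept
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - k - 1)


def permutation_p(x: List[List[float]], y: Sequence[float], n_perm: int = 999,
                  seed: int = 0) -> Tuple[float, float]:
    """Observed adjusted R^2 and the fraction of duration-shuffled fits that reach it."""
    rng = random.Random(seed)
    observed = adjusted_r2(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between cycle phase and duration
        if adjusted_r2(x, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic example: 40 seizures over two weeks, two hypothetical cycles (24 h and 120 h),
# durations in seconds.
rng = random.Random(1)
onsets_h = [rng.uniform(0, 24 * 14) for _ in range(40)]
periods_h = [24.0, 120.0]
phases = [[2 * math.pi * (t % per) / per for per in periods_h] for t in onsets_h]
durations = [60 + 20 * math.cos(ph[0]) + 10 * math.sin(ph[1]) + rng.gauss(0, 5) for ph in phases]
r2_adj, p = permutation_p(design(phases), durations, n_perm=199)
print(f"adjusted R^2 = {r2_adj:.2f}, permutation p = {p:.3f}")
```

Each cycle contributes two coefficients (cos and sin). Together they encode both how strongly duration depends on that cycle and at which phase durations peak. This is why a phase angle can enter an otherwise linear model.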

A computational model of infant sensorimotor exploration in the mobile paradigm arxiv.org/abs/2504.17939

We present a computational model of the mechanisms that may determine infants' behavior in the "mobile paradigm". This paradigm has been used in developmental psychology to explore how infants learn the sensory effects of their actions. In this paradigm, a mobile (an articulated, movable object hanging above an infant's crib) is connected to one of the infant's limbs, prompting the infant to move that "connected" limb preferentially. This ability to detect a "sensorimotor contingency" is considered a foundational cognitive ability in development. To understand how infants learn sensorimotor contingencies, we built a model that attempts to replicate infant behavior. Our model incorporates a neural network, action-outcome prediction, exploration, motor noise, a preferred activity level, and biologically-inspired motor control. Simulations with our model replicate the classic findings in the literature showing preferential movement of the connected limb. Interestingly, the model sometimes exhibits a burst of movement after the mobile is disconnected, which sheds light on a similar occasional finding in infants. Beyond these general findings, the simulations also replicate data from two recent, more detailed studies in which the connection to the mobile was either gradual or all-or-none. A series of ablation studies further shows that action-outcome prediction, exploration, motor noise, and biologically-inspired motor control are all essential for the model to replicate infant behavior correctly. This suggests that these components are also involved in infants' sensorimotor learning.
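
The following deliberately minimal toy makes the interplay of action-outcome prediction, motor noise, and a preferred activity level concrete. It is not the authors' model. Four limbs produce noisy activity around a per-limb drive. A delta-rule predictor learns how much each limb's activity moves the mobile. Each limb's drive then relaxes toward a baseline plus its learned effect. The constants, the choice of limb 2 as the connected limb, and both update rules are hypothetical. The neural network and the biologically-inspired motor control of the actual model are omitted.

```python
import random
from typing import List, Tuple

N_LIMBS = 4
CONNECTED = 2      # hypothetical index of the limb tied to the mobile
LR_PRED = 0.05     # learning rate of the action-outcome predictor
LR_PREF = 0.02     # how quickly limb drive follows the learned contingency
NOISE = 0.3        # motor-noise standard deviation (the source of exploration)


def simulate(steps: int, connected: bool = True, seed: int = 0) -> Tuple[List[float], List[float]]:
    rng = random.Random(seed)
    weights = [0.0] * N_LIMBS  # predicted mobile motion per unit of each limb's activity
    drive = [0.5] * N_LIMBS    # baseline (preferred) activity level of each limb
    for _ in range(steps):
        # Noisy motor output around each limb's drive; activity cannot be negative.
        act = [max(0.0, d + rng.gauss(0.0, NOISE)) for d in drive]
        mobile = act[CONNECTED] if connected else 0.0
        error = mobile - sum(w * a for w, a in zip(weights, act))  # prediction error
        for i in range(N_LIMBS):
            weights[i] += LR_PRED * error * act[i]  # delta-rule (LMS) update
            # Drive relaxes toward 0.5 + learned effect, so limbs that move the mobile get used more.
            drive[i] += LR_PREF * (0.5 + weights[i] - drive[i])
    return weights, drive


w, d = simulate(3000)
print([round(x, 2) for x in w], [round(x, 2) for x in d])
```

Running with `connected=False` from the start leaves every weight at zero and every drive at its baseline. Reproducing the post-disconnection burst would require continuing from a learned state, which this toy does not attempt.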

Modular integration of neural connectomics, dynamics and biomechanics for identification of behavioral sensorimotor pathways in Caenorhabditis elegans arxiv.org/abs/2504.18073

3plex Web: An Interactive Platform for RNA:DNA Triplex Prediction and Analysis arxiv.org/abs/2504.18076

Summary: Long non-coding RNAs (lncRNAs) exert their functions by cooperating with other molecules, including proteins and DNA. Triplexes, formed through the interaction between a single-stranded RNA (ssRNA) and a double-stranded DNA (dsDNA), have been consistently described as a mechanism that allows lncRNAs to target specific genomic sequences in vivo. Building on the computational tool 3plex, we developed 3plex Web, an accessible platform that enhances RNA:DNA triplex prediction by integrating interactive visualization, statistical evaluation, and user-friendly downstream analysis workflows. 3plex Web implements new features such as input randomization for statistical assessment, interactive profile plotting of triplex stability, and customizable DNA Binding Domain (DBD) selection. The platform enables rapid analysis through PATO, substantially reducing processing times compared with previous methods. It also offers Snakemake workflows to integrate gene expression data and explore lncRNA regulatory mechanisms.

Availability and implementation: 3plex Web is freely available as an online web service at https://3plex.unito.it. The source code for 3plex is available at https://github.com/molinerisLab/3plex, together with a definition file for packaging the application as a Singularity image.

Contact: ivan.molineris@unito.it

Keywords: DNA; RNA; RNA-DNA interaction; triplex; long non-coding RNA; lncRNA; gene regulation; web application
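
Input randomization of this kind is typically used to turn a raw prediction score into an empirical p-value. The sketch below assumes a composition-preserving shuffle of the RNA sequence. It uses a hypothetical toy score (the longest purine run) as a stand-in for a real triplex score. 3plex Web's actual randomization procedure and triplex scoring are not reproduced here.

```python
import random
from typing import Callable


def longest_purine_run(rna: str) -> int:
    """Hypothetical toy score: length of the longest stretch of A/G in the sequence."""
    best = run = 0
    for base in rna:
        run = run + 1 if base in "AG" else 0
        best = max(best, run)
    return best


def empirical_p(rna: str, score: Callable[[str], float], n_shuffles: int = 1000,
                seed: int = 0) -> float:
    """How often a composition-preserving shuffle scores at least as high as the input."""
    rng = random.Random(seed)
    observed = score(rna)
    bases = list(rna)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # preserves mononucleotide composition only
        if score("".join(bases)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)  # add-one correction avoids p = 0


rna = "UCAGCUAGAGGAAGAGGGAAGAUCGUCAUCGUACU"
print(longest_purine_run(rna), empirical_p(rna, longest_purine_run))
```

A shuffle that also preserves dinucleotide frequencies would be a stricter null model. Which null the platform actually uses is not stated in the abstract.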

TopSpace: spatial topic modeling for unsupervised discovery of multicellular spatial tissue structures in multiplex imaging arxiv.org/abs/2504.18495

Motivation: Understanding the spatial architecture of tissues is essential for decoding the complex interactions within cellular ecosystems and their implications for disease pathology and clinical outcomes. Recent advances in multiplex imaging technologies have enabled high-resolution profiling of cellular phenotypes and their spatial distributions. This has revealed the critical roles of tissue structures such as tertiary lymphoid structures (TLSs) in shaping immune responses and influencing disease progression. However, existing methods for analyzing spatial tissue structures often rely on hard clustering or adjacency-based spatial models, which are limited in their ability to capture the nuanced, overlapping nature of cellular communities. To address these challenges, we develop a novel spatial topic modeling framework for the unsupervised discovery of spatial tissue structures in multiplex imaging data.

Results: We propose TopSpace, a Bayesian spatial topic model that integrates Gaussian processes into latent Dirichlet allocation to flexibly model spatial dependencies in tissue microenvironments. Its Bayesian framework supports multicellular mixed-membership clustering and offers key inferential advantages, including robust uncertainty quantification and data-driven determination of the number of multicellular microenvironments. We demonstrate the utility of TopSpace through simulations and a case study on non-small cell lung cancer (NSCLC) data. Simulations show that TopSpace accurately recovers latent tissue microenvironments and spatial clustering patterns, outperforming existing methods in scenarios with varying spatial dependencies. Applied to NSCLC data, TopSpace identifies TLSs and captures their spatial probability distribution, which strongly correlates with patient survival outcomes.
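
The following forward simulation makes "Gaussian processes inside latent Dirichlet allocation" concrete. It draws one smooth GP field per topic over a grid of cell positions. A softmax turns the fields into per-location topic proportions. Each cell's type is then sampled from a topic-specific cell-type distribution. The choices here are assumptions: a squared-exponential kernel, a logistic-normal link, two hypothetical topics, and a toy grid. This is not TopSpace's actual likelihood or its Bayesian inference.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
CELL_TYPES = ["B cell", "T cell", "tumor cell"]
PHI = [[0.45, 0.45, 0.10],  # hypothetical immune-rich (TLS-like) topic
       [0.05, 0.15, 0.80]]  # hypothetical tumor-dominated topic


def rbf_kernel(points: Sequence[Point], length_scale: float = 1.5,
               jitter: float = 1e-4) -> List[List[float]]:
    """Squared-exponential covariance between 2-D positions, with jitter on the diagonal."""
    n = len(points)
    k = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            k[i][j] = math.exp(-d2 / (2 * length_scale ** 2)) + (jitter if i == j else 0.0)
    return k


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][m] * chol[j][m] for m in range(j))
            chol[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / chol[j][j]
    return chol


def topic_proportions(grid: int = 6, n_topics: int = 2, seed: int = 0):
    rng = random.Random(seed)
    points = [(float(x), float(y)) for x in range(grid) for y in range(grid)]
    chol = cholesky(rbf_kernel(points))
    n = len(points)
    fields = []
    for _ in range(n_topics):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # f = L z has covariance L L^T = K: one draw from the spatial GP prior.
        fields.append([sum(chol[i][m] * z[m] for m in range(i + 1)) for i in range(n)])
    theta = []
    for i in range(n):  # softmax across topics: spatially smooth mixed membership
        logits = [f[i] for f in fields]
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        theta.append([v / total for v in exps])
    return points, theta


def sample_cell_types(theta: Sequence[Sequence[float]], seed: int = 1) -> List[str]:
    rng = random.Random(seed)
    types = []
    for weights in theta:
        k = rng.choices(range(len(weights)), weights=weights)[0]  # topic for this cell
        types.append(rng.choices(CELL_TYPES, weights=PHI[k])[0])  # cell type given topic
    return types


points, theta = topic_proportions()
types = sample_cell_types(theta)
print(types[:6])
```

Because nearby positions share correlated GP values, neighboring cells tend to have similar topic proportions. This spatial dependence is what plain LDA lacks.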

Enhanced Sampling, Public Dataset and Generative Model for Drug-Protein Dissociation Dynamics arxiv.org/abs/2504.18367 .comp-ph .chem-ph .LG

Drug-protein binding and dissociation dynamics are fundamental to understanding molecular interactions in biological systems. Many tools for studying drug-protein interactions have emerged, especially artificial intelligence (AI)-based generative models, but predictive tools for binding/dissociation kinetics and dynamics remain limited. We propose a novel research paradigm that combines molecular dynamics (MD) simulations, enhanced sampling, and AI generative models to address this issue. We propose an enhanced sampling strategy to efficiently simulate the drug-protein dissociation process in MD and to estimate the free energy surface (FES). Based on this sampling strategy, we constructed an MD simulation pipeline and generated a dataset of 26,612 drug-protein dissociation trajectories containing about 13 million frames. We named this dissociation dynamics dataset DD-13M and used it to train UnbindingFlow, a deep equivariant generative model that can generate collision-free dissociation trajectories. The DD-13M database and the UnbindingFlow model represent a significant advance in computational structural biology, and we anticipate their broad applicability in machine learning studies of drug-protein interactions. Our ongoing efforts focus on extending this methodology to a broader spectrum of drug-protein complexes and exploring novel applications in pathway prediction.
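
Estimating a free energy surface from sampled frames rests on F(s) = -k_B T ln p(s) along a collective variable s. The following is a minimal one-dimensional sketch. It assumes a histogram estimator, a static bias potential for reweighting, and hypothetical toy data. The paper's enhanced sampling strategy is not reproduced. Any time-dependent bias it uses would need more elaborate reweighting.

```python
import math
from typing import List, Optional, Sequence

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_profile(cv: Sequence[float], n_bins: int, temperature: float = 300.0,
                        bias: Optional[Sequence[float]] = None) -> List[Optional[float]]:
    """1-D free energy F(s) = -kT ln p(s) along a collective variable, minimum shifted to 0.

    `bias` (optional) is a static bias potential in kcal/mol acting on each frame; frames
    are reweighted by exp(+V/kT) to recover the unbiased distribution. Empty bins -> None.
    Assumes the collective-variable values are not all identical.
    """
    kt = KB_KCAL * temperature
    lo, hi = min(cv), max(cv)
    width = (hi - lo) / n_bins
    if bias is None:
        weights = [1.0] * len(cv)
    else:
        v_max = max(bias)  # shifting by the maximum avoids overflow and cancels on normalising
        weights = [math.exp((v - v_max) / kt) for v in bias]
    hist = [0.0] * n_bins
    for s, w in zip(cv, weights):
        idx = min(int((s - lo) / width), n_bins - 1)  # the maximum value falls in the last bin
        hist[idx] += w
    total = sum(hist)
    free = [-kt * math.log(h / total) if h > 0 else None for h in hist]
    f_min = min(f for f in free if f is not None)
    return [f - f_min if f is not None else None for f in free]


# Toy data: ligand-pocket distances (nm) from four frames of an unbiased run.
profile = free_energy_profile([0.1, 0.2, 0.3, 1.9], n_bins=2)
print(profile)  # the far bin, visited 3x less often, sits kT*ln(3) above the bound bin
```

A constant bias cancels out after normalization. Only differences in bias between frames change the recovered profile.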

Last-layer committee machines for uncertainty estimations of benthic imagery arxiv.org/abs/2504.16952

Automating the annotation of benthic imagery (i.e., images of the seafloor and its associated organisms, habitats, and geological features) is critical for monitoring rapidly changing ocean ecosystems. Deep learning approaches have been successful for this purpose. However, consistent annotation remains challenging because of ambiguous seafloor images, potential inter-annotator disagreement, and out-of-distribution samples. Marine scientists implementing deep learning models often obtain predictions from one-hot representations trained with a cross-entropy loss and softmax normalization, which yields a single set of model parameters. While efficient, this approach may produce overconfident predictions on context-challenging datasets. This raises reliability concerns that pose risks for downstream tasks such as benthic habitat mapping and marine spatial planning. In this study, we investigated classification uncertainty as a tool to improve the labeling of benthic habitat imagery. We developed a framework for two challenging sub-datasets of the recently released, publicly available BenthicNet dataset using Bayesian neural networks, Monte Carlo dropout inference sampling, and a proposed single last-layer committee machine. This approach reduced the number of network parameters needed to obtain per-sample uncertainties by more than 95%. It achieved near-identical performance to computationally more expensive strategies such as Bayesian neural networks, Monte Carlo dropout, and deep ensembles. The proposed method produces prioritized lists of uncertain samples for human-in-the-loop intervention. These lists help identify ambiguous, mislabeled, out-of-distribution, and/or difficult images, enhancing existing annotation tools for benthic mapping and other applications.
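
A last-layer committee estimates uncertainty by running several lightweight heads on the same frozen backbone features and comparing their softmax outputs. This sketch assumes the heads are already trained; the weights below are hypothetical and hand-picked. It uses predictive entropy and mutual information (committee disagreement) as uncertainty scores. The paper's exact committee construction, training, and uncertainty measures may differ.

```python
import math
from typing import List, Sequence, Tuple

Head = Tuple[List[List[float]], List[float]]  # (weight matrix, bias) of one linear head


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    return [v / s for v in e]


def head_probs(head: Head, feats: Sequence[float]) -> List[float]:
    """One committee member: linear layer + softmax on the shared (frozen) features."""
    weights, bias = head
    return softmax([sum(w * f for w, f in zip(row, feats)) + b for row, b in zip(weights, bias)])


def entropy(p: Sequence[float]) -> float:
    return -sum(v * math.log(v) for v in p if v > 0)


def committee_uncertainty(heads: Sequence[Head], feats: Sequence[float]):
    """Mean prediction, total predictive entropy, and disagreement (mutual information)."""
    member_probs = [head_probs(h, feats) for h in heads]
    k = len(member_probs)
    mean = [sum(p[c] for p in member_probs) / k for c in range(len(member_probs[0]))]
    total = entropy(mean)
    expected = sum(entropy(p) for p in member_probs) / k
    return mean, total, total - expected


# Hypothetical 3-member committee on 2-D backbone features, 2 classes.
HEADS: List[Head] = [
    ([[1.0, 1.0], [-1.0, -1.0]], [0.0, 0.0]),
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [-1.0, 0.0]], [0.0, 0.0]),
]
samples = {"clear": [3.0, 0.0], "ambiguous": [0.0, 3.0]}
ranked = sorted(samples, key=lambda s: committee_uncertainty(HEADS, samples[s])[2], reverse=True)
print(ranked)  # most uncertain first, i.e. the human-review priority order
```

Only the final layer is replicated, so the backbone runs once per image. This is what keeps the extra parameter cost small compared with full ensembles.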

Deep Multi-modal Breast Cancer Detection Network arxiv.org/abs/2504.16954

Automated breast cancer detection via computer vision techniques is challenging due to the complex nature of breast tissue, the subtle appearance of cancerous lesions, and variations in breast density. Mainstream techniques focus primarily on visual cues and overlook complementary patient-specific textual features that are equally important and can enhance diagnostic accuracy. To address this gap, we introduce the Multi-modal Cancer Detection Network (MMDCNet), which integrates visual cues with clinical data to improve breast cancer detection. Our approach processes medical images using computer vision techniques, while patterns in structured patient metadata are learned through a custom fully connected network. The extracted features are fused into a comprehensive representation, allowing the model to leverage both visual and clinical information. The final classifier is trained on the joint feature embedding space of visual and clinical cues. Experiments show improved performance, with accuracy increasing from 79.38% to 90.87% on the Mini-DDSM dataset. Additionally, our approach achieves 97.05% accuracy on an image-only dataset, highlighting the robustness and effectiveness of its visual feature extraction. These findings emphasise the potential of multi-modal learning in medical diagnostics, paving the way for future research on optimising data integration strategies and refining AI-driven clinical decision support systems.
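
The abstract does not state how the visual and clinical features are fused. A common choice is late fusion by concatenation, sketched below as a plain forward pass. Several details are hypothetical: the layer sizes, the weights, the three clinical inputs, and the sigmoid output. The image features are assumed to come from an already trained image backbone.

```python
import math
from typing import Dict, List, Sequence


def dense(x: Sequence[float], w: List[List[float]], b: Sequence[float],
          relu: bool = True) -> List[float]:
    """Fully connected layer y = W x + b, optionally followed by ReLU."""
    y = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [max(0.0, v) for v in y] if relu else y


def fuse_and_classify(image_feats: Sequence[float], clinical: Sequence[float],
                      params: Dict[str, list]) -> float:
    """Late fusion: embed clinical metadata, concatenate with image features, classify."""
    clin_emb = dense(clinical, params["clin_w"], params["clin_b"])  # FC branch for metadata
    joint = list(image_feats) + clin_emb                            # joint feature embedding
    logit = dense(joint, params["out_w"], params["out_b"], relu=False)[0]
    return 1.0 / (1.0 + math.exp(-logit))                           # probability of cancer


# Hypothetical weights: 4 image features + 3 clinical inputs -> 2-D clinical embedding.
params = {
    "clin_w": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]],
    "clin_b": [0.0, 0.1],
    "out_w": [[0.4, -0.3, 0.2, 0.1, 0.6, 0.5]],
    "out_b": [-0.2],
}
prob = fuse_and_classify([0.9, 0.1, 0.4, 0.7], [0.6, 1.0, 0.0], params)
print(round(prob, 3))
```

Concatenation lets the final classifier weigh image and clinical evidence jointly. The clinical branch keeps metadata in its own learned representation before fusion.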

Automating tumor-infiltrating lymphocyte assessment in breast cancer histopathology images using QuPath: a transparent and accessible machine learning pipeline arxiv.org/abs/2504.16979

In this study, we built an end-to-end tumor-infiltrating lymphocyte (TIL) assessment pipeline within QuPath, demonstrating the potential of easily accessible tools to perform complex tasks fully automatically. First, we trained a pixel classifier to segment tumor, tumor-associated stroma, and other tissue compartments in breast cancer H&E-stained whole-slide images (WSIs), isolating tumor-associated stroma for subsequent analysis. Next, we applied a pre-trained StarDist deep learning model in QuPath for cell detection and used the extracted cell features to train a binary classifier distinguishing TILs from other cells. To evaluate the pipeline, we calculated the TIL density in each WSI and categorized each slide as having a low, medium, or high TIL level. Compared against pathologist-assigned TIL scores, the pipeline achieved a Cohen's kappa of 0.71 on the external test set, corroborating previous research findings. These results confirm that existing software can offer a practical solution for assessing TILs in H&E-stained WSIs of breast cancer.
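
The evaluation step (density, then category, then agreement with a pathologist) can be sketched in a few lines. Two things are assumptions. The density cut-offs are hypothetical, since the abstract gives none. The unweighted form of Cohen's kappa is also assumed; a weighted kappa is also common for ordinal categories, and the abstract does not say which was used.

```python
from collections import Counter
from typing import Sequence


def til_category(n_tils: int, stroma_area_mm2: float,
                 low_cut: float = 100.0, high_cut: float = 500.0) -> str:
    """Bin TIL density (cells per mm^2 of tumor-associated stroma); cut-offs are hypothetical."""
    density = n_tils / stroma_area_mm2
    if density < low_cut:
        return "low"
    return "medium" if density < high_cut else "high"


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa between two raters' categorical labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n  # raw agreement
    ca, cb = Counter(a), Counter(b)
    # Agreement expected by chance from each rater's marginal label frequencies.
    expected = sum(ca[c] * cb[c] for c in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)  # undefined if expected == 1


pipeline = ["low", "low", "medium", "high", "medium", "high"]
pathologist = ["low", "medium", "medium", "high", "medium", "high"]
print(til_category(300, 2.0), round(cohens_kappa(pipeline, pathologist), 3))
```

Kappa discounts the agreement two raters would reach by chance given their label frequencies. This is why it is preferred over raw accuracy for comparing against pathologist scores.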

Optimizing chemoradiotherapy for malignant gliomas: a validated mathematical approach arxiv.org/abs/2504.17481

Malignant gliomas (MGs), particularly glioblastoma, are among the most aggressive brain tumors, with limited treatment options and a poor prognosis. Maximal safe resection and the so-called Stupp protocol are the standard first-line therapies. Although the Stupp protocol combines radiotherapy and chemotherapy intensively, it provides only limited survival benefit over radiotherapy alone, underscoring the need for innovative therapeutic strategies. Emerging evidence suggests that alternative dosing schedules may improve outcomes by enhancing survival, delaying the emergence of resistance, and minimizing side effects. Examples include less aggressive regimens with extended intervals between consecutive treatment applications. In this study, we develop a novel mathematical model based on ordinary differential equations that describes MG dynamics under combined chemoradiotherapy, and we calibrate and validate it against in vivo data from animal models. The proposed model incorporates key biological processes, including cancer cell dormancy, phenotypic switching, drug resistance through persister cells, and treatment-induced effects. Through in silico trials, we identified optimized combination treatment protocols that may outperform the standard Stupp protocol. Finally, we computationally extrapolated the results from the in vivo animal model to humans, obtaining up to a four-fold increase in median survival with protracted administration protocols in silico. Although further experimental and clinical validation is required, our framework provides a computational foundation for optimizing and personalizing treatment strategies for MG and potentially for other cancers with similar biological mechanisms.
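
The following deliberately simplified two-compartment toy illustrates how dormancy and persister cells enter an ODE model and how dosing schedules are compared in silico. It is not the authors' calibrated model. Sensitive cells grow logistically and are killed while treatment is on. Treatment also drives some of them into a drug-tolerant dormant (persister) state, and persisters re-awaken slowly. All parameters and schedules are hypothetical. Radiotherapy and chemotherapy are lumped into a single on/off treatment signal.

```python
from typing import Callable, Tuple


def simulate_tumor(dose_on: Callable[[float], bool], days: float = 60.0, dt: float = 0.01,
                   r: float = 0.3, cap: float = 1.0, kill: float = 1.5,
                   to_persister: float = 0.2, wake: float = 0.05) -> Tuple[float, float]:
    """Forward-Euler integration of a toy sensitive (S) / persister (P) model.

    Returns (S, P) at the end; burdens are fractions of carrying capacity, time in days.
    """
    s, p = 0.1, 0.0
    for i in range(int(round(days / dt))):
        on = dose_on(i * dt)
        growth = r * s * (1 - (s + p) / cap)      # logistic growth of sensitive cells
        killed = kill * s if on else 0.0          # treatment kills sensitive cells only
        switch = to_persister * s if on else 0.0  # treatment-induced entry into dormancy
        ds = growth - killed - switch + wake * p
        dp = switch - wake * p                    # persisters re-awaken at rate `wake`
        s, p = s + dt * ds, p + dt * dp
    return s, p


def untreated(t: float) -> bool:
    return False


def daily_block(t: float) -> bool:
    return 10.0 <= t < 15.0  # five consecutive treatment days


def spaced(t: float) -> bool:
    return 10.0 <= t < 25.0 and (t - 10.0) % 3.0 < 1.0  # five treatment days, 2 days apart


for name, schedule in [("untreated", untreated), ("consecutive", daily_block), ("spaced", spaced)]:
    s, p = simulate_tumor(schedule)
    print(f"{name}: sensitive={s:.3f} persister={p:.3f}")
```

With a toy like this, which schedule performs better depends entirely on the hypothetical parameters, so no conclusion about real protocols should be drawn from it. The point is only the mechanics: a schedule is a function of time, and comparing protocols means re-integrating the same model under different schedules.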

On the robustness of the emergent spatiotemporal dynamics in biophysically realistic and phenomenological whole-brain models at multiple network resolutions arxiv.org/abs/2504.17491

The human brain is a complex dynamical system that displays a wide range of macroscopic and mesoscopic patterns of neural activity whose mechanistic origin remains poorly understood. Whole-brain modelling allows us to explore candidate mechanisms causing the observed patterns. However, it is not fully established how the choice of model type and the network's resolution influence the simulation results. Hence, it remains unclear to what extent conclusions drawn from these results are limited by modelling artefacts. Here, we compare the dynamics of a biophysically realistic, linear-nonlinear cascade model of whole-brain activity with those of a phenomenological Wilson-Cowan model. We use three structural connectomes based on the Schaefer parcellation scheme with 100, 200, and 500 nodes. Both neural mass models implement the same mechanistic hypotheses, which specifically address the interaction between excitation, inhibition, and a slow adaptation current that affects the excitatory populations. We quantify the emerging dynamical states in detail and investigate how consistent the results are across the different model variants. We then apply both model types to slow oscillations, a prevalent brain rhythm during deep sleep. We investigate the consistency of model predictions when exploring specific mechanistic hypotheses about how short- and long-range connections, and the antero-posterior structural connectivity gradient, affect key properties of these oscillations. Overall, our results demonstrate that the coarse-grained dynamics are robust to changes in both model type and network resolution. In some cases, however, model predictions do not generalize, so care must be taken when interpreting model results.
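
The phenomenological side of the comparison can be sketched as Wilson-Cowan nodes coupled through a structural connectivity matrix. Each node has a slow adaptation variable that feeds back onto its excitatory population, and the network is integrated with forward Euler. The parameter values, the sigmoid, and the tiny 3-node connectome are illustrative assumptions rather than the paper's settings. The sketch makes no claim to reproduce slow oscillations at these values.

```python
import math
from typing import List, Sequence


def sigmoid(x: float, gain: float = 1.0, threshold: float = 4.0) -> float:
    """Population transfer function mapping input to a firing rate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-gain * (x - threshold)))


def simulate_wc(conn: Sequence[Sequence[float]], steps: int = 5000, dt: float = 0.1,
                coupling: float = 1.0, w_ee: float = 16.0, w_ei: float = 12.0,
                w_ie: float = 15.0, w_ii: float = 3.0, b_adapt: float = 1.5,
                ext: float = 1.0, tau_e: float = 2.5, tau_i: float = 3.75,
                tau_a: float = 600.0) -> List[List[float]]:
    """Forward-Euler Wilson-Cowan network with slow adaptation on E; returns E traces."""
    n = len(conn)
    e, inh, a = [0.1] * n, [0.1] * n, [0.0] * n
    trace = []
    for _ in range(steps):
        # Long-range input: node k's E population receives coupling * sum_j C[k][j] * E_j.
        net = [coupling * sum(conn[k][j] * e[j] for j in range(n)) for k in range(n)]
        e_in = [w_ee * e[k] - w_ei * inh[k] - b_adapt * a[k] + ext + net[k] for k in range(n)]
        i_in = [w_ie * e[k] - w_ii * inh[k] for k in range(n)]
        e_next = [e[k] + dt / tau_e * (-e[k] + sigmoid(e_in[k])) for k in range(n)]
        i_next = [inh[k] + dt / tau_i * (-inh[k] + sigmoid(i_in[k])) for k in range(n)]
        a = [a[k] + dt / tau_a * (-a[k] + e[k]) for k in range(n)]  # slow adaptation tracks E
        e, inh = e_next, i_next
        trace.append(e[:])
    return trace


CONN = [[0.0, 0.5, 0.2], [0.5, 0.0, 0.3], [0.2, 0.3, 0.0]]  # hypothetical 3-node connectome
trace = simulate_wc(CONN)
print([round(v, 3) for v in trace[-1]])
```

Scaling this to a 100-, 200-, or 500-node parcellation only changes `conn`. That is why the same node model can be compared across network resolutions.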

Deciphering the unique dynamic activation pathway in a G protein-coupled receptor enables unveiling biased signaling and identifying cryptic allosteric sites in conformational intermediates arxiv.org/abs/2504.17624

Neurotensin receptor 1 (NTSR1), a member of the Class A G protein-coupled receptor superfamily, plays an important role in modulating dopaminergic neuronal activity and eliciting opioid-independent analgesia. Recent studies suggest that promoting β-arrestin-biased signaling in NTSR1 may diminish the effects of drugs of abuse, such as psychostimulants, offering a potential avenue for treating addiction-related disorders in humans. In this study, we used a novel approach combining computational and experimental methods to explore the intricate mechanisms underlying NTSR1 activation and biased signaling. These methods were nudged elastic band-based molecular dynamics simulations, Markov state models, temporal communication network analysis, site-directed mutagenesis, and conformational biosensors. Our study reveals a dynamic stepwise transition mechanism and an activated transmission network associated with NTSR1 activation. It also yields valuable insights into the complex interplay among the unique polar network, non-conserved ion locks, and aromatic clusters in NTSR1 signaling. Moreover, we identified a cryptic allosteric site in the intracellular region of the receptor that exists in an intermediate state along the activation pathway. Collectively, these findings deepen the understanding of NTSR1 activation and biased signaling at the atomic level. They also suggest a potential strategy for developing NTSR1 allosteric modulators in G protein-coupled receptor biology, biophysics, and medicine.
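
Markov state models summarize MD trajectories as transitions between discrete conformational states at a chosen lag time. The sketch below assumes the trajectory has already been clustered into three hypothetical states (0 = inactive, 1 = intermediate, 2 = active). It uses the simple row-normalized (non-reversible) count estimator. Production MSM software typically enforces detailed balance and validates the lag time, and the paper's construction is not reproduced here.

```python
from typing import List, Sequence


def transition_matrix(dtraj: Sequence[int], n_states: int, lag: int = 1) -> List[List[float]]:
    """Row-normalized transition counts between discrete states at the given lag."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([c / total for c in row] if total else row)  # unvisited rows stay zero
    return rows


def stationary(t: List[List[float]], iters: int = 10000, tol: float = 1e-12) -> List[float]:
    """Stationary distribution by power iteration: pi <- pi T until it stops changing."""
    n = len(t)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(pi[i] * t[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical discretized trajectory over states 0 (inactive), 1 (intermediate), 2 (active).
dtraj = [0, 0, 1, 0, 0, 1, 2, 2, 1, 2, 2, 2, 1, 0, 0]
T = transition_matrix(dtraj, 3)
pi = stationary(T)
print(T, [round(x, 3) for x in pi])
```

In this toy trajectory, state 1 is never revisited in place (T[1][1] = 0). Every path between states 0 and 2 passes through it. This is the kind of intermediate in which a transiently exposed, cryptic pocket could be sought.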

Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.