Computational Methods for Breast Cancer Molecular Profiling through Routine Histopathology: A Review arxiv.org/abs/2412.10392

Precision medicine has become a central focus in breast cancer management, advancing beyond conventional methods to deliver more precise and individualized therapies. Traditionally, histopathology images have been used primarily for diagnostic purposes; however, they are now recognized for their potential in molecular profiling, which provides deeper insights into cancer prognosis and treatment response. Recent advancements in artificial intelligence (AI) have enabled digital pathology to analyze histopathologic images for both targeted molecular and broader omic biomarkers, marking a pivotal step in personalized cancer care. These technologies offer the capability to extract various biomarkers such as genomic, transcriptomic, proteomic, and metabolomic markers directly from the routine hematoxylin and eosin (H&E) stained images, which can support treatment decisions without the need for costly molecular assays. In this work, we provide a comprehensive review of AI-driven techniques for biomarker detection, with a focus on diverse omic biomarkers that allow novel biomarker discovery. Additionally, we analyze the major challenges faced in this field for robust algorithm development. These challenges highlight areas where further research is essential to bridge the gap between AI research and clinical application.

arXiv.org

Quasispecies dynamics with time lags and periodic fluctuations in replication arxiv.org/abs/2412.10475

Quasispecies theory provides the conceptual and theoretical bases for describing the dynamics of biological information of replicators subject to large mutation rates. This theory, initially conceived within the framework of prebiotic evolution, is also being used to investigate the evolutionary dynamics of RNA viruses and heterogeneous cancer cell populations. Accordingly, efforts have been made in recent decades to bring the initial quasispecies theory closer to more realistic scenarios. Despite this, how time lags in RNA synthesis and periodic fluctuations impact quasispecies dynamics remains poorly studied. In this article, we combine the theory of delayed ordinary differential equations and topological Leray-Schauder degree to investigate the classical quasispecies model in the single-peak fitness landscape considering time lags and periodic fluctuations in replication. First, we prove that the dynamics with time lags under the constant population constraint remains in the simplex in both forward and backward times. With backward mutation and periodic fluctuations, we prove the existence of periodic orbits regardless of time lags. Nevertheless, without backward mutation, neither periodic fluctuation nor the introduction of time lags leads to periodic orbits. However, in the case of periodic fluctuations, solutions converge exponentially to a periodic oscillation around the equilibria associated with a constant replication rate. We check the validity of the error catastrophe hypothesis assuming no backward mutation; we determine that the error threshold remains sound both for periodic fitness and for time lags with constant fitness. Finally, our results show that the error threshold is not found with backward mutations.
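
As a rough illustration of the error threshold mentioned above, the sketch below integrates the classical two-class single-peak model without backward mutation. It deliberately omits the time lags and periodic forcing studied in the paper, so it is only the textbook baseline, not the authors' model. The names `f0` (master fitness), `f1` (mutant fitness), and `q` (copying fidelity), and all numerical values, are assumptions chosen for illustration.

```python
def master_fraction(f0, f1, q, x0=0.5, dt=0.01, steps=20_000):
    """Forward-Euler integration of the master-sequence fraction x0.

    Two-class single-peak model, no backward mutation, constant population:
        dx0/dt = x0 * (f0 * q - phi),  phi = f0 * x0 + f1 * (1 - x0)
    """
    for _ in range(steps):
        phi = f0 * x0 + f1 * (1.0 - x0)  # mean fitness keeps x0 + x1 = 1
        x0 += dt * x0 * (f0 * q - phi)
    return x0


def predicted_fraction(f0, f1, q):
    """Analytic equilibrium; it vanishes once q drops below f1 / f0."""
    return max(0.0, (f0 * q - f1) / (f0 - f1))


if __name__ == "__main__":
    for q in (0.5, 0.3, 0.05):
        print(q, master_fraction(10.0, 1.0, q), predicted_fraction(10.0, 1.0, q))
```

In this baseline the master sequence persists only while `q` exceeds `f1 / f0`; below that fidelity its equilibrium fraction is zero, which is the error threshold in its simplest form.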

Predictive Pattern Recognition Techniques Towards Spatiotemporal Representation of Plant Growth in Simulated and Controlled Environments: A Comprehensive Review arxiv.org/abs/2412.10538

Accurate predictions and representations of plant growth patterns in simulated and controlled environments are important for addressing various challenges in plant phenomics research. This review explores various works on state-of-the-art predictive pattern recognition techniques, focusing on the spatiotemporal modeling of plant traits and the integration of dynamic environmental interactions. We provide a comprehensive examination of deterministic, probabilistic, and generative modeling approaches, emphasizing their applications in high-throughput phenotyping and simulation-based plant growth forecasting. Key topics include regressions and neural network-based representation models for the task of forecasting, limitations of existing experiment-based deterministic approaches, and the need for dynamic frameworks that incorporate uncertainty and evolving environmental feedback. This review surveys advances in 2D and 3D structured data representations through functional-structural plant models and conditional generative models. We offer a perspective on opportunities for future works, emphasizing the integration of domain-specific knowledge to data-driven methods, improvements to available datasets, and the implementation of these techniques toward real-world applications.

Asymmetric Interactions Shape Survival During Population Range Expansions arxiv.org/abs/2412.10937

An organism that is newly introduced into an existing population has a survival probability that is dependent on both the population density of its environment and the competition it experiences with the members of that population. Expanding populations naturally form regions of high and low density, and simultaneously experience ecological interactions both internally and at the boundary of their range. For this reason, systems of expanding populations are ideal for studying the combination of density and ecological effects. Conservation ecologists have been studying the ability of an invasive species to establish for some time, attributing success to both ecological and spatial factors. Similar behaviors have been observed in spatially structured cell populations, such as those found in cancerous tumors and bacterial biofilms. In these scenarios, the novel organism may be a cell carrying a new mutation or a bacterial species with some form of drug resistance, leading to the possibility of treatment failure. In order to gain insight into the relationship between population density and ecological interactions, we study an expanding population of interacting wild-type cells and mutant cells. We simulate these interactions in time and study the spatially dependent probability for a mutant to survive or to take over the front of the population wave (gene surfing). Additionally, we develop a mathematical model that describes this survival probability and find agreement when the payoff for the mutant is positive (corresponding to cooperation, exploitation, or commensalism). Given the types of interactions, our model provides insight into the spatial distribution of survival probability. Conversely, given a spatial distribution of survival probabilities, our model provides insight into the types of interactions that generated it.

Decoding Drug Discovery: Exploring A-to-Z In silico Methods for Beginners arxiv.org/abs/2412.11137

The drug development process is a critical challenge in the pharmaceutical industry due to its time-consuming nature and the need to discover new drug potentials to address various ailments. The initial step in drug development, drug target identification, often consumes considerable time. While valid, traditional methods such as in vivo and in vitro approaches are limited in their ability to analyze vast amounts of data efficiently, leading to wasteful outcomes. To expedite and streamline drug development, an increasing reliance on computer-aided drug design (CADD) approaches has emerged. These sophisticated in silico methods offer a promising avenue for efficiently identifying viable drug candidates, thus providing pharmaceutical firms with significant opportunities to uncover new prospective drug targets. The main goal of this work is to review in silico methods used in the drug development process with a focus on identifying therapeutic targets linked to specific diseases at the genetic or protein level. This article thoroughly discusses A-to-Z in silico techniques, which are essential for identifying the targets of bioactive compounds and their potential therapeutic effects. This review intends to improve drug discovery processes by illuminating the state of these cutting-edge approaches, thereby maximizing the effectiveness and duration of clinical trials for novel drug target investigation.

Applications of Knot Theory for the Improvement of the AlphaFold Protein Database arxiv.org/abs/2412.11229

AlphaFold, a groundbreaking protein prediction model, has revolutionized protein structure prediction, populating the AlphaFold Protein Database (AFDB) with millions of predicted structures. However, AlphaFold's accuracy in predicting proteins with intricate topologies, such as knots, remains a concern. This study investigates AlphaFold's performance in predicting knotted proteins and explores potential solutions to enhance the AFDB's reliability. Forty-five experimentally verified knotted protein structures from the KnotProt database were compared to their AlphaFold-generated counterparts. Knot analysis was performed using PyKnot, a PyMOL plugin, employing both Gauss codes and Alexander-Briggs knot notations. Results showed 95.6% accuracy in predicting the general shape of knots using Alexander-Briggs notation. However, Gauss code analysis revealed a 55.6% discrepancy, indicating AlphaFold's limitations in accurately representing the intricate orientation and directionality of knots. This discrepancy suggests potential inaccuracies in a significant portion of the AFDB's knotted protein structures. The study underscores the need for improved knot representation in AlphaFold and proposes potential solutions, including transitioning to a single-module design or removing incorrectly predicted structures from the AFDB. These findings highlight the importance of continuous refinement for AI-based protein structure prediction tools to ensure the accuracy and reliability of protein databases for research and drug development.

On study of transition fronts of Fisher-KPP type reaction-diffusion PDEs by non-linear transformations into exactly solvable class arxiv.org/abs/2412.09653

The spatio-temporal dynamics of a population evolving through growth and diffusion processes can be modeled by a class of partial differential equations (PDEs) known as reaction-diffusion systems. In this work, we develop a nonlinear transformation method that converts the original nonlinear Fisher-KPP class of PDEs into an exactly solvable class. We then demonstrate that the proposed nonlinear transformation method intrinsically preserves the relaxation behavior of the solutions to asymptotic values of the nonlinear dynamical system. We also show that these particular transforms readily yield an exact closed-form solution in terms of the heat kernel, as well as analytical approximations through the two-variable Hermite polynomials. With this proposed method, we calculate the front velocity and shape of the propagating wave and show how the nonlinear transformation affects these parameters for both short and long epochs. As applications, we focus on solving pertinent cases of the Fisher-KPP type of PDEs relating to evolutionary dynamics by assigning fitness to the mutant gene according to zygosity conditions. We calculate the relaxation of the velocity with the parameters of the initial conditions in the following cases: the Fisher case, the heterozygote inferior fitness case, the heterozygote superior fitness case, and finally a general nonlinearity case. We also verify previous conjectures through the exact solutions computed using the proposed method.
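
For orientation, the prototypical member of the Fisher-KPP class can be written as below. This is the textbook form and not necessarily the parametrization used in the paper; the symbols $D$ (diffusion coefficient), $r$ (growth rate), $U$ (front profile), and $c$ (front speed) are assumptions introduced here.

```latex
\frac{\partial u}{\partial t}
  = D\,\frac{\partial^{2} u}{\partial x^{2}} + r\,u\,(1-u),
\qquad
u(x,t) = U(x - ct),
\qquad
c \;\ge\; c_{\min} = 2\sqrt{rD}.
```

In this classical case, travelling fronts connecting $u = 1$ to $u = 0$ exist for every speed $c \ge c_{\min}$, and sufficiently localized initial data relax toward $c_{\min}$ at long times.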

Language model driven: a PROTAC generation pipeline with dual constraints of structure and property arxiv.org/abs/2412.09661

The imperfect modeling of ternary complexes has limited the application of computer-aided drug discovery tools in PROTAC research and development. In this study, an AI-assisted PROTAC molecule design pipeline named LM-PROTAC (language model driven Proteolysis Targeting Chimera) was developed by embedding a transformer-based generative model with dual constraints on structure and properties, referred to as the DCT. This study utilized the fragmentation representation of molecules and developed a language model driven pipeline. First, a language model driven affinity model for protein compounds was used to screen molecular fragments with high affinity for the target protein. Second, structural and physicochemical properties of these fragments were constrained during the generation process to meet specific scenario requirements. Finally, a two-round screening of the preliminary generated molecules was performed using a multidimensional property prediction model, producing a batch of PROTAC molecules capable of degrading disease-relevant target proteins for validation in in vitro experiments, thus achieving a complete solution for AI-assisted PROTAC drug generation. Taking the tumor key target Wnt3a as an example, the LM-PROTAC pipeline successfully generated PROTAC molecules capable of inhibiting Wnt3a. The results show that DCT can efficiently generate PROTACs that target and hydrolyse Wnt3a.

Let Curves Speak: A Continuous Glucose Monitor based Large Sensor Foundation Model for Diabetes Management arxiv.org/abs/2412.09727

While previous studies of AI in diabetes management focus on long-term risk, research on near-future glucose prediction remains limited but important as it enables timely diabetes self-management. Integrating AI with continuous glucose monitoring (CGM) holds promise for near-future glucose prediction. However, existing models have limitations in capturing patterns of blood glucose fluctuations and demonstrate poor generalizability. A robust approach is needed to leverage massive CGM data for near-future glucose prediction. We propose large sensor models (LSMs) to capture knowledge in CGM data by modeling patients as sequences of glucose. CGM-LSM is pretrained on 15.96 million glucose records from 592 diabetes patients for near-future glucose prediction. We evaluated CGM-LSM against state-of-the-art methods using the OhioT1DM dataset across various metrics, prediction horizons, and unseen patients. Additionally, we assessed its generalizability across factors like diabetes type, age, gender, and hour of day. CGM-LSM achieved exceptional performance, with an rMSE of 29.81 mg/dL for type 1 diabetes patients and 23.49 mg/dL for type 2 diabetes patients in a two-hour prediction horizon. For the OhioT1DM dataset, CGM-LSM achieved a one-hour rMSE of 15.64 mg/dL, halving the previous best of 31.97 mg/dL. Robustness analyses revealed consistent performance not only for unseen patients and future periods, but also across diabetes type, age, and gender. The model demonstrated adaptability to different hours of day, maintaining accuracy across periods of various activity intensity levels. CGM-LSM represents a transformative step in diabetes management by leveraging pretraining to uncover latent glucose generation patterns in sensor data. Our findings also underscore the broader potential of LSMs to drive innovation across domains involving complex sensor data.
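
The rMSE figures quoted above are root-mean-square errors in mg/dL between predicted and observed glucose values. A minimal sketch of the metric follows; the function name `rmse` and the sample readings are hypothetical and are not taken from the paper or from the OhioT1DM dataset.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same unit as the inputs (here mg/dL)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up CGM readings and forecasts, for illustration only.
    observed = [100.0, 110.0, 120.0]
    predicted = [103.0, 106.0, 120.0]
    print(f"rMSE = {rmse(observed, predicted):.2f} mg/dL")
```

Because the errors are squared before averaging, a few large misses (for example around rapid glucose excursions) weigh more heavily than many small ones.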

Nap-induced modulations of tinnitus -a cross-sectional database analysis arxiv.org/abs/2412.09973

The influence of naps on tinnitus was systematically assessed by exploring the frequency of this phenomenon and its clinical and demographic characteristics. A total of 9,724 data records from two different tinnitus databases (Tinnitus Hub: $n = 6115$; Tinnitus Research Initiative (TRI): $n = 3627$) were included. The databases were analyzed separately, and the results were then compared with each other. In the Tinnitus Hub survey database, a total of 31.1% reported an influence on tinnitus by taking a nap (26.9% in the TRI database), with worsening after a nap much more frequent than improvement (worsening: 23.0% a little or a lot worse, TRI: 17.7% worse; improvement: 8.1% a little or a lot better, TRI: 9.2% better). The influence of napping on tinnitus was associated in both databases with other clinical features, such as the dependence of tinnitus on night quality, stress and somatosensory maneuvers. The present study confirms the clinical observation that more tinnitus sufferers report worsening after a nap than report an improvement. It was consistently shown that tinnitus sufferers reporting nap-induced modulation of tinnitus also more frequently report an influence of night sleep on their tinnitus. Further clinical and polysomnographic research is warranted to better understand the interaction between sleep and tinnitus.

MiCull2 -- simulating mastitis transmission through milking order arxiv.org/abs/2412.10165

Contagious mastitis pathogens can be transmitted through milking. However, previously published simulation models, such as MiCull, have not directly taken this into account. We have reimplemented the MiCull model to model transmission of contagious mastitis pathogens through milking in a milking parlor. This additional complexity requires substantially more computation and program code that is structured to be more flexible for future use. The aim of this paper was threefold: first, to implement the new model in a faster programming language; second, to describe the new model, in particular transmission of a contagious mastitis pathogen through milking; and third, to compare three different milking order strategies with regard to prevalence and incidence of intramammary infections. For each scenario, 500 herds with 200 cows each were simulated over 10 years. The model was calibrated using available mastitis parameters from the literature. We hypothesized that milking order should have a considerable effect on disease transmission, especially if the infected cows with clinical mastitis enter the milking parlor first and thereby have a high risk of infecting the following cows. The milking order scenarios examined were random milking order, milking clinical cases first, and milking clinical cases last. Unexpectedly, there were no large differences between these scenarios for reasonably sized infection rates corresponding to a herd with a moderate level of clinical mastitis. Larger differences are expected to be found in herds with very high infection rates. We have developed a transmission simulation model of mastitis pathogens using a new mode of transmission by milking order. We expect that this new version of MiCull will be useful for both researchers and advisors since it is flexible, can be fitted to various in-herd situations, and the computations are fast.

Quadratic unconstrained binary optimization and constraint programming approaches for lattice-based cyclic peptide docking arxiv.org/abs/2412.10260

The peptide-protein docking problem is an important problem in structural biology that facilitates rational and efficient drug design. In this work, we explore modeling and solving this problem with the quantum-amenable quadratic unconstrained binary optimization (QUBO) formalism. Our work extends recent efforts by incorporating the objectives and constraints associated with peptide cyclization and peptide-protein docking in the two-particle model on a tetrahedral lattice. We propose a "resource efficient" QUBO encoding for this problem, and baseline its performance with a novel constraint programming (CP) approach. We implement an end-to-end framework that enables the evaluation of our methods on instances from the Protein Data Bank (PDB). Our results show that the QUBO approach, using a classical simulated annealing solver, is able to find feasible conformations for problems with up to 6 peptide residues and 34 target protein residues, but has trouble scaling beyond this problem size. In contrast, the CP approach can solve problems with up to 13 peptide residues and 34 target protein residues. We conclude that while QUBO can be used to successfully tackle this problem, its scaling limitations and the strong performance of the CP method suggest that it may not be the best choice.
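
To give a flavor of how constraints enter a QUBO, the toy sketch below places residues on lattice sites with one binary variable per (residue, site) pair and folds two constraints into quadratic penalties: each residue occupies exactly one site, and no site holds two residues. This is not the paper's encoding, and it uses exhaustive search instead of simulated annealing; the cost table, the penalty weight, and all function names are hypothetical.

```python
from itertools import product


def build_qubo(cost, penalty):
    """Return QUBO coefficients {(i, j): w} with i <= j for a toy placement.

    The one-hot penalty * (sum_s x[r, s] - 1)^2 expands, using x^2 = x, to
    -penalty on each diagonal term and +2 * penalty on each same-residue pair
    (the constant offset is dropped).
    """
    n_res, n_sites = len(cost), len(cost[0])
    q = {}

    def var(r, s):
        return r * n_sites + s

    def add(i, j, w):
        key = (min(i, j), max(i, j))
        q[key] = q.get(key, 0.0) + w

    for r in range(n_res):
        for s in range(n_sites):
            add(var(r, s), var(r, s), cost[r][s] - penalty)
            for t in range(s + 1, n_sites):
                add(var(r, s), var(r, t), 2.0 * penalty)
    # Overlap penalty: two different residues on the same site.
    for s in range(n_sites):
        for r in range(n_res):
            for r2 in range(r + 1, n_res):
                add(var(r, s), var(r2, s), penalty)
    return q


def energy(q, x):
    """Evaluate the QUBO objective for a 0/1 assignment x."""
    return sum(w * x[i] * x[j] for (i, j), w in q.items())


def brute_force(q, n_vars):
    """Exhaustive minimization; feasible only for very small instances."""
    return min(product((0, 1), repeat=n_vars), key=lambda x: energy(q, x))


if __name__ == "__main__":
    cost = [[4, 1, 3], [2, 1, 5]]  # hypothetical per-residue site costs
    q = build_qubo(cost, penalty=10.0)
    best = brute_force(q, n_vars=6)
    print(best, energy(q, best))
```

The number of binary variables grows as residues times sites, and exhaustive search doubles in cost with each added variable, so this sketch only illustrates the encoding and says nothing about the scaling behavior reported above.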

COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models arxiv.org/abs/2412.10347

As key elements within the central dogma, DNA, RNA, and proteins play crucial roles in maintaining life by guaranteeing accurate genetic expression and implementation. Although research on these molecules has profoundly impacted fields like medicine, agriculture, and industry, the diversity of machine learning approaches, from traditional statistical methods to deep learning models and large language models, poses challenges for researchers in choosing the most suitable models for specific tasks, especially for cross-omics and multi-omics tasks, due to the lack of comprehensive benchmarks. To address this, we introduce the first comprehensive multi-omics benchmark COMET (Benchmark for Biological COmprehensive Multi-omics Evaluation Tasks and Language Models), designed to evaluate models across single-omics, cross-omics, and multi-omics tasks. First, we curate and develop a diverse collection of downstream tasks and datasets covering key structural and functional aspects in DNA, RNA, and proteins, including tasks that span multiple omics levels. Then, we evaluate existing foundational language models for DNA, RNA, and proteins, as well as the newly proposed multi-omics method, offering valuable insights into their performance in integrating and analyzing data from different biological modalities. This benchmark aims to define critical issues in multi-omics research and guide future directions, ultimately promoting advancements in understanding biological processes through the integrated analysis of data from different omics.
