Modeling Diverse Chemical Reactions for Single-step Retrosynthesis via Discrete Latent Variables. (arXiv:2208.05482v1 [q-bio.QM]) arxiv.org/abs/2208.05482

Modeling Diverse Chemical Reactions for Single-step Retrosynthesis via Discrete Latent Variables

Single-step retrosynthesis is the cornerstone of retrosynthesis planning, which is a crucial task for computer-aided drug discovery. The goal of single-step retrosynthesis is to identify the possible reactants that lead to the synthesis of the target product in one reaction. By representing organic molecules as canonical strings, existing sequence-based retrosynthetic methods treat product-to-reactant retrosynthesis as a sequence-to-sequence translation problem. However, most of them struggle to identify diverse chemical reactions for a desired product because their inference is deterministic, which contradicts the fact that many compounds can be synthesized through various reaction types with different sets of reactants. In this work, we aim to increase reaction diversity and generate various reactants using discrete latent variables. We propose a novel sequence-based approach, namely RetroDVCAE, which incorporates conditional variational autoencoders into single-step retrosynthesis and associates discrete latent variables with the generation process. Specifically, RetroDVCAE uses the Gumbel-Softmax distribution to approximate the categorical distribution over potential reactions and generates multiple sets of reactants with the variational decoder. Experiments demonstrate that RetroDVCAE outperforms state-of-the-art baselines on both a benchmark dataset and a homemade dataset. Both quantitative and qualitative results show that RetroDVCAE can model the multi-modal distribution over reaction types and produce diverse reactant candidates.
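
The Gumbel-Softmax relaxation mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the RetroDVCAE implementation: the function name `gumbel_softmax_sample`, the temperature value, and the three-class probabilities are all hypothetical, chosen only to show how a relaxed sample from a categorical distribution is drawn.

```python
import numpy as np


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample from a categorical distribution.

    Hypothetical helper for illustration; not taken from the paper.
    """
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise via the inverse-CDF transform of uniform samples.
    # Clipping keeps both logarithms finite.
    u = np.clip(rng.uniform(size=logits.shape), 1e-12, 1.0 - 1e-12)
    gumbel = -np.log(-np.log(u))
    # Temperature-scaled softmax of the perturbed logits (shifted for stability).
    y = (logits + gumbel) / temperature
    y = y - y.max()
    exp_y = np.exp(y)
    return exp_y / exp_y.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed class probabilities for three hypothetical reaction types.
    logits = np.log([0.7, 0.2, 0.1])
    print(gumbel_softmax_sample(logits, temperature=0.5, rng=rng))
```

Lower temperatures push each sample toward a one-hot vector, while the argmax of a sample follows the underlying categorical distribution, which is what allows different latent classes to be selected across draws.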

Regular and sparse neuronal synchronization are described by identical mean field dynamics. (arXiv:2208.05515v1 [q-bio.NC]) arxiv.org/abs/2208.05515

Regular and sparse neuronal synchronization are described by identical mean field dynamics

Fast neuronal oscillations (>30 Hz) are very often characterized by a dichotomy between macroscopic and microscopic dynamics. At the macroscopic level oscillations are highly periodic, while individual neurons display very irregular spike discharges at a rate that is low compared to the global oscillation frequency. Theoretical work revealed that this dynamical state robustly emerges in large networks of inhibitory neurons with strong feedback inhibition and significant levels of noise. This so-called 'sparse synchronization' is considered to be at odds with the classical theory of collective synchronization of heterogeneous self-sustained oscillators, where synchronized neurons fire regularly. By means of an exact mean field theory for populations of heterogeneous quadratic integrate-and-fire (QIF) neurons (which we extend here to include Cauchy noise), we show that networks of stochastic QIF neurons showing sparse synchronization are governed by exactly the same mean field equations as deterministic networks displaying regular, collective synchronization. Our results reconcile two traditionally opposing views on neuronal synchronization, and extend the applicability of exact mean field theories to describe a broad range of biologically realistic neuronal states.

Functional Connectivity in Visual Areas from Total Correlation. (arXiv:2208.05770v1 [q-bio.NC]) arxiv.org/abs/2208.05770

Functional Connectivity in Visual Areas from Total Correlation

A recent study invoked the superiority of the Total Correlation concept over the conventional pairwise measures of functional connectivity in neuroscience. That seminal work was restricted to showing that empirical measures of Total Correlation lead to connectivity patterns that differ from what is obtained using linear correlation and Mutual Information. However, beyond the obvious multivariate versus bivariate definitions, no theoretical insight into the benefits of Total Correlation was given. The accuracy of the empirical estimators could not be addressed either, because no controlled scenario with a known analytical result was considered. In this work we analytically illustrate the advantages of Total Correlation to describe the functional connectivity in the visual pathway. Our neural model includes three layers (retina, LGN, and V1 cortex) and one can control the connectivity among the nodes, within the cortex, and the eventual top-down feedback. In this multivariate setting (three nodes with multidimensional signals), we derive analytical results for the three-way Total Correlation and for all possible pairwise Mutual Information measures. These analytical results show that pairwise Mutual Information cannot capture the effect of different intra-cortical inhibitory connections while the three-way Total Correlation can. The presented analytical setting is also useful to check empirical estimators of Total Correlation. Therefore, once a given estimator can be trusted, one can explore the behavior with natural signals where the analytical results (that assume Gaussian signals) are no longer valid. In this regard, (a) we explore the effect of connectivity and feedback in the analytical retina-cortex network with natural images, and (b) we assess the functional connectivity in V1-V2-V3-V4 from actual fMRI recordings.
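
For Gaussian signals, Total Correlation and pairwise Mutual Information have standard closed forms in terms of covariance log-determinants. The sketch below shows those textbook formulas only; it is not the paper's derivation for the retina-LGN-V1 model. The function names, the `blocks` argument (index lists defining each node), and the example covariance matrix are assumptions made for illustration.

```python
import numpy as np


def _logdet(cov):
    """Log-determinant of a covariance matrix, rejecting non-positive-definite input."""
    sign, value = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance block must be positive definite")
    return value


def gaussian_total_correlation(cov, blocks):
    """Total Correlation (in nats) among Gaussian nodes.

    `blocks` lists the variable indices belonging to each node, so nodes may be
    multidimensional. TC = sum of marginal entropies minus the joint entropy,
    which for Gaussians reduces to 0.5 * (sum log|C_ii| - log|C|).
    """
    cov = np.asarray(cov, dtype=float)
    joint_idx = np.concatenate(blocks)
    joint = cov[np.ix_(joint_idx, joint_idx)]
    marginals = sum(_logdet(cov[np.ix_(b, b)]) for b in blocks)
    return 0.5 * (marginals - _logdet(joint))


def gaussian_mutual_information(cov, block_a, block_b):
    """Pairwise Mutual Information: the two-node special case of Total Correlation."""
    return gaussian_total_correlation(cov, [block_a, block_b])


if __name__ == "__main__":
    # Hypothetical covariance for three scalar nodes (not data from the paper).
    cov = np.array([[1.0, 0.5, 0.3],
                    [0.5, 1.0, 0.4],
                    [0.3, 0.4, 1.0]])
    print("TC(0,1,2):", gaussian_total_correlation(cov, [[0], [1], [2]]))
    for a, b in [(0, 1), (0, 2), (1, 2)]:
        print(f"MI({a},{b}):", gaussian_mutual_information(cov, [a], [b]))
```

Closed forms like these give a known reference value against which an empirical estimator can be compared on samples drawn from the same covariance.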

Family based HLA imputation and optimization of haplo-identical transplants. (arXiv:2208.05882v1 [q-bio.QM]) arxiv.org/abs/2208.05882

Family based HLA imputation and optimization of haplo-identical transplants

Recently, haplo-identical transplantation with multiple HLA mismatches has become a viable option for stem cell transplants. Haplotype sharing detection requires imputation of donor and recipient. We show that even in high-resolution typing, when all alleles are known, there is a 15% error rate in haplotype phasing, and even more in low-resolution typing. Similarly, in related donors, the parents' haplotypes should be imputed to determine which haplotype each child inherited. We propose GRAMM (GRaph bAsed FaMilly iMputation) to phase alleles in family pedigree HLA typing data, and in mother-cord blood unit pairs. We show that GRAMM has practically no phasing errors when pedigree data are available. We apply GRAMM to simulations with different typing resolutions as well as paired cord-mother typings, and show very high phasing accuracy and improved allele imputation accuracy. We use GRAMM to detect recombination events and show that the rate of falsely detected recombination events (False Positive Rate) in simulations is very low. We then apply recombination detection to typed families to estimate the recombination rate in Israeli and Australian population datasets. The estimated recombination rate has an upper bound of 10-20% per family (1-4% per individual). GRAMM is available at: https://gramm.math.biu.ac.il/.

Generalization of Powell's results to unbalanced population growth. (arXiv:2208.05884v1 [physics.bio-ph]) arxiv.org/abs/2208.05884

Generalization of Powell's results to unbalanced population growth

New experimental methods allow for studying the dynamics of cell populations with increasing precision and time resolution, providing us with a large amount of high-quality data. These data, in turn, stimulate the mathematical modeling of such systems. Here, using a generalization of the McKendrick-von Foerster model proposed by Lebowitz and Rubinow, we derive relationships between the instantaneous population growth rate and probability distributions of cell age and generation time. Such relationships (for example, the Euler-Lotka equation) are known for populations in a steady state of balanced growth, but we generalize them to include unbalanced growth. Some probability distributions of interest are unobservable, yet the present formalism allows us to express them using experimentally observable quantities. Our results remain valid for a class of more complex population-balance models, as these can be reduced to the McKendrick-von Foerster form (by integrating out the variables other than the cell age), and subsequently, they can be analyzed within the framework of the Lebowitz-Rubinow model. We also propose a generalization of the latter model in which cells are described not only by age and generation time but also by volume and a single-cell growth rate.
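
As a point of reference for the balanced-growth case the abstract mentions, the Euler-Lotka equation for cells dividing by binary fission, 2 ∫ f(τ) exp(-Λτ) dτ = 1, can be solved numerically for the growth rate Λ given a generation-time density f. The sketch below covers only this classical steady-state relation, not the paper's generalization to unbalanced growth; the function name, the grid parameters, and the gamma-distributed generation times are assumptions chosen for illustration.

```python
import math

import numpy as np


def euler_lotka_growth_rate(pdf, t_max, n_grid=20001, tol=1e-10):
    """Solve 2 * integral(pdf(t) * exp(-rate * t)) = 1 for the growth rate.

    `pdf` is a generation-time density evaluated on a NumPy array; `t_max`
    must be large enough that the density beyond it is negligible.
    """
    t = np.linspace(0.0, t_max, n_grid)
    f = pdf(t)
    dt = t[1] - t[0]

    def residual(rate):
        integrand = f * np.exp(-rate * t)
        # Trapezoidal rule written out explicitly.
        integral = dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
        return 2.0 * integral - 1.0

    # The residual decreases monotonically in the rate, so bracket and bisect.
    lo, hi = 0.0, 1.0
    while residual(hi) > 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Assumed gamma-distributed generation times (shape 4, scale 0.25, mean 1).
    shape, scale = 4, 0.25

    def gamma_pdf(t):
        return t ** (shape - 1) * np.exp(-t / scale) / (math.gamma(shape) * scale ** shape)

    numeric = euler_lotka_growth_rate(gamma_pdf, t_max=20.0)
    # For a gamma density the equation has the closed form (2**(1/k) - 1) / scale.
    exact = (2.0 ** (1.0 / shape) - 1.0) / scale
    print(numeric, exact)
```

Comparing the numerical root with the closed form for the gamma density is a quick sanity check on the grid resolution and the choice of `t_max`.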

Bridging the gap between target-based and cell-based drug discovery with a graph generative multi-task model. (arXiv:2208.04944v1 [q-bio.QM]) arxiv.org/abs/2208.04944

Bridging the gap between target-based and cell-based drug discovery with a graph generative multi-task model

Drug discovery is vitally important for protecting humans against disease. Target-based screening has been one of the most popular methods for developing new drugs over the past several decades. This method efficiently screens candidate drugs that inhibit a target protein in vitro, but it often fails due to inadequate activity of the selected drugs in vivo. Accurate computational methods are needed to bridge this gap. Here, we propose a novel graph multi-task deep learning model to identify compounds carrying both target inhibitory and cell active (MATIC) properties. On a carefully curated SARS-CoV-2 dataset, the proposed MATIC model shows advantages over traditional methods in screening effective compounds in vivo. Next, we explored the model interpretability and found that the features learned for the target inhibition (in vitro) and cell active (in vivo) tasks differ in their molecular property correlations and atom functional attentions. Based on these findings, we utilized a Monte Carlo based reinforcement learning generative model to generate novel multi-property compounds with both in vitro and in vivo efficacy, thus bridging the gap between target-based and cell-based drug discovery.

Dependence of protein-induced lipid bilayer deformations on protein shape. (arXiv:2208.05011v1 [physics.bio-ph]) arxiv.org/abs/2208.05011

Dependence of protein-induced lipid bilayer deformations on protein shape

Membrane proteins typically deform the surrounding lipid bilayer membrane, which can play an important role in the function, regulation, and organization of membrane proteins. Membrane elasticity theory provides a beautiful description of protein-induced lipid bilayer deformations, in which all physical parameters can be directly determined from experiments. Analytic treatments of the membrane elasticity theory of protein-induced lipid bilayer deformations have largely focused on idealized protein shapes with circular cross section, and on perturbative solutions for proteins with non-circular cross section. We develop here a boundary value method (BVM) that permits the construction of non-perturbative analytic solutions of protein-induced lipid bilayer deformations for non-circular protein cross sections, for constant as well as variable boundary conditions along the bilayer-protein interface. We apply this BVM to protein-induced lipid bilayer thickness deformations. Our BVM reproduces available analytic solutions for proteins with circular cross section and yields, for proteins with non-circular cross section, excellent agreement with numerical, finite element solutions. On this basis, we formulate a simple analytic approximation of the bilayer thickness deformation energy associated with general protein shapes and show that, for modest deviations from rotational symmetry, this analytic approximation is in good agreement with BVM solutions. Using the BVM, we survey the dependence of protein-induced lipid bilayer thickness deformations on protein shape, and thus explore how the coupling of protein shape and bilayer thickness deformations affects protein oligomerization and transitions in protein conformational state.

Robust Scenario Interpretation from Multi-model Prediction Efforts. (arXiv:2208.05075v1 [stat.ME]) arxiv.org/abs/2208.05075

Robust Scenario Interpretation from Multi-model Prediction Efforts

Multi-model prediction efforts in infectious disease modeling and climate modeling involve multiple teams independently producing projections under various scenarios. Often these scenarios are produced by the presence and absence of a decision in the future, e.g., no vaccinations (scenario A) vs vaccinations (scenario B) available in the future. The models submit probabilistic projections for each of the scenarios. Obtaining a confidence interval on the impact of the decision (e.g., number of deaths averted) is important for decision making. However, obtaining tight bounds only from the probabilistic projections for the individual scenarios is difficult, as the joint probability is not known. Further, the models may not be able to generate the joint probability distribution for various reasons, including the need to rewrite simulations, and storage and transfer requirements. Without asking the submitting models for additional work, we aim to estimate a non-trivial bound on the outcomes due to the decision variable. We first prove, under a key assumption, that an $\alpha$-confidence interval on the difference of scenario predictions can be obtained given only the quantiles of the predictions. Then we show how to estimate a confidence interval after relaxing that assumption. We use our approach to estimate confidence intervals on reduction in cases, deaths, and hospitalizations due to vaccinations based on model submissions to the US Scenario Modeling Hub.

Quantification of metabolic niche occupancy dynamics in a Baltic Sea bacterial community. (arXiv:2208.05204v1 [q-bio.PE]) arxiv.org/abs/2208.05204

Quantification of metabolic niche occupancy dynamics in a Baltic Sea bacterial community

Progress in molecular methods has enabled the monitoring of bacterial populations over time. Nevertheless, understanding community dynamics and its links with ecosystem functioning remains challenging due to the tremendous diversity of microorganisms. Conceptual frameworks that make sense of time series of taxonomically rich bacterial communities, with regard to their potential ecological function, are needed. A key concept for organizing ecological functions is the niche, the set of strategies that enable a population to persist and define its impacts on the surroundings. Here we present a framework based on manifold learning to organize genomic information into potentially occupied bacterial metabolic niches over time. We apply the method to reconstruct the dynamics of putatively occupied metabolic niches using a long-term bacterial time series from the Baltic Sea, the Linnaeus Microbial Observatory (LMO). The results reveal a relatively low-dimensional space of occupied metabolic niches comprising groups of taxa with similar functional capabilities. Time patterns of occupied niches were strongly driven by seasonality. Some metabolic niches were dominated by one bacterial taxon whereas others were occupied by multiple taxa, and this depended on season. These results illustrate the power of manifold learning approaches to advance our understanding of the links between community composition and functioning in microbial systems.

Current and perspective sensing methods for monkeypox virus: a reemerging zoonosis in its infancy. (arXiv:2208.05228v1 [q-bio.QM]) arxiv.org/abs/2208.05228

Current and perspective sensing methods for monkeypox virus: a reemerging zoonosis in its infancy

Objectives: This review aims to evaluate the current monkeypox virus (MPXV) detection methods, discuss their pros and cons, and provide recommended solutions to the problems. Methods: The literature for this review was identified through searches in PubMed, Web of Science, Google Scholar, ResearchGate, and Science Direct advanced search for articles published in English, without any start date, until June 2022, using the terms "monkeypox virus" or "poxvirus" along with "diagnosis"; "PCR"; "real-time PCR"; "LAMP"; "RPA"; "immunoassay"; "reemergence"; "biothreat"; "endemic"; and "multi-country outbreak", and also by tracking citations of the relevant papers. The most relevant articles are included in the review. Results: Our literature review shows that PCR is the gold standard method for MPXV detection. In addition, loop-mediated isothermal amplification (LAMP) and recombinase polymerase amplification (RPA) have been reported as alternatives to PCR. Immunodiagnostics, whole particle detection, and image-based detection are the non-nucleic acid-based MPXV detection modalities. Conclusions: PCR is easy to leverage and adapt for a quick response to an outbreak, but the PCR-based MPXV detection approaches may not be suitable for marginalized settings. Limited progress has been made towards innovations in MPXV diagnostics, providing room for the development of novel detection techniques for this virus.

Computational challenges of cell cycle analysis using single cell transcriptomics. (arXiv:2208.05229v1 [q-bio.QM]) arxiv.org/abs/2208.05229

Computational challenges of cell cycle analysis using single cell transcriptomics

The cell cycle is one of the most fundamental biological processes, important for understanding normal physiology and various pathologies such as cancer. Single cell RNA sequencing technologies make it possible to analyse cell cycle transcriptome dynamics in an unprecedented range of conditions (cell types and perturbations), with thousands of publicly available datasets. Here we review the main computational tasks in such analysis: 1) identification of cell cycle phases, 2) pseudotime inference, 3) identification and profiling of cell cycle-related genes, 4) removing the cell cycle effect, 5) identification and analysis of G0 (quiescent) cells. We review seventeen software packages that are available today for cell cycle analysis using scRNA-seq data. Despite the huge progress achieved, none of the packages can produce complete and reliable results with respect to all of the aforementioned tasks. One of the major difficulties for existing packages is distinguishing between two patterns of cell cycle transcriptomic dynamics: the normal pattern and the pattern characteristic of embryonic stem cells (ESC), the latter shared by many cancer cell lines. Moreover, some cell lines are characterized by a mixture of two subpopulations, one following the standard and one the ESC-like cell cycle, which makes the analysis even more challenging. In conclusion, we discuss the difficulties of analysing the cell cycle-related single cell transcriptome and provide certain guidelines for the use of the existing methods.
