DiffSDS: A language diffusion model for protein backbone inpainting under geometric conditions and constraints. (arXiv:2301.09642v1 [q-bio.QM]) arxiv.org/abs/2301.09642

Have you ever been troubled by the complexity and computational cost of SE(3) protein structure modeling and been amazed by the simplicity and power of language modeling? Recent work has shown promise in representing protein structures as sequences of backbone angles, so that language models can be used for unconstrained protein backbone generation. Unfortunately, this simplification is unsuitable for the constrained protein inpainting problem, where the model must recover masked structures conditioned on unmasked ones, because it dramatically increases the computational cost of enforcing geometric constraints. To overcome this dilemma, we suggest inserting a hidden atomic direction space (ADS) on top of the language model, converting invariant backbone angles into equivalent direction vectors while preserving simplicity; we call this the Seq2Direct encoder ($\text{Enc}_{s2d}$). Geometric constraints can be imposed efficiently on the newly introduced direction space. A Direct2Seq decoder ($\text{Dec}_{d2s}$) with mathematical guarantees is also introduced, yielding the SDS ($\text{Enc}_{s2d}$+$\text{Dec}_{d2s}$) model. We apply the SDS model as the denoising neural network during the conditional diffusion process, resulting in a constrained generative model, DiffSDS. Extensive experiments show that the plug-and-play ADS can transform the language model into a strong structural model without loss of simplicity. More importantly, the proposed DiffSDS outperforms previous strong baselines by a large margin on the task of protein inpainting.
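
The core idea, converting a chain of invariant backbone angles into direction vectors whose cumulative sum gives atom positions, can be illustrated with a minimal sketch. This is not the paper's Seq2Direct encoder: the function names, the seed directions, and the fixed virtual bond length are assumptions made purely for illustration.

```python
import numpy as np

def angles_to_directions(bond_angles, torsions):
    """Convert invariant backbone angles into unit direction vectors (the idea
    behind a hidden atomic direction space): each new direction is obtained by
    rotating the previous one in a local frame built from the two preceding
    directions.  Illustrative only, not the paper's implementation."""
    # two arbitrary, non-collinear seed directions to bootstrap the local frame
    dirs = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
    for theta, phi in zip(bond_angles, torsions):
        d_prev, d_prev2 = dirs[-1], dirs[-2]
        n = np.cross(d_prev2, d_prev)          # normal of the previous plane
        n /= np.linalg.norm(n)
        m = np.cross(n, d_prev)                # completes the local frame
        # spherical placement of the next direction from (theta, phi)
        d_new = (np.cos(np.pi - theta) * d_prev
                 + np.sin(np.pi - theta) * np.cos(phi) * m
                 + np.sin(np.pi - theta) * np.sin(phi) * n)
        dirs.append(d_new / np.linalg.norm(d_new))
    return np.stack(dirs[2:])

def directions_to_positions(directions, bond_length=3.8):
    # positions are a cumulative sum of scaled directions, so a geometric
    # constraint such as "the masked segment must end at a given anchor"
    # becomes a simple condition on a sum of direction vectors
    return np.cumsum(bond_length * directions, axis=0)

rng = np.random.default_rng(0)
L = 10
theta = np.full(L, np.deg2rad(120.0))          # toy bond angles
phi = rng.uniform(-np.pi, np.pi, size=L)       # toy torsion angles
coords = directions_to_positions(angles_to_directions(theta, phi))
print(coords[-1])                              # end point reached by the chain
```

Because the positions are linear in the directions, constraining where a masked segment must reconnect reduces to a constraint on a sum of direction vectors, which is the efficiency argument the abstract makes for the direction space.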

Beyond $\ell_1$ sparse coding in V1. (arXiv:2301.10002v1 [q-bio.NC]) arxiv.org/abs/2301.10002

Growing evidence indicates that only a sparse subset from a pool of sensory neurons is active for the encoding of visual stimuli at any instant in time. Traditionally, to replicate such biological sparsity, generative models have used the $\ell_1$ norm as a penalty due to its convexity, which makes it amenable to fast and simple algorithmic solvers. In this work, we use biological vision as a test-bed and show that the soft thresholding operation associated with the $\ell_1$ norm is highly suboptimal compared to other functions suited to approximating $\ell_q$ with $0 \leq q < 1$ (including recently proposed Continuous Exact relaxations), both in terms of performance and in the production of features that are akin to signatures of the primary visual cortex. We show that $\ell_1$ sparsity produces a denser code, or employs a pool with more neurons (i.e., has a higher degree of overcompleteness), in order to maintain the same reconstruction error as the other methods considered. For all the penalty functions tested, a subset of the neurons develop orientation selectivity similar to V1 neurons. When their code is sparse enough, the methods also develop receptive fields with varying functionalities, another signature of V1. Compared to the other methods, soft thresholding achieves this level of sparsity at the expense of much degraded reconstruction performance, which is most likely not acceptable in biological vision. Our results indicate that V1 uses a sparsity-inducing regularization that is closer to the $\ell_0$ pseudo-norm than to the $\ell_1$ norm.
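
In sparse coding solvers, the choice of penalty shows up only in the thresholding (proximal) step of the iterative update, which is what the comparison above turns on. Below is a minimal sketch assuming a random toy dictionary, ISTA-style iterations, and illustrative parameter values; it is not the authors' code, and the Continuous Exact relaxations they also test are not reproduced here, only plain soft vs. hard thresholding.

```python
import numpy as np

def soft_threshold(x, lam):
    # proximal operator of the l1 norm: shrinks every coefficient toward zero
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def hard_threshold(x, lam):
    # l0-like proximal step: keeps large coefficients untouched, zeroes the rest
    return np.where(np.abs(x) > lam, x, 0.0)

def ista(D, y, prox, lam=0.1, n_iter=200):
    """Iterative shrinkage-thresholding on a toy dictionary D.
    Only the prox step changes between l1 (soft) and l0-like (hard) coding;
    the threshold value plays a slightly different role in each case."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)
        a = prox(a - grad / L, lam / L)
    return a

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)                 # unit-norm dictionary atoms
a_true = np.zeros(256)
a_true[rng.choice(256, 5, replace=False)] = 1.0
y = D @ a_true + 0.01 * rng.standard_normal(64)

for name, prox in [("soft (l1)", soft_threshold), ("hard (l0-like)", hard_threshold)]:
    a = ista(D, y, prox)
    err = np.linalg.norm(D @ a - y) / np.linalg.norm(y)
    print(f"{name}: {np.count_nonzero(a)} active atoms, relative error {err:.3f}")
```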

Flow cytometry with anti-diffraction light sheet (ADLS) by spatial light modulation. (arXiv:2301.10185v1 [physics.optics]) arxiv.org/abs/2301.10185

Flow cytometry is a widespread and powerful technique, whose resolution is determined by its capacity to accurately distinguish fluorescently positive populations from negative ones. However, much informative content is discarded in conventional flow cytometry measurements, e.g., the cell size, shape, morphology, and the distribution or location of labeled exosomes within unpurified biological samples. We herein propose a novel approach that uses an anti-diffraction light sheet with an anisotropic profile to excite fluorescent tags. Constituted by an anti-diffraction Bessel-Gaussian beam array, the light sheet is 12 μm wide and 12 μm high, with a thickness of ~0.8 μm. The intensity profile of the excited fluorescent signal can therefore reflect sample size, and samples ranging from O(100 nm) up to 10 μm (e.g., blood cells) can be transported via hydrodynamic focusing in a microfluidic chip. The 500 kHz sampling rate provides high throughput without sacrificing spatial resolution. Consequently, the proposed anti-diffraction light-sheet flow cytometry (ADLSFC) can obtain more informative results than conventional methodologies, and is able to provide multiple characteristics (e.g., the size and distribution of the fluorescent signal) that help distinguish the target samples from complex backgrounds.

Neuronal architecture extracts statistical temporal patterns. (arXiv:2301.10203v1 [q-bio.NC]) arxiv.org/abs/2301.10203

Neuronal systems need to process temporal signals. Here we show how higher-order temporal (co-)fluctuations can be employed to represent and process information. Concretely, we demonstrate that a simple biologically inspired feedforward neuronal model is able to extract information from cumulants up to third order to perform time-series classification. This model relies on a weighted linear summation of synaptic inputs followed by a nonlinear gain function. Training both the synaptic weights and the nonlinear gain function exposes how the nonlinearity allows higher-order correlations to be transferred to the mean, which in turn enables the synergistic use of information encoded in multiple cumulants to maximize classification accuracy. The approach is demonstrated on both synthetic and real-world datasets of multivariate time series. Moreover, we show that the biologically inspired architecture makes better use of the number of trainable parameters than a classical machine-learning scheme. Our findings emphasize the benefit of biological neuronal architectures, paired with dedicated learning algorithms, for the processing of information embedded in higher-order statistical cumulants of temporal (co-)fluctuations.
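
As a concrete illustration of "linear summation followed by a trainable nonlinear gain", the sketch below builds a toy version of such a unit in PyTorch and trains it on two signal classes that differ only in their third cumulant (skewness), not in mean or variance. The model class, the polynomial form of the gain, and the synthetic data are assumptions for illustration, not the paper's architecture or datasets.

```python
import torch
import torch.nn as nn

class SummationGainUnit(nn.Module):
    """Weighted linear summation of inputs followed by a trainable nonlinear
    gain (here a small polynomial), so that higher-order input cumulants can
    be transferred to the mean of the output.  Illustrative toy model only."""
    def __init__(self, n_inputs, degree=3):
        super().__init__()
        self.w = nn.Parameter(torch.randn(n_inputs) / n_inputs ** 0.5)
        self.c = nn.Parameter(torch.zeros(degree + 1))      # gain coefficients c_k

    def forward(self, x):                                   # x: (batch, time, inputs)
        s = x @ self.w                                      # linear summation
        powers = torch.stack([s ** k for k in range(self.c.numel())], dim=-1)
        return (powers @ self.c).mean(dim=1)                # time-averaged gain output

# two classes with identical mean and variance but different third cumulant
torch.manual_seed(0)
n, T, d = 512, 200, 8
x0 = torch.randn(n, T, d)                                          # symmetric noise
x1 = torch.distributions.Exponential(1.0).sample((n, T, d)) - 1.0  # skewed noise
x = torch.cat([x0, x1])
y = torch.cat([torch.zeros(n), torch.ones(n)])

model = SummationGainUnit(d)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
    loss.backward()
    opt.step()
accuracy = ((model(x) > 0).float() == y).float().mean()
print(f"training accuracy: {accuracy:.2f}")
```

Because the classes share mean and variance, any separation the unit achieves has to come from the nonlinearity mapping the third cumulant onto the mean of its output, which mirrors the mechanism described in the abstract.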

A time-causal and time-recursive scale-covariant scale-space representation of temporal signals and past time. (arXiv:2202.09209v4 [q-bio.NC] UPDATED) arxiv.org/abs/2202.09209

This article presents an overview of a theory for performing temporal smoothing on temporal signals in such a way that: (i) temporally smoothed signals at coarser temporal scales are guaranteed to constitute simplifications of corresponding temporally smoothed signals at any finer temporal scale (including the original signal), and (ii) the temporal smoothing process is both time-causal and time-recursive, in the sense that it does not require access to future information and can be performed with no other temporal memory buffer of the past than the resulting smoothed temporal scale-space representations themselves. For specific subsets of parameter settings for the classes of linear and shift-invariant temporal smoothing operators that obey this property, it is shown how temporal scale covariance can additionally be obtained: if the temporal input signal is rescaled by a uniform scaling factor, then the resulting temporal scale-space representations of the rescaled signal will constitute mere rescalings of the temporal scale-space representations of the original input signal, complemented by a shift along the temporal scale dimension. The resulting time-causal limit kernel that obeys this property constitutes a canonical temporal kernel for processing temporal signals in real-time scenarios, where the regular Gaussian kernel cannot be used because of its non-causal access to information from the future, and where we cannot additionally require the temporal smoothing process to comprise a complementary memory of the past beyond the information contained in the temporal smoothing process itself, which in this way also serves as a multi-scale temporal memory of the past. This theory is generally applicable both for: (i) modelling continuous temporal phenomena over multiple temporal scales, and (ii) digital processing of measured temporal signals in real time.
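
The time-recursive property, namely that each temporal scale level is updated from the current input and its own previous value only, can be illustrated with a cascade of first-order recursive filters, one of the filter classes such time-causal scale-space constructions are built from. The sketch below uses arbitrary time constants; the article's specific time-causal limit kernel and its scale-covariant (e.g., logarithmically distributed) parameter choices are not reproduced here.

```python
import numpy as np

def temporal_scale_space(signal, mus):
    """Time-causal and time-recursive smoothing: a cascade of first-order
    recursive filters, each updated from the current input sample and its own
    previous output only (no access to the future, no extra memory buffer).
    The time constants 'mus' are illustrative, not the article's choices."""
    layers = np.zeros(len(mus))                 # one state per temporal scale level
    out = np.zeros((len(signal), len(mus)))
    for t, x in enumerate(signal):
        inp = x
        for k, mu in enumerate(mus):
            # first-order recursive update: a convex combination of the previous
            # state at this level and the output of the level below
            layers[k] = layers[k] + (inp - layers[k]) / (1.0 + mu)
            inp = layers[k]
        out[t] = layers
    return out

t = np.arange(400)
signal = (t > 100).astype(float) + 0.3 * np.random.default_rng(0).standard_normal(400)
scales = temporal_scale_space(signal, mus=[1.0, 2.0, 4.0, 8.0])
print(scales[-1])   # smoothed values at the final time step, one per scale level
```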

How cells wrap around virus-like particles using extracellular filamentous protein structures. (arXiv:2301.08776v1 [cond-mat.soft]) arxiv.org/abs/2301.08776

Nanoparticles, such as viruses, can enter cells via endocytosis. During endocytosis, the cell surface wraps around the nanoparticle to effectively eat it. Prior work has focused on how nanoparticle size and shape impact endocytosis. However, inspired by the noted presence of extracellular vimentin affecting viral and bacterial uptake, as well as the structure of coronaviruses, we construct a computational model in which both the cell-like construct and the virus-like construct contain filamentous protein structures protruding from their surfaces. We then study the impact of these additional degrees of freedom on viral wrapping. We find that cells with an optimal density of filamentous extracellular components (ECCs) are more likely to be infected, as they take up the virus faster and use relatively less cell surface area per individual virus. At the optimal density, the cell surface folds around the virus, and folding is faster and more efficient at wrapping the virus than crumple-like wrapping. We also find that cell surface bending rigidity helps generate folds, as bending rigidity enhances force transmission across the surface. However, changing other mechanical parameters, such as the stretching stiffness of filamentous ECCs or virus spikes, can drive crumple-like deformation of the cell surface. We conclude with the implications of our study for the evolutionary pressures on virus-like particles, with a particular focus on cellular microenvironments that may include filamentous ECCs.

ntLink: a toolkit for de novo genome assembly scaffolding and mapping using long reads. (arXiv:2301.08785v1 [q-bio.GN]) arxiv.org/abs/2301.08785

With the increasing affordability and accessibility of genome sequencing data, de novo genome assembly is an important first step for a wide variety of downstream studies and analyses. Therefore, bioinformatics tools that enable the generation of high-quality genome assemblies in a computationally efficient manner are essential. Recent developments in long-read sequencing technologies have greatly benefited genome assembly work, including scaffolding, by providing long-range evidence that can aid in resolving the challenging repetitive regions of complex genomes. ntLink is a flexible and resource-efficient genome scaffolding tool that utilizes long-read sequencing data to improve upon draft genome assemblies built from any sequencing technology, including the same long reads. Instead of using read alignments to identify candidate joins, ntLink utilizes minimizer-based mappings to infer how input sequences should be ordered and oriented into scaffolds. Recent improvements to ntLink have added important features such as overlap detection, gap-filling, and in-code scaffolding iterations. Here, we present three basic protocols demonstrating how to use each of these new features to yield highly contiguous genome assemblies while still maintaining ntLink's proven computational efficiency. Further, as we illustrate in the alternate protocols, the lightweight minimizer-based mappings that enable ntLink scaffolding can also be utilized for other downstream applications, such as misassembly detection. With its modularity and multiple modes of execution, ntLink has broad benefit to the genomics community, for genome scaffolding and beyond. ntLink is an open-source project and is freely available from https://github.com/bcgsc/ntLink.
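
Minimizer-based mapping replaces full read alignment with a much lighter operation: sketch each sequence by the smallest k-mer in every window of w consecutive k-mers, then anchor sequences to each other wherever they share minimizers. The sketch below illustrates that idea generically; ntLink itself computes minimizers with a rolling hash rather than lexicographic order, so the ordering, parameters, and toy sequences here are purely illustrative.

```python
from collections import defaultdict

def minimizers(seq, k=15, w=10):
    """Generic (k, w)-minimizer sketch: for every window of w consecutive k-mers,
    keep the smallest k-mer (lexicographic here, a rolling hash in practice)
    together with its position."""
    mins = set()
    for i in range(len(seq) - k - w + 2):
        window = [(seq[j:j + k], j) for j in range(i, i + w)]
        mins.add(min(window))
    return sorted(mins, key=lambda m: m[1])

def shared_anchors(target, query, k=15, w=10):
    # minimizers shared between two sequences act as lightweight mapping anchors,
    # from which the ordering and orientation of contigs can be inferred
    index = defaultdict(list)
    for kmer, pos in minimizers(target, k, w):
        index[kmer].append(pos)
    return [(tpos, qpos) for kmer, qpos in minimizers(query, k, w)
            for tpos in index.get(kmer, [])]

target = "ACGTACGTTTGACCAGTACCGTTAGGCTTACGATCGATCGGATCCGTAGCTAGCTAGGCTA" * 3
query = target[40:140]                         # a read drawn from the target
print(shared_anchors(target, query)[:5])       # (target_pos, query_pos) anchor pairs
```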

Accurately summarizing an outbreak using epidemiological models takes time. (arXiv:2301.08799v1 [q-bio.PE]) arxiv.org/abs/2301.08799

Recent outbreaks of monkeypox and Ebola, and worrying waves of COVID-19, influenza, and respiratory syncytial virus, have all led to a sharp increase in the use of epidemiological models to estimate key epidemiological parameters. The feasibility of this estimation task is known as the practical identifiability (PI) problem. Here, we investigate the PI of eight commonly reported statistics of the classic Susceptible-Infectious-Recovered model using a new measure that shows how much a researcher can expect to learn in a model-based Bayesian analysis of prevalence data. Our findings show that the basic reproductive number and final outbreak size are often poorly identified, with learning exceeding that of individual model parameters only in the early stages of an outbreak. The peak intensity, peak timing, and initial growth rate are better identified, being, in expectation, over 20 times more probable having seen the data by the time the underlying outbreak peaks. We then test PI for a variety of true parameter combinations and find that PI is especially problematic in slow-growing or less severe outbreaks. These results add to the growing body of literature questioning the reliability of inferences from epidemiological models when limited data are available.
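
To make the identifiability issue concrete, the sketch below simulates the deterministic SIR model for two parameter pairs chosen to have the same initial growth rate but different basic reproductive numbers and final sizes: early prevalence looks nearly identical while the later statistics diverge. The parameter values and the deterministic (non-Bayesian) setup are illustrative assumptions, not the paper's analysis.

```python
import numpy as np

def sir(beta, gamma, i0=1e-4, T=200, dt=0.1):
    """Deterministic SIR trajectories; the abstract's statistics (R0, peak
    intensity and timing, initial growth rate, final size) are simple functions
    of (beta, gamma), which is why some are much harder to learn from early data."""
    s, i = 1.0 - i0, i0
    traj = []
    for _ in range(int(T / dt)):
        traj.append(i)
        ds = -beta * s * i
        di = beta * s * i - gamma * i
        s, i = s + ds * dt, i + di * dt
    traj = np.array(traj)
    return {
        "R0": beta / gamma,
        "growth": beta - gamma,                 # initial exponential growth rate
        "peak intensity": traj.max(),
        "peak timing": traj.argmax() * dt,
        "final size": 1.0 - s,
        "prevalence": traj,
    }

# two parameter sets with the same early growth but different R0 and final size,
# which is the kind of practical-identifiability problem the abstract studies
for beta, gamma in [(0.30, 0.15), (0.45, 0.30)]:
    stats = sir(beta, gamma)
    early = stats["prevalence"][:200]           # first 20 time units only
    print(f"R0={stats['R0']:.1f}  growth={stats['growth']:.2f}  "
          f"peak={stats['peak intensity']:.3f}@t={stats['peak timing']:.0f}  "
          f"final size={stats['final size']:.2f}  "
          f"early mean prevalence={early.mean():.5f}")
```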

Are physiological oscillations physiological? (arXiv:2301.08996v1 [q-bio.TO]) arxiv.org/abs/2301.08996

Despite widespread and striking examples of physiological oscillations, their functional role is often unclear. Even glycolysis, the paradigm example of oscillatory biochemistry, has seen questions about its function. Here, we take a systems approach to summarize evidence that oscillations play critical physiological roles. Oscillatory behavior enables systems to avoid desensitization, to avoid chronically high and therefore toxic levels of chemicals, and to become more resistant to noise. Oscillation also enables complex physiological systems to reconcile incompatible conditions, such as oxidation and reduction, by cycling between them, and to synchronize the oscillations of many small units into one large effect. In pancreatic β cells, we show that glycolytic oscillations are in synchrony with calcium and mitochondrial oscillations to drive pulsatile insulin release, which is pivotal for the liver to regulate blood glucose dynamics. In addition, oscillation can keep biological time, which is essential for embryonic development in promoting cell diversity and pattern formation. The functional importance of oscillatory processes requires a rethinking of the traditional doctrine of homeostasis, which holds that physiological quantities are maintained at constant equilibrium values and which has largely failed us in the clinic. A more dynamic approach will enable us to view health and disease in a new light and initiate a paradigm shift in treating diseases, including depression and cancer. This modern synthesis also takes a deeper look into the mechanisms that create and sustain oscillatory processes, which requires the language of nonlinear dynamics, well beyond the linearization techniques of equilibrium control theory.

Forecasting local hospital bed demand for COVID-19 using on-request simulations. (arXiv:2301.09097v1 [q-bio.QM]) arxiv.org/abs/2301.09097

For hospitals, realistic forecasting of bed demand during impending epidemics of infectious diseases is essential to avoid being overwhelmed by a potential sudden increase in the number of admitted patients. Short-term forecasting can aid hospitals in adjusting their planning and freeing up beds in time. We created an easy-to-use online, on-request tool based on local data to forecast COVID-19 bed demand for individual hospitals. The tool is flexible and adaptable to different settings. It is based on a stochastic compartmental model for estimating the epidemic dynamics, coupled with an exponential smoothing model for forecasting. The models are written in R and Julia and implemented as an R-shiny dashboard. The model is parameterized using COVID-19 incidence, vaccination, and bed occupancy data at customizable geographical resolutions, loaded from official online sources or uploaded manually. Users can select their hospital's catchment area and adjust the number of COVID-19-occupied beds at the start of the simulation. The tool provides short-term forecasts of disease incidence and past and forecasted estimates of the epidemic reproductive number at the chosen geographical level. These quantities are then used to estimate bed occupancy in both general wards and intensive care units. The platform has proven efficient, providing results within seconds while coping with many concurrent users. By providing ad hoc forecasts informed by local data, this platform allows decision-makers to evaluate realistic scenarios for allocating scarce resources, such as ICU beds, at various geographic levels.
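
The forecasting component, exponential smoothing that extrapolates recent admissions a few days ahead, can be sketched in a few lines. The actual tool is an R/Julia implementation behind an R-shiny dashboard that couples a stochastic compartmental model with the smoothing step; the Python code, the Holt (level-plus-trend) variant, the parameter values, and the toy admissions series below are assumptions for illustration only.

```python
import numpy as np

def holt_forecast(series, horizon=7, alpha=0.4, beta=0.2):
    """Double (Holt) exponential smoothing: a level and a trend are updated
    recursively from observed daily admissions and extrapolated 'horizon' days
    ahead.  Parameters and variant are illustrative, not the tool's settings."""
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)          # update level
        trend = beta * (level - prev_level) + (1 - beta) * trend    # update trend
    return np.array([level + (h + 1) * trend for h in range(horizon)])

# toy daily COVID-19 admissions with a rising trend plus noise
rng = np.random.default_rng(2)
admissions = np.maximum(0, 5 + 0.8 * np.arange(30) + rng.normal(0, 2, 30))
forecast = holt_forecast(admissions)
print("next-week forecast of admitted patients:", np.round(forecast, 1))
```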

The state of quantum computing applications in health and medicine. (arXiv:2301.09106v1 [quant-ph]) arxiv.org/abs/2301.09106

Quantum computing hardware and software have made enormous strides over the last years. Questions about quantum computing's impact on research and society have changed from "if" to "when/how". The 2020s have been described as the "quantum decade", and the first production solutions that drive scientific and business value are expected to become available over the next few years. Medicine, including the fields of healthcare and life sciences, has seen a flurry of quantum-related activities and experiments in the last few years (although medicine and quantum theory have arguably been entangled ever since Schrödinger's cat). The initial focus was on biochemical and computational biology problems; recently, however, clinical and medical quantum solutions have drawn increasing interest. The rapid emergence of quantum computing in health and medicine necessitates a mapping of the landscape. In this review, clinical and medical proof-of-concept quantum computing applications are outlined and put into perspective. These consist of over 40 experimental and theoretical studies from the last few years. The use case areas span genomics, clinical research and discovery, diagnostics, and treatments and interventions. Quantum machine learning (QML) in particular has rapidly evolved and has been shown to be competitive with classical benchmarks in recent medical research. Near-term QML algorithms, for instance quantum support vector classifiers and quantum neural networks, have been trained with diverse clinical and real-world data sets. This includes studies in generating new molecular entities as drug candidates, diagnosing based on medical image classification, predicting patient persistence, forecasting treatment effectiveness, and tailoring radiotherapy. The use cases and algorithms are summarized, and an outlook on medicine in the quantum era, including technical and ethical challenges, is provided.

Statistical reproducibility of meta-analysis research claims for medical mask use in community settings to prevent COVID infection. (arXiv:2301.09189v1 [q-bio.QM]) arxiv.org/abs/2301.09189

The coronavirus (COVID) pandemic has been an exceptional test of the current scientific evidence that informs and shapes policy. Many US states, cities, and counties implemented public orders for mask use on the notion that this intervention would delay and flatten the epidemic peak and largely benefit public health outcomes. P-value plotting was used to evaluate the statistical reproducibility of meta-analysis research claims of a benefit for medical (surgical) mask use in community settings to prevent COVID infection. Eight studies (seven meta-analyses, one systematic review) published between 1 January 2020 and 7 December 2022 were evaluated. Base studies were randomized controlled trials with outcomes of medical diagnosis or laboratory-confirmed diagnosis of viral (influenza or COVID) illness. Self-reported viral illness outcomes were excluded because of awareness bias. No evidence was observed for a medical mask use benefit in preventing viral infections in six p-value plots (five meta-analyses and one systematic review). Research claims of no benefit in three meta-analyses and the systematic review were reproduced in p-value plots. Research claims of a benefit in two meta-analyses were not reproduced in p-value plots. Insufficient data were available to construct p-value plots for two meta-analyses because of overreliance on self-reported outcomes. These findings suggest that a benefit of medical mask use in community settings for preventing viral infection, including COVID, is unproven.
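
The p-value plotting referred to above is essentially a rank-ordered plot of the base-study p-values from each meta-analysis, read against the line expected when every p-value is uniformly distributed (i.e., no effect). Below is a generic sketch of that construction with made-up p-values and an illustrative function name; it is not the authors' code or data.

```python
import numpy as np

def p_value_plot_points(p_values):
    """A p-value plot ranks the p-values of the base studies in a meta-analysis
    and plots them against their rank.  Points falling near the diagonal through
    the origin are consistent with uniform p-values (no effect); a cluster of
    small p-values breaking away from that line suggests a real effect.
    Generic sketch of the construction, not the paper's code."""
    p = np.sort(np.asarray(p_values))
    rank = np.arange(1, len(p) + 1)
    expected = rank / (len(p) + 1)              # expected order statistics under H0
    return rank, p, expected

# toy example: p-values from hypothetical base trials (illustrative numbers only)
p_toy = np.random.default_rng(3).uniform(size=10)
rank, p, expected = p_value_plot_points(p_toy)
for r, obs, exp in zip(rank, p, expected):
    print(f"rank {r:2d}: observed p = {obs:.2f}, expected under no effect = {exp:.2f}")
```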

RawHash: Enabling Fast and Accurate Real-Time Analysis of Raw Nanopore Signals for Large Genomes. (arXiv:2301.09200v1 [q-bio.GN]) arxiv.org/abs/2301.09200

Nanopore sequencers generate electrical raw signals in real time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands from sequencers without fully sequencing them, which provides opportunities to computationally reduce the sequencing time and cost. However, existing works utilizing Read Until either 1) require powerful computational resources that may not be available for portable sequencers or 2) lack scalability for large genomes, rendering them inaccurate or ineffective. We propose RawHash, the first mechanism that can accurately and efficiently perform real-time analysis of nanopore raw signals for large genomes using a hash-based similarity search. To enable this, RawHash ensures that signals corresponding to the same DNA content lead to the same hash value, regardless of slight variations in these signals. RawHash achieves an accurate hash-based similarity search via an effective quantization of the raw signals, such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value. We evaluate RawHash on three applications: 1) read mapping, 2) relative abundance estimation, and 3) contamination analysis. Our evaluations show that RawHash is the only tool that can provide high accuracy and high throughput for analyzing large genomes in real time. When compared to the state-of-the-art techniques, UNCALLED and Sigmap, RawHash provides 1) 25.8x and 3.4x better average throughput and 2) an average speedup of 32.1x and 2.1x in mapping time, respectively. Source code is available at https://github.com/CMU-SAFARI/RawHash.
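
The core trick, quantizing raw signal events coarsely enough that small variations of the same underlying DNA content land in the same bin and then hashing runs of consecutive quantized events as seeds for a similarity search, can be sketched as follows. The bin count, value range, seed length, and the toy "events" below are illustrative assumptions, not RawHash's actual parameters or event detection.

```python
import numpy as np

def quantize(events, n_bins=16, lo=-3.0, hi=3.0):
    """Map normalized event values into coarse bins so that slightly different
    raw signals measuring the same DNA content fall into the same bin.
    Bin count and range are illustrative, not RawHash's actual parameters."""
    return np.clip(((events - lo) / (hi - lo) * n_bins).astype(int), 0, n_bins - 1)

def hash_seeds(events, k=6, n_bins=16):
    # pack k consecutive quantized events into one integer hash value,
    # analogous to hashing k-mers of quantized signal instead of bases
    q = quantize(events, n_bins)
    seeds = []
    for i in range(len(q) - k + 1):
        h = 0
        for v in q[i:i + k]:
            h = h * n_bins + int(v)
        seeds.append((h, i))
    return seeds

rng = np.random.default_rng(4)
reference_events = rng.standard_normal(50)
read_events = reference_events[10:40] + rng.normal(0, 0.02, 30)   # slight signal noise
ref_index = {h: pos for h, pos in hash_seeds(reference_events)}
matches = [(pos, ref_index[h]) for h, pos in hash_seeds(read_events) if h in ref_index]
print(f"{len(matches)} matching seeds, e.g. {matches[:3]}")        # (read_pos, ref_pos)
```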
