Statistical whitening of neural populations with gain-modulating interneurons. (arXiv:2301.11955v1 [q-bio.NC]) arxiv.org/abs/2301.11955

Statistical whitening transformations play a fundamental role in many computational systems, and may also play an important role in biological sensory systems. Individual neurons appear to rapidly and reversibly alter their input-output gains, approximately normalizing the variance of their responses. Populations of neurons appear to regulate their joint responses, reducing correlations between neural activities. It is natural to see whitening as the objective that guides these behaviors, but the mechanism for such joint changes is unknown, and direct adjustment of synaptic interactions would seem to be both too slow, and insufficiently reversible. Motivated by the extensive neuroscience literature on rapid gain modulation, we propose a recurrent network architecture in which joint whitening is achieved through modulation of gains within the circuit. Specifically, we derive an online statistical whitening algorithm that regulates the joint second-order statistics of a multi-dimensional input by adjusting the marginal variances of an overcomplete set of interneuron projections. The gains of these interneurons are adjusted individually, using only local signals, and feed back onto the primary neurons. The network converges to a state in which the responses of the primary neurons are whitened. We demonstrate through simulations that the behavior of the network is robust to poor conditioning or noise when the gains are sign-constrained, and can be generalized to achieve a form of local whitening in convolutional populations, such as those found throughout the visual or auditory system.
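
As a concrete illustration, here is a minimal Python sketch of the mechanism described above, under our own assumptions about the details: a fixed overcomplete frame W of three interneuron projection axes whitens a correlated 2-D input by adapting each interneuron's gain toward unit marginal variance (the sign constraint on gains mentioned in the abstract is omitted for brevity; all names and constants are ours, not the paper's):

import numpy as np

rng = np.random.default_rng(0)

# Correlated 2-D input stream (toy stimulus statistics).
L_chol = np.linalg.cholesky(np.array([[2.0, 1.2], [1.2, 1.0]]))

# Overcomplete frame of K = 3 interneuron projection axes (fixed, unit norm).
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W /= np.linalg.norm(W, axis=1, keepdims=True)
g = np.zeros(3)        # interneuron gains, adapted online from local signals
eta = 0.02             # gain learning rate

for _ in range(20000):
    x = L_chol @ rng.standard_normal(2)
    # Steady state of the recurrent circuit: r = (I + W^T diag(g) W)^{-1} x
    r = np.linalg.solve(np.eye(2) + W.T @ (g[:, None] * W), x)
    z = W @ r                      # interneuron responses
    g += eta * (z**2 - 1.0)        # drive each marginal variance toward 1

# After convergence, the primary responses r are approximately whitened.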

The persistent homology of genealogical networks. (arXiv:2301.11965v1 [q-bio.MN]) arxiv.org/abs/2301.11965

Genealogical networks (i.e. family trees) are of growing interest, with the largest known data sets now including well over one billion individuals. Interest in family history also supports an 8.5 billion dollar industry whose size is projected to double within 7 years (FutureWise report HC1137). Yet little mathematical attention has been paid to the complex network properties of genealogical networks, especially at large scales. The structure of genealogical networks is of particular interest due to the practice of forming unions, e.g. marriages, that are typically well outside one's immediate family. In most other networks, including other social networks, no equivalent restriction exists on the distance at which relationships form. To study the effect this has on genealogical networks we use persistent homology to identify and compare the structure of 101 genealogical and 31 other social networks. Specifically, we introduce the notion of a network's persistence curve, which encodes the network's set of persistence intervals. We find that the persistence curves of genealogical networks have a distinct structure when compared to other social networks. This difference in structure also extends to subnetworks of genealogical and social networks suggesting that, even with incomplete data, persistent homology can be used to meaningfully analyze genealogical networks. Here we also describe how concepts from genealogical networks, such as common ancestor cycles, are represented using persistent homology. We expect that persistent homology tools will become increasingly important in genealogical exploration as popular interest in ancestry research continues to expand.
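
For readers unfamiliar with persistence curves, the following hedged sketch shows one simple variant: counting, at each filtration value, how many persistence intervals are alive (a Betti-curve-style summary). The paper's exact curve definition may differ, and the intervals below are hypothetical:

import numpy as np

def persistence_curve(intervals, grid):
    """Count how many persistence intervals are alive at each grid value."""
    curve = np.zeros(len(grid), dtype=int)
    for birth, death in intervals:
        curve += (grid >= birth) & (grid < death)
    return curve

# Hypothetical H1 intervals (e.g., common-ancestor cycles in a family network).
intervals = [(0.1, 0.9), (0.2, 0.4), (0.3, 1.2)]
grid = np.linspace(0.0, 1.5, 16)
print(persistence_curve(intervals, grid))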

ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts. (arXiv:2301.12040v1 [q-bio.BM]) arxiv.org/abs/2301.12040

Current protein language models (PLMs) learn protein representations mainly based on their sequences, thereby capturing co-evolutionary information well, but they are unable to explicitly acquire protein functions, which is the end goal of protein representation learning. Fortunately, for many proteins, textual property descriptions are available, in which their various functions are also described. Motivated by this fact, we first build the ProtDescribe dataset to augment protein sequences with text descriptions of their functions and other important properties. Based on this dataset, we propose the ProtST framework to enhance Protein Sequence pre-training and understanding by biomedical Texts. During pre-training, we design three types of tasks, i.e., unimodal mask prediction, multimodal representation alignment, and multimodal mask prediction, to enhance a PLM with protein property information at different granularities and, at the same time, preserve the PLM's original representation power. On downstream tasks, ProtST enables both supervised learning and zero-shot prediction. We verify the superiority of ProtST-induced PLMs over previous ones on diverse representation learning benchmarks. Under the zero-shot setting, we show the effectiveness of ProtST on zero-shot protein classification, and ProtST also enables functional protein retrieval from a large-scale database without any function annotation.
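
The multimodal representation alignment task resembles a symmetric contrastive (InfoNCE-style) objective between paired sequence and text embeddings. A minimal NumPy sketch under that assumption; the batch pairing, temperature tau, and random stand-in embeddings are illustrative, not ProtST's actual implementation:

import numpy as np

def log_softmax_rows(m):
    m = m - m.max(axis=1, keepdims=True)       # numerical stability
    return m - np.log(np.exp(m).sum(axis=1, keepdims=True))

def alignment_loss(seq_emb, txt_emb, tau=0.07):
    """Symmetric contrastive loss; row i of both matrices describes protein i."""
    s = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    t = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = (s @ t.T) / tau
    idx = np.arange(len(logits))
    loss_st = -log_softmax_rows(logits)[idx, idx].mean()    # sequence -> text
    loss_ts = -log_softmax_rows(logits.T)[idx, idx].mean()  # text -> sequence
    return 0.5 * (loss_st + loss_ts)

rng = np.random.default_rng(0)
print(alignment_loss(rng.standard_normal((8, 64)), rng.standard_normal((8, 64))))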

Optimizing a Bayesian method for estimating the Hurst exponent in behavioral sciences. (arXiv:2301.12064v1 [q-bio.QM]) arxiv.org/abs/2301.12064

The Bayesian Hurst-Kolmogorov (HK) method estimates the Hurst exponent of a time series more accurately than the age-old detrended fluctuation analysis (DFA), especially when the time series is short. However, this advantage comes at the cost of computation time. The computation time increases exponentially with the length of the time series $N$, easily exceeding several hours for $N = 1024$, limiting the utility of the HK method in real-time paradigms, such as biofeedback and brain-computer interfaces. To address this issue, we provide data on the estimation accuracy of $H$ for synthetic time series as a function of \textit{a priori} known values of $H$, the time series length, and the size of the sample simulated from the posterior distribution -- a critical step in the Bayesian estimation method. A simulated sample from the posterior distribution as small as $n = 25$ suffices to estimate $H$ with reasonable accuracy for a time series as short as $256$ measurements. Using a larger simulated sample from the posterior distribution -- i.e., $n > 50$ -- provides only a marginal gain in accuracy, which might not be worth trading off against computational efficiency. We suggest balancing the simulated sample size from the posterior distribution of $H$ against the computational resources available to the user, preferring a minimum of $n = 50$ and opting for larger sample sizes based on time and resource constraints.
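
For context, here is a minimal implementation of the DFA baseline mentioned above (the HK method instead draws the simulated sample from a Bayesian posterior over $H$, which needs more machinery than fits here); the scales and test series are illustrative:

import numpy as np

def dfa_hurst(x, scales=(8, 16, 32, 64)):
    """Estimate the Hurst exponent via detrended fluctuation analysis."""
    y = np.cumsum(x - np.mean(x))            # integrated profile
    flucts = []
    for s in scales:
        rms = []
        for i in range(len(y) // s):
            seg = y[i * s:(i + 1) * s]
            t = np.arange(s)
            trend = np.polyval(np.polyfit(t, seg, 1), t)   # local linear detrend
            rms.append(np.sqrt(np.mean((seg - trend) ** 2)))
        flucts.append(np.mean(rms))
    # Slope of log F(s) vs log s gives the scaling exponent (H for fGn).
    return np.polyfit(np.log(scales), np.log(flucts), 1)[0]

rng = np.random.default_rng(1)
print(dfa_hurst(rng.standard_normal(1024)))   # white noise -> H close to 0.5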

RCsearcher: Reaction Center Identification in Retrosynthesis via Deep Q-Learning. (arXiv:2301.12071v1 [cs.LG]) arxiv.org/abs/2301.12071

The reaction center consists of the atoms in the product whose local properties differ from those of the corresponding atoms in the reactants. Prior studies on reaction center identification mainly concern semi-templated retrosynthesis methods and are limited to identifying a single reaction center. In reality, however, many reaction centers comprise multiple bonds or atoms; we refer to these as multiple reaction centers. This paper presents RCsearcher, a unified framework for single and multiple reaction center identification that combines the advantages of graph neural networks and deep reinforcement learning. The critical insight in this framework is that the single or multiple reaction center must be a node-induced subgraph of the molecular product graph. At each step, the framework takes as its action the choice of one node in the molecular product graph, adding it to the explored node-induced subgraph. Comprehensive experiments demonstrate that RCsearcher consistently outperforms the baselines and can extrapolate to reaction center patterns that have not appeared in the training set. Ablation experiments verify the effectiveness of individual components, including the beam search and the one-hop constraint on the action space.
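
A hedged sketch of the core idea: growing a node-induced subgraph one node at a time under the one-hop constraint, with a random stand-in for the learned Q-network. The toy graph, function names, and greedy rollout are our assumptions, not RCsearcher's actual architecture:

import numpy as np

# Hypothetical product molecular graph as an adjacency list (node: neighbors).
graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}

def candidate_actions(selected):
    """One-hop constraint: only nodes adjacent to the current subgraph
    (or any node, if the subgraph is still empty) may be added."""
    if not selected:
        return list(graph)
    frontier = {n for v in selected for n in graph[v]} - selected
    return sorted(frontier)

def rollout(q_value, max_steps=3):
    """Greedy episode: grow a node-induced subgraph as the reaction center."""
    selected = set()
    for _ in range(max_steps):
        actions = candidate_actions(selected)
        if not actions:
            break
        scores = [q_value(selected, a) for a in actions]
        selected.add(actions[int(np.argmax(scores))])
    return selected

rng = np.random.default_rng(0)
print(rollout(lambda s, a: rng.random()))   # stand-in for a learned Q-network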

Multiscale modelling of heavy metals adsorption on algal-bacterial photogranules. (arXiv:2301.12221v1 [q-bio.CB]) arxiv.org/abs/2301.12221

A multiscale mathematical model describing the genesis and ecology of algal-bacterial photogranules and the biosorption of metals on their solid matrix within a sequencing batch reactor (SBR) is presented. The granular biofilm is modelled as a spherical free boundary domain with radial symmetry and a vanishing initial value. The free boundary evolution is governed by an ODE accounting for microbial growth, attachment, and detachment phenomena. The model is based on systems of PDEs derived from mass conservation principles. Specifically, two systems of nonlinear hyperbolic PDEs model the growth of attached species and the dynamics of free adsorption sites, and two systems of quasi-linear parabolic PDEs govern the diffusive transport and conversion of nutrients and metals. The model is completed with systems of impulsive ordinary differential equations (IDEs) describing the evolution of dissolved substrates, metals, and planktonic and detached biomasses within the granular-based SBR. All main phenomena involved in the process are considered in the mathematical model. Moreover, the dual effect of metal presence on the formation of photogranules is accounted for: metal stimulates the production of EPS by sessile species and negatively affects the metabolic activities of microbial species. To describe these effects, a stimulation term for EPS production and a metal inhibition term are included in all microbial kinetics. The model is used to examine the role of the microbial species and EPS in the adsorption process, and the effect of metal concentration and of the adsorption properties of biofilm components on metal removal. Numerical results show that the model accurately describes the evolution and ecology of photogranules and confirm the applicability of algal-bacterial photogranule systems for metal-rich wastewater treatment.
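
To make the free-boundary ODE concrete, here is a deliberately simplified sketch: a single Euler-integrated equation for the granule radius balancing growth, attachment, and a quadratic detachment term. The functional forms and constants are illustrative assumptions, not the paper's model:

import numpy as np

# Minimal free-boundary sketch: the granule radius R(t) evolves by an ODE
# balancing biomass growth, attachment, and detachment (here ~ lam * R^2,
# a common modelling choice); all constants are illustrative.
k_growth, k_attach, lam = 0.8, 0.05, 0.6
R, dt = 1e-3, 1e-3          # vanishing initial radius, Euler time step

for _ in range(int(20 / dt)):
    R += dt * (k_growth * R + k_attach - lam * R**2)

print(R)   # approaches the radius where growth balances detachment (~1.39)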

PhaVIP: Phage VIrion Protein classification based on chaos game representation and Vision Transformer. (arXiv:2301.12422v1 [q-bio.GN]) arxiv.org/abs/2301.12422

Motivation: As viruses that mainly infect bacteria, phages are key players across a wide range of ecosystems. Analyzing phage proteins is indispensable for understanding phages' functions and roles in microbiomes. High-throughput sequencing enables us to obtain phages in different microbiomes at low cost. However, compared to the fast accumulation of newly identified phages, phage protein classification remains difficult. In particular, a fundamental need is to annotate virion proteins, structural proteins such as the major tail and baseplate. Although there are experimental methods for virion protein identification, they are too expensive or time-consuming, leaving a large number of proteins unclassified. Thus, there is a great demand for a computational method for fast and accurate phage virion protein (PVP) classification. Results: In this work, we adapted a state-of-the-art image classification model, the Vision Transformer, to conduct virion protein classification. By encoding protein sequences into unique images using chaos game representation, we can leverage the Vision Transformer to learn both local and global features from these sequence "images". Our method, PhaVIP, has two main functions: classifying PVP and non-PVP sequences, and annotating the types of PVP, such as capsid and tail. We tested PhaVIP on several datasets of increasing difficulty and benchmarked it against alternative tools. The experimental results show that PhaVIP has superior performance. After validating the performance of PhaVIP, we investigated two applications that can use its output: phage taxonomy classification and phage host prediction. The results show the benefit of using classified proteins rather than all proteins.
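
A hedged sketch of a chaos-game encoding: amino acids are grouped into four corner classes (our assumption; PhaVIP's actual protein-to-image encoding may differ), and the visited points of the chaos-game walk are histogrammed into an image:

import numpy as np

# Assumed grouping of the 20 amino acids into four corners (illustrative only).
corners = dict(zip("HPAC", [(0, 0), (0, 1), (1, 0), (1, 1)]))
group_of = {**{a: "H" for a in "AVLIMFWY"},   # hydrophobic
            **{a: "P" for a in "STNQGP"},     # polar
            **{a: "A" for a in "DE"},         # acidic
            **{a: "C" for a in "KRHC"}}       # basic / cysteine

def cgr_image(seq, size=32):
    """Chaos-game walk: halve the distance to the corner of each residue's
    group, then histogram the visited points into a size x size image."""
    img = np.zeros((size, size))
    x, y = 0.5, 0.5
    for a in seq:
        cx, cy = corners[group_of.get(a, "P")]
        x, y = (x + cx) / 2, (y + cy) / 2
        img[min(int(y * size), size - 1), min(int(x * size), size - 1)] += 1
    return img / max(len(seq), 1)

print(cgr_image("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ").sum())   # normalized mass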

Oscillating behavior of a compartmental model with retarded noisy dynamic infection rate. (arXiv:2301.12437v1 [q-bio.PE]) arxiv.org/abs/2301.12437

Our study is based on an epidemiological compartmental model, the SIRS model. In the SIRS model, each individual is in one of the states susceptible (S), infected (I), or recovered (R), depending on its state of health. In compartment R, an individual is assumed to remain immune only for a finite time interval before transferring back to the S compartment. We extend the model and allow for a feedback control of the infection rate by mitigation measures which are related to the number of infections. A finite response time of the feedback mechanism is supposed, which changes the low-dimensional SIRS model into an infinite-dimensional set of integro-differential (delay-differential) equations. It turns out that the retarded feedback renders the originally stable endemic equilibrium of SIRS (a stable focus) into an unstable focus if the delay exceeds a certain critical value. Nonlinear solutions show persistent regular oscillations of the numbers of infected and susceptible individuals. In the last part we include noise effects from the environment and allow for a fluctuating infection rate. This results in multiplicative noise terms, and our model turns into a set of stochastic nonlinear integro-differential equations. Numerical solutions reveal an irregular behavior of repeated disease outbreaks in the form of infection waves with a variety of frequencies and amplitudes.
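
A minimal numerical sketch of the delayed-feedback mechanism: the infection rate is suppressed by the infection level a time tau earlier, and for long enough delays the endemic equilibrium can give way to sustained oscillations. The feedback form and all parameters are illustrative, not the paper's:

import numpy as np

beta0, gamma, alpha = 0.5, 0.2, 0.1   # base infection, recovery, immunity-loss rates
k, tau, dt, T = 8.0, 10.0, 0.01, 400.0
n, d = int(T / dt), int(tau / dt)

S, I, R = np.empty(n), np.empty(n), np.empty(n)
S[0], I[0], R[0] = 0.99, 0.01, 0.0
for t in range(n - 1):
    I_lag = I[t - d] if t >= d else I[0]
    beta = beta0 / (1.0 + k * I_lag)          # mitigation reacts with delay tau
    S[t+1] = S[t] + dt * (-beta * S[t] * I[t] + alpha * R[t])
    I[t+1] = I[t] + dt * (beta * S[t] * I[t] - gamma * I[t])
    R[t+1] = R[t] + dt * (gamma * I[t] - alpha * R[t])

# With a long enough delay, I(t) settles into persistent oscillations.
print(I[-int(50/dt):].min(), I[-int(50/dt):].max())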

The Automated Discovery of Kinetic Rate Models -- Methodological Frameworks. (arXiv:2301.11356v1 [cs.SC]) arxiv.org/abs/2301.11356

The industrialization of catalytic processes is of far more importance today than it has ever been, and kinetic models are essential tools for it. Kinetic models affect the design, optimization, and control of catalytic processes, but they are not easy to obtain. Classical paradigms, such as mechanistic modeling, require substantial domain knowledge, while data-driven and hybrid modeling lack interpretability. Consequently, a different approach called automated knowledge discovery has recently gained popularity. Many methods have been developed under this paradigm, of which ALAMO, SINDy, and genetic programming are notable examples. However, these methods suffer from important drawbacks: they require assumptions about model structures, scale poorly, lack robust and well-founded model selection routines, and are sensitive to noise. To overcome these challenges, the present work constructs two methodological frameworks, ADoK-S and ADoK-W (Automated Discovery of Kinetics using a Strong/Weak formulation of symbolic regression), for the automated generation of catalytic kinetic models. We leverage genetic programming for model generation, a sequential optimization routine for model refinement, and a robust criterion for model selection. Both frameworks are tested against three computational case studies of increasing complexity. We showcase their ability to retrieve the underlying kinetic rate model with a limited amount of noisy data from the catalytic system, indicating strong potential for chemical reaction engineering applications.
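
The model-selection step can be illustrated in miniature: fit a few candidate rate laws to noisy synthetic kinetic data and keep the one favored by an information criterion. Here AIC is a stand-in for the paper's robust criterion, and the hand-written candidate list stands in for models generated by genetic programming:

import numpy as np
from scipy.optimize import curve_fit

# Candidate rate laws a symbolic-regression search might propose (illustrative).
candidates = {
    "first_order":      lambda c, k: k * c,
    "second_order":     lambda c, k: k * c**2,
    "michaelis_menten": lambda c, vmax, km: vmax * c / (km + c),
}

rng = np.random.default_rng(0)
c = np.linspace(0.1, 5.0, 40)
rate = 2.0 * c / (0.5 + c) + rng.normal(0, 0.05, c.size)  # hidden MM kinetics

best = None
for name, f in candidates.items():
    n_par = f.__code__.co_argcount - 1
    p, _ = curve_fit(f, c, rate, p0=np.ones(n_par), maxfev=5000)
    rss = np.sum((rate - f(c, *p))**2)
    aic = c.size * np.log(rss / c.size) + 2 * n_par   # information criterion
    if best is None or aic < best[0]:
        best = (aic, name, p)

print(best[1], best[2])   # should recover the Michaelis-Menten form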

Effect of different selected food processing techniques on control release kinetics of encapsulated lycopene in simulated human gastrointestinal tract. (arXiv:2301.11400v1 [q-bio.BM]) arxiv.org/abs/2301.11400

Nanoencapsulation has become a widespread technique for improving the bioavailability of bioactive compounds like lycopene. Consumption of lycopene-rich foods is effective in preventing cancer, diabetes, and cardiovascular diseases due to lycopene's strong oxygen-quenching ability, but its functional activity is compromised by light, oxygen, and heat. A biodegradable polymer such as PLA or PLGA is the most effective carrier for encapsulating lycopene due to its excellent biodegradability, biocompatibility, and nontoxigenic effect on the human metabolic system. Hence, the primary objective of this study was to evaluate the effect of pasteurization on the bioaccessibility and controlled-release kinetics of encapsulated lycopene nanoparticles (LNP) in an in vitro human gastrointestinal tract (GIT). In the first objective, sonication time, surfactant concentration, and polymer concentration were the three factors considered in synthesizing polymeric lycopene nanoparticles, whereas in the second objective the three factors were type of pasteurization, encapsulation, and juice concentration. In objectives 4 and 5, type of encapsulation, type of pasteurization, and digestion time were the three factors used to evaluate the in vitro bioaccessibility of encapsulated lycopene NP. The study evidenced that encapsulation improved lycopene's bioaccessibility by 70% and by more than 60% for conventional pasteurized (CP) and microwave pasteurized (MP) nanoemulsions, respectively, without compromising the physicochemical properties of either the PLA or the PLGA lycopene nanoparticles. The in vitro bioaccessibility study also showed that CP reduced the functional activity of the PLA LNP by 20%, whereas MP had no significant effect on the bioaccessibility of the PLA LNP. Unlike the PLA LNP, the PLGA LNP appeared more sensitive to the MP nanoemulsion than to the CP one. In conclusion, the PLA LNP treated with MP provided the highest bioaccessibility.
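
Release kinetics of this kind are often summarized with the Korsmeyer-Peppas model F(t) = k * t**n; a hedged curve-fitting sketch on hypothetical release data (the paper's actual kinetic model and data are not given in the abstract):

import numpy as np
from scipy.optimize import curve_fit

# Korsmeyer-Peppas release model, a standard choice for encapsulated actives:
# fraction released F(t) = k * t**n (commonly applied while F < ~0.6).
def peppas(t, k, n):
    return k * t**n

t = np.array([0.5, 1, 2, 4, 6, 8], dtype=float)          # hours (hypothetical)
frac = np.array([0.08, 0.12, 0.19, 0.30, 0.38, 0.45])    # released fraction
(k, n), _ = curve_fit(peppas, t, frac, p0=(0.1, 0.5))
# The exponent n distinguishes Fickian diffusion from anomalous transport.
print(f"k = {k:.3f}, n = {n:.2f}")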

Predator Extinction arose from Chaos of the Prey: the Chaotic Behavior of a Homomorphic Two-Dimensional Logistic Map in the Form of Lotka-Volterra Equations. (arXiv:2301.11669v1 [nlin.CD]) arxiv.org/abs/2301.11669

A two-dimensional homomorphic logistic map that preserves features of the Lotka-Volterra equations was proposed. To examine Lotka-Volterra chaos, in addition to ordinary iteration plots of the populations, Lyapunov exponents were calculated either directly from the eigenvalues of the Jacobian of the 2D logistic mapping or from the time-series algorithms of Rosenstein and of Eckmann et al., and the discrepancies among these estimates were compared. Bifurcation diagrams may be divided into five categories depending on their topological shapes, among which flip bifurcations and Neimark-Sacker bifurcations were observed, the latter showing closed orbits around limit cycles in the phase portrait and phase space diagram. Our model restored the 1D logistic map for the prey in the absence of the predator, as well as the normal competing behavior between the two species when their initial populations are equal. Although both species can go into chaos simultaneously, it is also possible that, with the same inter-species parameters but a predator population 10 times that of the prey, under certain growth rates the prey becomes chaotic while the predator population dramatically drops to zero, i.e., total annihilation of the predator species. Interpreting humans as the predator and natural resources as the prey in the ecological system, this conclusion may imply that not only excessive consumption of natural resources but also their chaotic state, triggered by human overpopulation, may backfire in the form of the total extinction of the human species. Fortunately, a small chance of survival may exist for the human race, as isolated fixed points in the bifurcation diagram of the predator reveal.
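
The Jacobian-based Lyapunov calculation is easiest to see in the prey-only limit, where the model reduces to the 1D logistic map: accumulate log|f'(x)| along the orbit (for the full 2D map one accumulates the log singular values of Jacobian products in the same way). A quick check against known values:

import numpy as np

# Lyapunov exponent of the prey-only (1D logistic) limit:
# lambda = mean of log|f'(x_n)| with f(x) = r x (1 - x).
def lyapunov_logistic(r, x0=0.3, n=100_000, burn=1000):
    x, acc = x0, 0.0
    for i in range(n + burn):
        if i >= burn:
            acc += np.log(abs(r * (1 - 2 * x)))   # |d/dx r x(1-x)|
        x = r * x * (1 - x)
    return acc / n

print(lyapunov_logistic(4.0))   # approx ln 2 ~ 0.693: chaotic
print(lyapunov_logistic(3.2))   # negative: stable 2-cycle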

Gene Teams are on the Field: Evaluation of Variants in Gene-Networks Using High Dimensional Modelling. (arXiv:2301.11763v1 [cs.LG]) arxiv.org/abs/2301.11763

In medical genetics, each genetic variant is evaluated as an independent entity with regard to its clinical importance. However, in most complex diseases, it is combinations of variants in specific gene networks, rather than the presence of a particular single variant, that predominate. In the case of complex diseases, disease status can be evaluated by considering the success level of a team of specific variants. We propose a method based on high-dimensional modelling to analyse all the variants in a gene network together. To evaluate our method, we selected two gene networks, mTOR and TGF-Beta. For each pathway, we generated 400 control and 400 patient group samples. The mTOR and TGF-Beta pathways contain 31 and 93 genes of varying sizes, respectively. We produced Chaos Game Representation images for each gene sequence to obtain 2-D binary patterns. These patterns were arranged in succession, yielding a 3-D tensor structure for each gene network. Features for each data sample were acquired by applying Enhanced Multivariance Products Representation (EMPR) to the 3-D data. The features were split into training and testing vectors, and the training vectors were employed to train a Support Vector Machine classification model. We achieved classification accuracies of more than 96% and 99% for the mTOR and TGF-Beta networks, respectively, using a limited number of training samples.
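
A hedged end-to-end miniature of this pipeline: stack per-gene 2-D binary patterns into a 3-D tensor, extract features, and train an SVM. Simple mode-wise means stand in for the EMPR features, and the data and labels are random placeholders, so near-chance accuracy is expected here:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def features(tensor):
    # Concatenate the means over each tensor mode - a crude stand-in for EMPR.
    return np.concatenate([tensor.mean(axis=(1, 2)),
                           tensor.mean(axis=(0, 2)),
                           tensor.mean(axis=(0, 1))])

# Per sample: a genes x H x W stack of binary CGR-style patterns (random here).
X = np.array([features(rng.integers(0, 2, (31, 16, 16)).astype(float))
              for _ in range(200)])
y = rng.integers(0, 2, 200)    # hypothetical control/patient labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print(clf.score(X_te, y_te))   # ~0.5 on random labels; real data is structured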

Reproducibility of health claims in meta-analysis studies of COVID quarantine (stay-at-home) orders. (arXiv:2301.11778v1 [q-bio.OT]) arxiv.org/abs/2301.11778

The coronavirus pandemic (COVID) has been an extraordinary test of the modern government scientific procedures that inform and shape policy. Many governments implemented COVID quarantine (stay-at-home) orders on the notion that this nonpharmaceutical intervention would delay and flatten the epidemic peak and largely benefit public health outcomes. The overall research capacity response to COVID since late 2019 has been massive. Given the lack of research transparency, only a small fraction of published research had been judged by others to be reproducible before COVID. Independent evaluation of a published meta-analysis on a common research question can be used to assess the reproducibility of a claim coming from that field of research. We used a p-value plotting statistical method to independently evaluate the reproducibility of specific research claims made in four meta-analysis studies related to the benefits and risks of COVID quarantine orders. The outcomes we investigated were: mortality, mental health symptoms, incidence of domestic violence, and suicidal ideation (thoughts of killing oneself). Three of the four meta-analyses that we evaluated (mortality, mental health symptoms, incidence of domestic violence) raise further questions about the benefits and risks of this form of intervention. The fourth meta-analysis study (suicidal ideation) is unreliable. Given the lack of research transparency and the irreproducibility of published research, independent evaluation of meta-analysis studies using p-value plotting is offered as a way to strengthen or refute (falsify) claims made in COVID research.
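
A p-value plot in the sense apparently used here (Schweder-Spjotvoll style) is simply the sorted p-values plotted against rank: under a true null the points fall near a line through the origin, while a bilinear or scattered pattern suggests real effects or heterogeneity. A minimal sketch with hypothetical values:

import numpy as np

def p_value_plot(pvals):
    """Return (rank, sorted p-value) pairs for a Schweder-Spjotvoll-style plot.
    Under a true null, the points fall near a straight line through the origin;
    departures suggest real effects or between-study heterogeneity."""
    p = np.sort(np.asarray(pvals))
    return np.arange(1, len(p) + 1), p

# Hypothetical p-values pooled from the studies in a meta-analysis.
rank, p = p_value_plot([0.001, 0.02, 0.15, 0.31, 0.48, 0.55, 0.72, 0.90])
for r, v in zip(rank, p):
    print(r, v)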
