Topological and geometric analysis of cell states in single-cell transcriptomic data. (arXiv:2309.07950v1 [q-bio.QM]) arxiv.org/abs/2309.07950

Single-cell RNA sequencing (scRNA-seq) enables dissecting cellular heterogeneity in tissues, resulting in numerous biological discoveries. Various computational methods have been devised to delineate cell types by clustering scRNA-seq data, where the clusters are often annotated using prior knowledge of marker genes. In addition to identifying pure cell types, several methods have been developed to identify cells undergoing state transitions, often relying on prior clustering results. Present computational approaches predominantly investigate the local and first-order structures of scRNA-seq data using graph representations, while scRNA-seq data frequently display complex high-dimensional structures. Here, we present scGeom, a tool for exploiting the multiscale and multidimensional structures in scRNA-seq data by inspecting the geometry, via graph curvature, and the topology, via persistent homology, of both cell networks and gene networks. We demonstrate the utility of these structural features for reflecting biological properties and functions in several applications, showing that the curvatures and topological signatures of cell and gene networks can help indicate transition cells and the developmental potency of cells. We additionally illustrate that these structural characteristics can improve the classification of cell types.
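
To make one of these structural features concrete, the sketch below builds a k-nearest-neighbour cell graph and computes an augmented Forman-Ricci curvature for each edge. This is a generic construction with assumed parameters (the choice of k, the curvature variant, a toy embedding), not scGeom's actual pipeline.

```python
# Minimal sketch: edge curvature on a kNN cell graph, using the augmented
# Forman-Ricci curvature (one of several graph curvatures in the literature).
import numpy as np
import networkx as nx
from sklearn.neighbors import kneighbors_graph

def knn_cell_graph(embedding, k=15):
    """Build an undirected kNN graph from a cells-by-features embedding."""
    adj = kneighbors_graph(embedding, n_neighbors=k, mode="connectivity")
    return nx.from_scipy_sparse_array(adj.maximum(adj.T))  # symmetrise

def forman_curvature(G):
    """Augmented Forman-Ricci curvature of each edge in an unweighted graph:
    F(u, v) = 4 - deg(u) - deg(v) + 3 * (#triangles containing the edge)."""
    curv = {}
    for u, v in G.edges():
        triangles = len(set(G[u]) & set(G[v]))
        curv[(u, v)] = 4 - G.degree(u) - G.degree(v) + 3 * triangles
    return curv

# Toy usage with random "cells"; in practice the embedding would come from a
# dimensionality reduction of the scRNA-seq expression matrix.
cells = np.random.default_rng(0).normal(size=(200, 10))
curv = forman_curvature(knn_cell_graph(cells))
print(np.mean(list(curv.values())))  # mean curvature as a crude global summary
```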

Mathematical modeling of heterogeneous stem cell regeneration: from cell division to Waddington's epigenetic landscape. (arXiv:2309.08064v1 [q-bio.QM]) arxiv.org/abs/2309.08064

Stem cell regeneration is a crucial biological process for most self-renewing tissues during development and the maintenance of tissue homeostasis. In mathematical models of stem cell regeneration and tissue development, cell division is the core process connecting biological processes at different scales, leading to changes in both cell population numbers and the epigenetic states of cells. This review focuses on the primary strategies for modeling cell division in biological systems. The Lagrange coordinate modeling approach considers gene network dynamics within each individual cell, along with random changes in cell states and model parameters during cell division. In contrast, the Euler coordinate modeling approach formulates the evolution of the number of cells with the same epigenetic state via a differential-integral equation. These strategies focus on dynamics at different scales and result in two methods of modeling Waddington's epigenetic landscape: the Fokker-Planck equation approach and the differential-integral equation approach. The differential-integral equation approach formulates the evolution of cell population density based on simple assumptions about cell proliferation, apoptosis, differentiation, and epigenetic state transitions during cell division. Moreover, machine learning methods can establish low-dimensional macroscopic measurements of a cell based on single-cell RNA sequencing data. These low-dimensional measurements can quantify the epigenetic state of cells and connect static single-cell RNA sequencing data with dynamic equations for tissue development processes. The differential-integral equation approach presented in this review provides a reasonable understanding of the complex biological processes of tissue development and tumor progression.
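
To make the differential-integral strategy concrete, here is a minimal numerical sketch of one generic equation of this type for a one-dimensional epigenetic state, with illustrative proliferation and removal rates and a Gaussian inheritance kernel. The specific kernels and rates in the review's models differ.

```python
# Sketch: dQ/dt = -(beta(x) + kappa(x)) Q(t,x) + 2 * int beta(y) Q(t,y) p(x,y) dy
# for a cell population density Q over an epigenetic state x in [0, 1].
import numpy as np

n = 200
x = np.linspace(0.0, 1.0, n)            # discretised epigenetic state
dx = x[1] - x[0]

beta = 0.5 + 0.5 * x                    # proliferation rate beta(x) (assumed)
kappa = 0.3 * np.ones(n)                # removal rate kappa(x) (assumed)

# Inheritance kernel p(x, y): density that a daughter of a mother in state y
# lands in state x; here a Gaussian around the mother's state (assumed).
P = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.05 ** 2))
P /= P.sum(axis=0, keepdims=True) * dx  # normalise columns: int p(x, y) dx = 1

Q = np.exp(-((x - 0.2) ** 2) / 0.01)    # initial population density Q(0, x)

dt = 0.01
for _ in range(1000):                   # forward-Euler time stepping
    birth = 2 * (P @ (beta * Q)) * dx   # daughters redistributed over states
    Q = Q + dt * (-(beta + kappa) * Q + birth)

print(Q.sum() * dx)                     # total population after integration
```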

Analysis of a stochastic SIR model with media effects. (arXiv:2309.08126v1 [q-bio.PE]) arxiv.org/abs/2309.08126

MIML: Multiplex Image Machine Learning for High Precision Cell Classification via Mechanical Traits within Microfluidic Systems. (arXiv:2309.08421v1 [eess.IV]) arxiv.org/abs/2309.08421

Label-free cell classification is advantageous for supplying pristine cells for further use or examination, yet existing techniques frequently fall short in terms of specificity and speed. In this study, we address these limitations through the development of a novel machine learning framework, Multiplex Image Machine Learning (MIML). This architecture uniquely combines label-free cell images with biomechanical property data, harnessing the vast, often underutilized morphological information intrinsic to each cell. By integrating both types of data, our model offers a more holistic understanding of cellular properties, utilizing morphological information typically discarded in traditional machine learning models. This approach has led to a remarkable 98.3% accuracy in cell classification, a substantial improvement over models that consider only a single data type. MIML has been proven effective in classifying white blood cells and tumor cells, with potential for broader application due to its inherent flexibility and transfer learning capability. It is particularly effective for cells with similar morphology but distinct biomechanical properties. This innovative approach has significant implications across various fields, from advancing disease diagnostics to understanding cellular behavior.
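
A minimal sketch of the kind of two-branch fusion architecture the abstract describes: a CNN branch for the label-free cell image, an MLP branch for the biomechanical features, and simple concatenation before the classification head. Layer sizes and the fusion scheme are assumptions for illustration, not the published MIML configuration.

```python
import torch
import torch.nn as nn

class MultiplexClassifier(nn.Module):
    def __init__(self, n_mech_features=8, n_classes=2):
        super().__init__()
        self.image_branch = nn.Sequential(       # grayscale cell image -> 64-d
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(32 * 16, 64), nn.ReLU(),
        )
        self.mech_branch = nn.Sequential(        # mechanical traits -> 16-d
            nn.Linear(n_mech_features, 16), nn.ReLU(),
        )
        self.head = nn.Linear(64 + 16, n_classes)  # fused classification head

    def forward(self, image, mech):
        fused = torch.cat([self.image_branch(image), self.mech_branch(mech)], dim=1)
        return self.head(fused)

model = MultiplexClassifier()
logits = model(torch.randn(4, 1, 64, 64), torch.randn(4, 8))  # toy batch
print(logits.shape)  # torch.Size([4, 2])
```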

A Spiking Binary Neuron -- Detector of Causal Links. (arXiv:2309.08476v1 [cs.NE]) arxiv.org/abs/2309.08476

Causal relationship recognition is a fundamental operation in neural networks aimed at learning behavior, action planning, and inferring external world dynamics. This operation is particularly crucial for reinforcement learning (RL). In the context of spiking neural networks (SNNs), events are represented as spikes emitted by network neurons or input nodes. Detecting causal relationships within these events is essential for effective RL implementation. This paper presents a novel approach to causal relationship recognition using a simple spiking binary neuron. The proposed method leverages specially designed synaptic plasticity rules, which are both straightforward and efficient. Notably, our approach accounts for the temporal aspects of detected causal links and accommodates the representation of spiking signals as single spikes or tight spike sequences (bursts), as observed in biological brains. Furthermore, this study places a strong emphasis on the hardware-friendliness of the proposed models, ensuring their efficient implementation on modern and future neuroprocessors. Compared with precise machine learning techniques, such as decision tree algorithms and convolutional neural networks, our neuron demonstrates satisfactory accuracy despite its simplicity. In conclusion, we introduce a multi-neuron structure capable of operating in more complex environments with enhanced accuracy, making it a promising candidate for the advancement of RL applications in SNNs.
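
For intuition, the toy sketch below implements a binary neuron with a simple causality-flavoured plasticity rule: inputs that spiked shortly before the target event are potentiated, and other coincidences are mildly depressed. The window, learning rates, and threshold are illustrative assumptions, not the paper's plasticity rules.

```python
import numpy as np

class CausalBinaryNeuron:
    def __init__(self, n_inputs, lr=0.05, threshold=0.5):
        self.w = np.zeros(n_inputs)
        self.lr = lr
        self.threshold = threshold

    def update(self, spike_history, target_fired):
        """spike_history: (window, n_inputs) binary array of recent input spikes."""
        recent = spike_history.max(axis=0)       # inputs that spiked in the window
        if target_fired:
            self.w += self.lr * recent           # potentiate candidate causes
        else:
            self.w -= self.lr * recent * 0.2     # mildly depress coincidences
        self.w = np.clip(self.w, 0.0, 1.0)

    def fire(self, spike_history):
        return (self.w * spike_history.max(axis=0)).sum() > self.threshold

# Toy stream: input 0 reliably precedes the target event; inputs 1-2 are noise.
rng = np.random.default_rng(1)
neuron = CausalBinaryNeuron(n_inputs=3)
history = np.zeros((3, 3), dtype=int)            # 3-step sliding spike window
for t in range(500):
    spikes = (rng.random(3) < 0.1).astype(int)
    history = np.vstack([history[1:], spikes])   # oldest spikes in row 0
    target = history[0, 0] == 1                  # event caused by lagged input 0
    neuron.update(history, target)
print(neuron.w)  # the weight on input 0 should dominate
```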

Current and future directions in network biology. (arXiv:2309.08478v1 [q-bio.MN]) arxiv.org/abs/2309.08478

Network biology, an interdisciplinary field at the intersection of computational and biological sciences, is critical for deepening understanding of cellular functioning and disease. While the field has existed for about two decades, it is still relatively young. It has changed rapidly, and new computational challenges have arisen, driven by many factors, including increasing data complexity (multiple types of data becoming available at different levels of biological organization) and growing data size. The research directions in the field therefore need to evolve as well. Hence, a workshop on Future Directions in Network Biology was organized and held at the University of Notre Dame in 2022, bringing together researchers active in various computational, and in particular algorithmic, aspects of network biology to identify pressing challenges in this field. Topics discussed during the workshop include: inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Video recordings of the workshop presentations are publicly available on YouTube. For even broader impact, this paper, co-authored mostly by the workshop participants, summarizes the discussion from the workshop. As such, it is expected to help shape the short- and long-term vision for future computational and algorithmic research in network biology.

Entropy-based machine learning model for diagnosis and monitoring of Parkinson's Disease in smart IoT environment. (arXiv:2309.07134v1 [eess.SP]) arxiv.org/abs/2309.07134

This study presents a computationally efficient machine learning (ML) model for diagnosing and monitoring Parkinson's disease (PD) in an Internet of Things (IoT) environment using resting-state EEG signals (rs-EEG). We computed different types of entropy from EEG signals and found that Fuzzy Entropy performed best for diagnosing and monitoring PD from rs-EEG. We also investigated different combinations of signal frequency ranges and EEG channels to accurately diagnose PD. With only 11 features, we achieved a maximum classification accuracy (ARKF) of ~99.9%. The most prominent frequency range of the EEG signals was identified, and we found that high classification accuracy depends on low-frequency signal components (0-4 Hz). Moreover, the most informative signals were received mainly from the right hemisphere of the head (F8, P8, T8, FC6). Furthermore, we assessed the diagnostic accuracy for PD using three different lengths of EEG data (150-1000 samples), since the computational complexity is reduced by reducing the input data. As a result, we achieved a maximum mean accuracy of 99.9% for a sample length (LEEG) of 1000 (~7.8 seconds), 98.2% with an LEEG of 800 (~6.2 seconds), and 79.3% for LEEG = 150 (~1.2 seconds). By reducing the number of features and segment lengths, the computational cost of classification can be reduced, so lower-performance smart ML sensors can be used in IoT environments to enhance human resilience to PD.
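
Since fuzzy entropy is the key feature here, the following is a compact NumPy sketch of a common FuzzyEn definition (a Chen-et-al-style formulation with an exp(-d^2/r) membership function); the paper's exact settings (embedding dimension, tolerance, channels) may differ.

```python
import numpy as np

def fuzzy_entropy(signal, m=2, r=None):
    """Fuzzy entropy of a 1-D signal with embedding dimension m, tolerance r."""
    x = np.asarray(signal, dtype=float)
    if r is None:
        r = 0.2 * x.std()                       # a common tolerance choice

    def phi(dim):
        # Embed: overlapping windows of length dim, each with its mean removed.
        n_vec = len(x) - dim
        vecs = np.array([x[i:i + dim] for i in range(n_vec)])
        vecs -= vecs.mean(axis=1, keepdims=True)
        # Chebyshev distances between all pairs of embedded vectors.
        d = np.abs(vecs[:, None, :] - vecs[None, :, :]).max(axis=2)
        sim = np.exp(-(d ** 2) / r)             # fuzzy similarity membership
        np.fill_diagonal(sim, 0.0)              # exclude self-matches
        return sim.sum() / (n_vec * (n_vec - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))

rng = np.random.default_rng(0)
print(fuzzy_entropy(rng.normal(size=1000)))     # noisy signal: higher entropy
print(fuzzy_entropy(np.sin(np.linspace(0, 20 * np.pi, 1000))))  # regular: lower
```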

Transmission matrix parameter estimation of COVID-19 evolution with age compartments using ensemble-based data assimilation. (arXiv:2309.07146v1 [q-bio.PE]) arxiv.org/abs/2309.07146

The COVID-19 pandemic and its multiple outbreaks have challenged governments around the world. Much epidemiological modeling was based on pre-pandemic contact information for the population, which changed drastically due to governmental health measures, so-called non-pharmaceutical interventions, made to reduce transmission of the virus, such as social distancing and complete lockdown. In this work, we evaluate an ensemble-based data assimilation framework applied to a meta-population model to infer the transmission of the disease between different population age groups. We perform a set of idealized twin experiments to investigate the performance of different possible parameterizations of the transmission matrix. These experiments show that it is not possible to unambiguously estimate all the independent parameters of the transmission matrix. However, under certain parameterizations, the transmission matrix in an age-compartmental model can be estimated. Assimilating age-dependent accumulated cases and deaths observed in Argentina, the estimated parameters lead to increased forecast accuracy in the age-group compartments compared to a single-compartment model, and to reliable estimations of the effective reproduction number. Age-dependent data assimilation and forecasting of virus transmission may be important for accurate prediction and diagnosis of health care demand.
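
The ensemble-based update at the core of such a framework can be sketched as a stochastic ensemble Kalman filter (EnKF) analysis step over an augmented state (compartments plus transmission-matrix parameters). The shapes, observation operator, and noise levels below are illustrative, not the study's setup.

```python
import numpy as np

def enkf_update(ensemble, H, y, obs_cov, rng):
    """Stochastic EnKF analysis step.
    ensemble: (n_state, n_members); H: (n_obs, n_state); y: (n_obs,)."""
    n_members = ensemble.shape[1]
    X = ensemble - ensemble.mean(axis=1, keepdims=True)   # anomalies
    P = X @ X.T / (n_members - 1)                         # sample covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + obs_cov)    # Kalman gain
    # Perturb observations independently for each member (stochastic EnKF).
    Y = y[:, None] + rng.multivariate_normal(
        np.zeros(len(y)), obs_cov, n_members).T
    return ensemble + K @ (Y - H @ ensemble)

rng = np.random.default_rng(0)
ens = rng.normal(1.0, 0.5, size=(6, 100))   # e.g. 4 compartments + 2 parameters
H = np.array([[1.0, 0, 0, 0, 0, 0]])        # observe only the first compartment
updated = enkf_update(ens, H, np.array([1.2]), np.eye(1) * 0.01, rng)
print(updated.mean(axis=1))  # unobserved entries also move, via the covariances
```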

Decoding visual brain representations from electroencephalography through Knowledge Distillation and latent diffusion models. (arXiv:2309.07149v1 [eess.SP]) arxiv.org/abs/2309.07149

Decoding visual representations from human brain activity has emerged as a thriving research domain, particularly in the context of brain-computer interfaces. Our study presents an innovative method for classifying and reconstructing images from the ImageNet dataset using electroencephalography (EEG) data from subjects who had viewed the images themselves (i.e., "brain decoding"). We analyzed EEG recordings from 6 participants, each exposed to 50 images spanning 40 unique semantic categories. These EEG readings were converted into spectrograms, which were then used to train a convolutional neural network (CNN), integrated with a knowledge distillation procedure based on a pre-trained Contrastive Language-Image Pre-Training (CLIP)-based image classification teacher network. This strategy allowed our model to attain a top-5 accuracy of 80%, significantly outperforming a standard CNN and various RNN-based benchmarks. Additionally, we incorporated an image reconstruction mechanism based on pre-trained latent diffusion models, which allowed us to generate an estimate of the images that had elicited the EEG activity. Our architecture therefore not only decodes images from neural activity but also offers a credible image reconstruction from EEG alone, paving the way for, e.g., swift, individualized feedback experiments. Our research represents a significant step forward in connecting neural signals with visual cognition.
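
The knowledge-distillation ingredient can be sketched generically: the student (here, the spectrogram CNN) is trained against the softened class probabilities of a frozen teacher alongside the hard labels. The temperature and loss weighting below are illustrative choices, and the teacher logits are a stand-in for the CLIP-based network's output.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * T * T                                   # rescale gradients by T^2
    return alpha * hard + (1 - alpha) * soft

# Toy shapes: batch of 8, 40 semantic categories (as in the study's dataset).
student_logits = torch.randn(8, 40, requires_grad=True)
teacher_logits = torch.randn(8, 40)             # would come from the frozen teacher
labels = torch.randint(0, 40, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(loss.item())
```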

Revive, Restore, Revitalize: An Eco-economic Methodology for Maasai Mara. (arXiv:2309.07165v1 [q-bio.PE]) arxiv.org/abs/2309.07165

The Maasai Mara in Kenya, renowned for its biodiversity, is witnessing ecosystem degradation and species endangerment due to intensified human activities. To address this, we introduce a dynamic system harmonizing ecological and human priorities. Our agent-based model replicates the Maasai Mara savanna ecosystem, incorporating 71 animal species, 10 human classifications, and 2 natural resource types. The model employs the metabolic rate-mass relationship for animal energy dynamics, logistic curves for animal growth, individual interactions for food web simulation, and human intervention impacts. Algorithms such as fitness-proportional selection and particle swarm optimization mimic organisms' preferences for resources. To guide preservation activities, we formulated 21 management strategies encompassing tourism, transportation, taxation, environmental conservation, research, diplomacy, and poaching, employing a game-theoretic framework. Using the TOPSIS method, we prioritized four key developmental indicators: environmental health, research advancement, economic growth, and security. The interplay of 16 factors determines these indicators, each influenced by our policies to varying degrees. By evaluating the policies' repercussions, we aim to mitigate adverse animal-human interactions and equitably address human concerns. We classified the policy impacts into three categories: Environmental Preservation, Economic Prosperity, and Holistic Development. Applying these policy groupings to our ecosystem model, we tracked their effects on the intricate animal-human-resource dynamics. Using the entropy weight method, we assessed the efficacy of these policy clusters over a decade, identifying the optimal blend emphasizing both environmental conservation and economic progression.
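
As an example of the decision-analysis layer, here is a small NumPy sketch of the TOPSIS ranking step: alternatives (e.g. policies) are scored by closeness to an ideal best and distance from an ideal worst across weighted indicators. The matrix and weights below are toy values, not the study's data.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """matrix: (alternatives, criteria); benefit[j] is True if higher is better."""
    norm = matrix / np.linalg.norm(matrix, axis=0)   # vector-normalise columns
    v = norm * weights                               # weighted normalised matrix
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))  # ideal best
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))   # ideal worst
    d_best = np.linalg.norm(v - ideal, axis=1)
    d_worst = np.linalg.norm(v - anti, axis=1)
    return d_worst / (d_best + d_worst)              # closeness score in (0, 1)

# Four hypothetical policies scored on environment, research, economy, security.
scores = np.array([[0.8, 0.6, 0.4, 0.7],
                   [0.5, 0.9, 0.6, 0.6],
                   [0.3, 0.4, 0.9, 0.5],
                   [0.7, 0.5, 0.5, 0.9]])
weights = np.array([0.4, 0.2, 0.2, 0.2])             # assumed priorities
print(topsis(scores, weights, benefit=np.array([True] * 4)))
```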

CloudBrain-NMR: An Intelligent Cloud Computing Platform for NMR Spectroscopy Processing, Reconstruction and Analysis. (arXiv:2309.07178v1 [q-bio.QM]) arxiv.org/abs/2309.07178

Nuclear Magnetic Resonance (NMR) spectroscopy has served as a powerful analytical tool for studying molecular structure and dynamics in chemistry and biology. However, processing the raw data acquired from NMR spectrometers and the subsequent quantitative analysis involve various specialized tools, which require comprehensive knowledge of programming and NMR. In particular, emerging deep learning tools are hard to use widely in NMR due to their sophisticated computational setup. Thus, NMR processing is not an easy task for chemists and biologists. In this work, we present CloudBrain-NMR, an intelligent online cloud computing platform designed for NMR data reading, processing, reconstruction, and quantitative analysis. The platform is conveniently accessed through a web browser, eliminating the need for any program installation on the user side. CloudBrain-NMR uses parallel computing with graphics processing units and central processing units, resulting in significantly shortened computation time. Furthermore, it incorporates state-of-the-art deep learning-based algorithms, offering comprehensive functionality that allows users to complete the entire processing procedure without relying on additional software. This platform has empowered NMR applications with advanced artificial intelligence processing. CloudBrain-NMR is openly accessible for free at https://csrc.xmu.edu.cn/CloudBrain.html

Clinical dichotomania: A major cause of over-diagnosis and over-treatment? (arXiv:2309.07194v1 [q-bio.OT]) arxiv.org/abs/2309.07194

Introduction: There have been many warnings that inappropriate dichotomisation of results into positive or negative, high or normal, etc., during medical research can be very damaging. The aim of this paper is to argue that this is the main cause of over-diagnosis and over-treatment. Methods: Illustrative data were taken from a randomised controlled trial (RCT) that compared the frequency of nephropathy within 2 years in patients treated with an angiotensin receptor blocker versus a control, and in which the numerical value of the albumin excretion rate (AER) was available for all patients before randomisation. Results: When the RCT results were divided into AER ranges, a negligible proportion of patients in the range 20 to 40 mcg/min developed nephropathy within 2 years or benefited from treatment, yet 36% of currently treated patients fall in this range (and are thus over-diagnosed and over-treated). Above an AER of 40 mcg/min, the proportion with nephropathy increased gradually in each range, with fewer developing nephropathy in each range on irbesartan 150 mg daily than on control, and fewer still on 300 mg daily. Interpretation: When logistic regression functions were fitted to the data and calibrated, the resulting curves allowed outcome probabilities and absolute risk reductions to be estimated for use in shared decision making (illustrated by application to an example patient). This could avoid much over-diagnosis and over-treatment. Conclusion: Careful attention to disease severity, by interpreting each numerical diagnostic result, better applies the principles of diagnosis and treatment decisions and can prevent over-diagnosis and over-treatment.
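
The alternative the paper advocates can be sketched as follows: fit a logistic regression to the continuous AER in each trial arm, then read off individual outcome probabilities and the absolute risk reduction instead of dichotomising. The data below are simulated purely for illustration, not the trial's results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
aer = rng.uniform(20, 200, n)                  # AER in mcg/min
treated = rng.integers(0, 2, n)                # 0 = control, 1 = treatment arm
# Simulated outcome: risk rises with AER; treatment lowers it (assumed effect).
p = 1 / (1 + np.exp(-(-5 + 0.03 * aer - 0.8 * treated)))
nephropathy = rng.random(n) < p

X = np.column_stack([aer, treated])
model = LogisticRegression().fit(X, nephropathy)

# Shared decision making for one hypothetical patient with AER = 60 mcg/min:
risk_control, risk_treated = model.predict_proba([[60, 0], [60, 1]])[:, 1]
print(f"2-year risk untreated: {risk_control:.1%}, treated: {risk_treated:.1%}")
print(f"absolute risk reduction: {risk_control - risk_treated:.1%}")
```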

Automated segmentation of rheumatoid arthritis immunohistochemistry stained synovial tissue. (arXiv:2309.07255v1 [eess.IV]) arxiv.org/abs/2309.07255

Rheumatoid arthritis (RA) is a chronic autoimmune disease which primarily affects the synovial tissue of the joints. It is a highly heterogeneous disease, with wide cellular and molecular variability observed in synovial tissues. Over the last two decades, the methods available for studying these tissues have advanced considerably. In particular, immunohistochemistry (IHC) stains are well suited to highlighting the functional organisation of samples. Yet analysis of IHC-stained synovial tissue samples is still overwhelmingly done manually and semi-quantitatively by expert pathologists. This is because, in addition to the fragmented nature of IHC-stained synovial tissue, there are wide variations in intensity and colour, strong clinical-centre batch effects, and many undesirable artefacts in gigapixel Whole Slide Images (WSIs), such as water droplets, pen annotations, folded tissue, and blurriness. There is therefore a strong need for a robust, repeatable automated tissue segmentation algorithm which can cope with this variability and support imaging pipelines. We train a U-Net on R4RA, a hand-curated, heterogeneous, real-world, multi-centre clinical dataset containing multiple types of IHC staining. The model obtains a Dice score of 0.865 and successfully segments the different types of IHC staining while coping with the variation in colour and intensity and the common WSI artefacts from the different clinical centres. It can be used as the first step in an automated image analysis pipeline for IHC-stained synovial tissue samples, increasing speed, reproducibility, and robustness.
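
For reference, the reported evaluation metric can be computed as below; this is the standard Dice coefficient for binary masks, not code from the paper's pipeline.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice = 2|A intersect B| / (|A| + |B|) for binary masks A (pred), B (target)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy masks; in the pipeline these would be the U-Net output and the annotation.
pred = np.zeros((64, 64), dtype=np.uint8); pred[10:40, 10:40] = 1
target = np.zeros((64, 64), dtype=np.uint8); target[15:45, 15:45] = 1
print(dice_score(pred, target))  # overlap of the two 30x30 squares, ~0.69
```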

Simultaneous inference for generalized linear models with unmeasured confounders. (arXiv:2309.07261v1 [stat.ME]) arxiv.org/abs/2309.07261

Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It first leverages the multivariate responses to separate marginal and uncorrelated confounding effects, recovering the column space of the confounding coefficients. Subsequently, latent factors and primary effects are jointly estimated, utilizing $\ell_1$-regularization for sparsity while imposing orthogonality onto the confounding coefficients. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish identification conditions for the various effects and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate via the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting for confounding effects when significant covariates are absent from the model.
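
The final multiple-testing step mentioned above is standard; here is a minimal sketch of the Benjamini-Hochberg step-up procedure applied to a vector of p-values (such as those from the bias-corrected z-tests).

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean rejection mask controlling the FDR at level alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * (np.arange(1, m + 1) / m)   # BH step-up thresholds
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])             # largest passing rank
        reject[order[: k + 1]] = True                # reject all smaller p-values
    return reject

rng = np.random.default_rng(0)
# 950 null p-values plus 50 strong signals, for illustration only.
pvals = np.concatenate([rng.uniform(size=950), rng.uniform(0, 1e-4, size=50)])
print(benjamini_hochberg(pvals).sum())               # discoveries at FDR 0.05
```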
