Comprehensive and user-analytics-friendly cancer patient database for physicians and researchers. (arXiv:2302.01337v1 [q-bio.QM]) arxiv.org/abs/2302.01337

Nuanced cancer patient care is needed, as the development and clinical course of cancer are multifactorial, with influences from the general health status of the patient, germline and neoplastic mutations, co-morbidities, and environment. To effectively tailor an individualized treatment to each patient, such multifactorial data must be presented to providers in an easy-to-access and easy-to-analyze fashion. To address this need, a relational database has been developed integrating the status of cancer-critical gene mutations, serum galectin profiles, and serum and tumor glycomic profiles with clinical, demographic, and lifestyle data points of individual cancer patients. The database, as a backend, provides physicians and researchers with a single, easily accessible repository of cancer profiling data to aid in and enhance individualized treatment. Our interactive database allows care providers to amalgamate cohorts from these groups to find correlations between different data types, with the possibility of finding "molecular signatures" based upon a combination of genetic mutations, galectin serum levels, glycan compositions, and patient clinical data and lifestyle choices. Our project provides a framework for an integrated, interactive, and growing database to analyze molecular and clinical patterns across cancer stages and subtypes, and it provides opportunities for increased diagnostic and prognostic power.
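
The abstract does not reproduce the schema; purely as an illustration of the kind of relational layout it describes (a patient table joined to mutation, galectin, and glycan tables), here is a minimal sketch using Python's built-in sqlite3. Every table, column, and value below is hypothetical.

```python
import sqlite3

# Hypothetical relational layout in the spirit of the paper's description.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    patient_id   INTEGER PRIMARY KEY,
    age          INTEGER,
    sex          TEXT,
    cancer_stage TEXT,
    smoker       INTEGER           -- lifestyle data point
);
CREATE TABLE gene_mutation (
    patient_id INTEGER REFERENCES patient(patient_id),
    gene       TEXT,               -- cancer-critical gene symbol
    status     TEXT                -- germline / somatic / wild-type
);
CREATE TABLE galectin_level (
    patient_id  INTEGER REFERENCES patient(patient_id),
    galectin    TEXT,              -- e.g. 'Gal-1', 'Gal-3'
    serum_ng_ml REAL
);
CREATE TABLE glycan_profile (
    patient_id  INTEGER REFERENCES patient(patient_id),
    source      TEXT,              -- 'serum' or 'tumor'
    composition TEXT
);
""")

# A cohort query of the kind the abstract describes: patients with a given
# mutation AND an elevated serum galectin level.
cohort = conn.execute("""
    SELECT p.patient_id, p.cancer_stage, g.serum_ng_ml
    FROM patient p
    JOIN gene_mutation m  ON m.patient_id = p.patient_id
    JOIN galectin_level g ON g.patient_id = p.patient_id
    WHERE m.gene = 'TP53' AND m.status = 'somatic'
      AND g.galectin = 'Gal-3' AND g.serum_ng_ml > 10.0
""").fetchall()
```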

Transport-based morphometry of nuclear structures of digital pathology images in cancers. (arXiv:2302.01449v1 [q-bio.QM]) arxiv.org/abs/2302.01449

Alterations in nuclear morphology are useful adjuncts and even diagnostic tools used by pathologists in the diagnosis and grading of many tumors, particularly malignant tumors. Large datasets such as TCGA and the Human Protein Atlas, in combination with emerging machine learning and statistical modeling methods, such as feature extraction and deep learning techniques, can be used to extract meaningful knowledge from images of nuclei, particularly from cancerous tumors. Here we describe a new technique based on the mathematics of optimal transport for modeling the information content related to nuclear chromatin structure directly from imaging data. In contrast to other techniques, our method represents the entire information content of each nucleus relative to a template nucleus using a transport-based morphometry (TBM) framework. We demonstrate that the model is robust to different staining patterns and imaging protocols, and can be used to discover meaningful and interpretable information within and across datasets and cancer types. In particular, we demonstrate morphological differences capable of distinguishing nuclear features along the spectrum from benign to malignant categories of tumors across different cancer tissue types, including tumors derived from liver parenchyma, thyroid gland, lung mesothelium, and skin epithelium. We believe these proof-of-concept calculations demonstrate that the TBM framework can provide the quantitative measurements necessary for performing meaningful comparisons across a wide range of datasets and cancer types, potentially enabling numerous cancer studies, technologies, and clinical applications, and helping to elevate the role of nuclear morphometry into a more quantitative science. The source code implementing our method is available at https://github.com/rohdelab/nuclear_morphometry.
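
The linked repository holds the actual implementation; as a rough sketch of the transport-based embedding idea (each nucleus summarized by the optimal-transport displacement of a common template's pixel mass), one could use the POT library as follows. The normalization and template handling here are assumptions, not the authors' pipeline.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def lot_embedding(template_img, nucleus_img):
    """Linear-OT feature: displacement of the template's pixel mass toward
    the nucleus image. Images are 2D arrays of chromatin intensity."""
    def as_measure(img):
        pts = np.argwhere(img > 0).astype(float)        # pixel coordinates
        w = img[img > 0].astype(float)
        return pts, w / w.sum()                         # normalized mass
    xs, a = as_measure(template_img)
    xt, b = as_measure(nucleus_img)
    M = ot.dist(xs, xt)                                 # squared-Euclidean cost
    G = ot.emd(a, b, M)                                 # optimal transport plan
    bary = (G @ xt) / a[:, None]                        # barycentric projection
    return ((bary - xs) * np.sqrt(a)[:, None]).ravel()  # weighted displacement

# Embed each nucleus against one template; in the embedding space, linear
# methods (PCA, discriminant analysis) give the kind of quantitative
# comparisons the TBM framework is built for.
# feats = np.stack([lot_embedding(template, img) for img in nuclei])
```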

Four principles for improved statistical ecology. (arXiv:2302.01528v1 [stat.ME]) arxiv.org/abs/2302.01528

Increasing attention has been drawn to the misuse of statistical methods over recent years, with particular concern about the prevalence of practices such as poor experimental design, cherry-picking and inadequate reporting. These failures are largely unintentional and no more common in ecology than in other scientific disciplines, and many of them are easily remedied given the right guidance. Originating from a discussion at the 2020 International Statistical Ecology Conference, we show how ecologists can build their research following four guiding principles for impactful statistical research practices: 1. Define a focused research question, then plan sampling and analysis to answer it; 2. Develop a model that accounts for the distribution and dependence of your data; 3. Emphasise effect sizes to replace statistical significance with ecological relevance; 4. Report your methods and findings in sufficient detail so that your research is valid and reproducible. Listed in approximate order of importance, these principles provide a framework for experimental design and reporting that guards against unsound practices. Starting with a well-defined research question allows researchers to create an efficient study to answer it and guards against poor research practices that lead to false positives and poor replicability. Correct and appropriate statistical models give sound conclusions; good reporting practices and a focus on ecological relevance make results impactful and replicable. Illustrated with an example from a recent study into the impact of disturbance on upland swamps, this paper explains the rationale for the selection and use of effective statistical practices and provides practical guidance for ecologists seeking to improve their use of statistical methods.
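
As a small generic illustration of principle 3, one can report an effect size with uncertainty in the response variable's own units rather than a bare p-value; the swamp study's actual analysis is not reproduced here, and the data below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_difference_ci(disturbed, undisturbed, n_boot=10_000, alpha=0.05):
    """Effect size (difference in means) with a bootstrap percentile CI,
    reported in the response variable's own units so that ecological
    relevance can be judged directly."""
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        d = rng.choice(disturbed, size=len(disturbed))      # resample with replacement
        u = rng.choice(undisturbed, size=len(undisturbed))
        diffs[i] = d.mean() - u.mean()
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return disturbed.mean() - undisturbed.mean(), (lo, hi)

# Hypothetical species-richness counts per plot:
effect, ci = mean_difference_ci(np.array([8, 6, 7, 5, 9.0]),
                                np.array([11, 12, 9, 13, 10.0]))
```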

Statistical Genetics in and out of Quasi-Linkage Equilibrium (Extended). (arXiv:2105.01428v6 [q-bio.PE] UPDATED) arxiv.org/abs/2105.01428

This review is about statistical genetics, an interdisciplinary topic between statistical physics and population biology. The focus is on the phase of quasi-linkage equilibrium (QLE). Our goals here are to clarify under which conditions the QLE phase can be expected to hold in population biology and how the stability of the QLE phase is lost. The QLE state, which has many similarities to a thermal equilibrium state in statistical mechanics, was discovered by M. Kimura for a two-locus two-allele model, and was extended and generalized to the global genome scale by (Neher and Shraiman, 2011). What we will refer to as the Kimura-Neher-Shraiman (KNS) theory describes a population evolving under mutation, recombination, natural selection, and possibly genetic drift. A QLE phase exists at sufficiently high recombination and/or mutation rates relative to the strength of selection. We show how, in QLE, it is possible to infer the epistatic parameters of the fitness function from knowledge of the (dynamical) distribution of genotypes in a population. We further consider the breakdown of the QLE regime at high enough selection strength. We review recent results for the selection-mutation and selection-recombination dynamics. Finally, we identify and characterize a new phase, which we call non-random coexistence (NRC), where variability persists in the population without either reaching fixation or disappearing.
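
A minimal sketch of the inference step described above, using naive mean-field inverse-Ising inference on a sample of genotypes. The simple rescaling of the inferred couplings by the recombination rate is a loose reading of the KNS relation stated here as an assumption, not the review's exact formula.

```python
import numpy as np

def infer_epistasis(genotypes, r):
    """Estimate epistatic fitness couplings from a genotype sample in the
    QLE phase. `genotypes` is an (N individuals, L loci) array with
    entries +/-1; `r` is the recombination rate. Uses naive mean-field
    inverse Ising: J = -C^{-1} (off-diagonal), then f_ij ~ r * J_ij,
    a simplified reading of Kimura-Neher-Shraiman."""
    C = np.cov(genotypes, rowvar=False)             # locus-locus covariances
    C += 1e-6 * np.eye(len(C))                      # regularize near-constant loci
    J = -np.linalg.inv(C)                           # naive mean-field couplings
    np.fill_diagonal(J, 0.0)
    return r * J                                    # estimated epistasis f_ij
```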

Subspace orthogonalization as a mechanism for binding values to space. (arXiv:2205.06769v2 [q-bio.NC] UPDATED) arxiv.org/abs/2205.06769

When choosing between options, we must solve an important binding problem: the values of the options must be associated with information about the action needed to select them. We hypothesize that the brain solves this binding problem through the use of distinct population subspaces. To test this hypothesis, we examined the responses of single neurons in five reward-sensitive regions in rhesus macaques performing a risky choice task. In all areas, neurons encoded the value of the offers presented on both the left and the right side of the display in semi-orthogonal subspaces, which served to bind the values of the two offers to their positions in space. Supporting the idea that this orthogonalization is functionally meaningful, we observed a session-to-session covariation between choice behavior and the orthogonalization of the two value subspaces: trials with less orthogonalized subspaces were associated with a greater likelihood of choosing the less valued option. Further inspection revealed that these semi-orthogonal subspaces arose from a combination of linear and nonlinear mixed selectivity in the neural population. We show that this combination of selectivity balances reliable binding with an ability to generalize value across different spatial locations. These results support the hypothesis that semi-orthogonal subspaces support reliable binding, which is essential for flexible behavior in the face of multiple options.
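
One standard way to quantify how orthogonal two value-coding subspaces are (a generic index, not necessarily the authors' exact measure) is via principal angles between regression-estimated coding axes:

```python
import numpy as np
from scipy.linalg import subspace_angles

def value_subspace_angle(rates, left_val, right_val):
    """rates: (n_trials, n_neurons) firing rates; left_val/right_val:
    (n_trials,) offer values. Estimate each offer's value-coding axis by
    least squares, then measure the principal angle between the two
    (here one-dimensional) subspaces. 90 degrees = fully orthogonal."""
    X = np.column_stack([left_val, right_val, np.ones(len(left_val))])
    B, *_ = np.linalg.lstsq(X, rates, rcond=None)    # (3, n_neurons) weights
    w_left, w_right = B[0], B[1]                     # value-coding axes
    theta = subspace_angles(w_left[:, None], w_right[:, None])[0]
    return np.degrees(theta)
```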

Conditional Antibody Design as 3D Equivariant Graph Translation. (arXiv:2208.06073v4 [q-bio.BM] UPDATED) arxiv.org/abs/2208.06073

Antibody design is valuable for therapeutic use and biological research. Existing deep-learning-based methods encounter several key issues: 1) incomplete context for generating the Complementarity-Determining Regions (CDRs); 2) inability to capture the entire 3D geometry of the input structure; 3) inefficient, autoregressive prediction of the CDR sequences. In this paper, we propose the Multi-channel Equivariant Attention Network (MEAN) to co-design the 1D sequences and 3D structures of CDRs. Specifically, MEAN formulates antibody design as a conditional graph translation problem by importing extra components, including the target antigen and the light chain of the antibody. MEAN then resorts to E(3)-equivariant message passing along with a proposed attention mechanism to better capture the geometrical correlation between different components. Finally, it outputs both the 1D sequences and the 3D structure via a multi-round progressive full-shot scheme, which is more efficient and precise than previous autoregressive approaches. Our method significantly surpasses state-of-the-art models in sequence and structure modeling, antigen-binding CDR design, and binding affinity optimization. Specifically, the relative improvement over baselines is about 23% in antigen-binding CDR design and 34% in affinity optimization.
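
For intuition, here is a minimal E(3)-equivariant message-passing layer in PyTorch, in the spirit of EGNN-style updates; MEAN's actual multi-channel attention architecture is more involved, so treat this purely as a sketch of the equivariance mechanism.

```python
import torch
import torch.nn as nn

class EquivariantLayer(nn.Module):
    """Minimal E(3)-equivariant message-passing layer (EGNN-style sketch,
    not MEAN's architecture). h: (N, d) invariant features, x: (N, 3)
    coordinates, edges: (2, E) long tensor of (src, dst) index pairs."""
    def __init__(self, d):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * d + 1, d), nn.SiLU())
        self.phi_x = nn.Linear(d, 1, bias=False)
        self.phi_h = nn.Sequential(nn.Linear(2 * d, d), nn.SiLU())

    def forward(self, h, x, edges):
        src, dst = edges
        rel = x[src] - x[dst]                          # relative positions
        d2 = (rel ** 2).sum(-1, keepdim=True)          # invariant distance^2
        m = self.phi_e(torch.cat([h[src], h[dst], d2], dim=-1))
        # Coordinate update: weighted sum of relative vectors, which is
        # rotation-equivariant and translation-invariant.
        dx = torch.zeros_like(x).index_add_(0, dst, rel * self.phi_x(m))
        # Feature update: aggregate invariant messages at each node.
        agg = torch.zeros_like(h).index_add_(0, dst, m)
        return self.phi_h(torch.cat([h, agg], dim=-1)), x + dx
```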

ImageNomer: developing an fMRI and omics visualization tool to detect racial bias in functional connectivity. (arXiv:2302.00767v1 [q-bio.PE]) arxiv.org/abs/2302.00767

It can be difficult to identify trends and perform quality control in large, high-dimensional fMRI or omics datasets. To remedy this, we developed ImageNomer, a data visualization and analysis tool that allows inspection of both subject-level and cohort-level features. The tool allows visualization of phenotype correlation with functional connectivity (FC), partial connectivity (PC), dictionary components (PCA and our own method), and genomic data (single-nucleotide polymorphisms, SNPs). In addition, it allows visualization of weights from arbitrary ML models. ImageNomer is built with a Python backend and a Vue frontend. We validate ImageNomer using the Philadelphia Neurodevelopmental Cohort (PNC) dataset, which contains multitask fMRI and SNP data from healthy adolescents. Using correlation, greedy selection, or model weights, we find that a set of 10 FC features can explain 15% of the variation in age, compared to 35% for the full 34,716-feature model. The four most significant FCs are either between bilateral default mode network (DMN) regions or spatially proximal subcortical areas. Additionally, we show that whereas both FC (fMRI) and SNP (genomic) features can account for 10-15% of the variation in intelligence, this predictive ability disappears when controlling for race. We find that FC features can be used to predict race with 85% accuracy, compared to 78% accuracy for sex prediction. Using ImageNomer, this work casts doubt on the possibility of finding unbiased intelligence-related features in the fMRI and SNPs of healthy adolescents.
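
A sketch of the greedy selection step mentioned above, forward-selecting FC features by cross-validated R^2 with scikit-learn. ImageNomer's own implementation may differ, and in practice one would pre-screen the 34,716 features (e.g. by correlation with age) before this loop, since it is quadratic in the candidate set.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def greedy_fc_selection(fc, age, k=10):
    """Forward-select k functional-connectivity features that best explain
    age. fc: (n_subjects, n_fc_features); age: (n_subjects,)."""
    chosen = []
    for _ in range(k):
        best, best_r2 = None, -np.inf
        for j in range(fc.shape[1]):
            if j in chosen:
                continue
            # Score the candidate set by 5-fold cross-validated R^2.
            r2 = cross_val_score(Ridge(alpha=1.0),
                                 fc[:, chosen + [j]], age,
                                 cv=5, scoring="r2").mean()
            if r2 > best_r2:
                best, best_r2 = j, r2
        chosen.append(best)
    return chosen, best_r2
```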

Meta-Analytic Operation of Threshold-independent Filtering (MOTiF) Reveals Sub-threshold Genomic Robustness in Trisomy. (arXiv:2302.00772v1 [q-bio.QM]) arxiv.org/abs/2302.00772

Trisomy, a form of aneuploidy wherein the cell possesses an additional copy of a specific chromosome, exhibits a high correlation with cancer. Studies across different hosts, cell lines, and labs into the cellular effects induced by aneuploidy have conflicted, ranging from small, chaotic global changes to large instances of either overexpression or underexpression throughout the trisomic chromosome. We ascertained that these conflicting findings may each be correct yet miss the overarching ground truth owing to careless use of thresholds. To correct this deficiency, we introduce the Meta-analytic Operation of Threshold-independent Filtering (MOTiF) method, which begins by providing a panoramic view across all thresholds, transforms the data to eliminate the effects accounted for by known mechanisms, and then reconstructs an explanation of the mechanisms that underlie the difference between the baseline and the uncharacterized effects observed. As a proof of concept, we applied MOTiF to human colonic epithelial cells, discovering a uniform decrease in gene expression levels throughout the genome, which, while significant, lies beneath most common thresholds. Using Hi-C data, we identified the structural correlate: the physical genomic architecture condenses in a uniform, genome-wide manner, which we hypothesize is a robustness mechanism counteracting the addition of a chromosome. We were able to decompose the gene expression alterations into three overlapping mechanisms: the raw chromosome content, the genomic compartmentalization, and the global structural condensation. While further studies must be conducted to corroborate the hypothesized robustness mechanism, MOTiF presents a useful meta-analytic tool in the realm of gene expression and beyond.
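
A schematic reading of MOTiF's first step, the panoramic view across all thresholds (not the published implementation): sweep the fold-change cutoff and compare call rates on the trisomic chromosome against the rest of the genome. A sub-threshold effect like the uniform expression decrease reported above shows up as a persistent gap between the two curves at small cutoffs.

```python
import numpy as np

def threshold_panorama(log2_fc, on_trisomic, n_points=200):
    """Instead of one fixed cutoff, evaluate every fold-change threshold:
    at each value, record the fraction of genes that would be called
    differentially expressed, separately for genes on the trisomic
    chromosome and for all other genes.

    log2_fc: per-gene log2 fold changes (trisomic vs. diploid cells);
    on_trisomic: boolean mask for genes on the extra chromosome."""
    thresholds = np.linspace(0.0, np.abs(log2_fc).max(), n_points)
    tri = np.abs(log2_fc[on_trisomic])
    rest = np.abs(log2_fc[~on_trisomic])
    frac_tri = np.array([(tri >= t).mean() for t in thresholds])
    frac_rest = np.array([(rest >= t).mean() for t in thresholds])
    return thresholds, frac_tri, frac_rest
```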

Using Machine Learning to Develop Smart Reflex Testing Protocols. (arXiv:2302.00794v1 [cs.LG]) arxiv.org/abs/2302.00794

Objective: Reflex testing protocols allow clinical laboratories to perform second-line diagnostic tests on existing specimens based on the results of initially ordered tests. Reflex testing can support optimal clinical laboratory test ordering and diagnosis. In current clinical practice, reflex testing typically relies on simple "if-then" rules; however, this limits its scope, since most test ordering decisions involve more complexity than a simple rule will allow. Here, using the analyte ferritin as an example, we propose an alternative machine learning-based approach to "smart" reflex testing with a wider scope and greater impact than traditional rule-based approaches. Methods: Using patient data, we developed a machine learning model to predict whether a patient undergoing complete blood count (CBC) testing will also have ferritin testing ordered, considered applications of this model to "smart" reflex testing, and evaluated the model by comparing its performance to possible rule-based approaches. Results: Our underlying machine learning models performed moderately well in predicting ferritin test ordering and demonstrated greater suitability to reflex testing than rule-based approaches. Using chart review, we demonstrate that our model may improve ferritin test ordering. Finally, as a secondary goal, we demonstrate that ferritin test results are missing not at random (MNAR), a finding with implications for unbiased imputation of missing test results. Conclusions: Machine learning may provide a foundation for new types of reflex testing with enhanced benefits for clinical diagnosis and laboratory utilization management.
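
A sketch of the modeling step under stated assumptions: the feature names and model choice below are hypothetical illustrations, not the paper's actual feature set or model.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def fit_reflex_model(X, ferritin_ordered):
    """Predict whether a ferritin test will be ordered from CBC results
    plus basic context. X rows are patients; columns might be hemoglobin,
    MCV, RDW, platelets, age, sex, ... (hypothetical feature set)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, ferritin_ordered, test_size=0.25, random_state=0)
    model = GradientBoostingClassifier().fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return model, auc

# "Smart" reflex rule: add ferritin to the order when the predicted
# probability clears a threshold chosen for the lab's desired
# sensitivity / test-volume trade-off, e.g.:
# reflex = model.predict_proba(cbc_features)[:, 1] > 0.8
```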

Molecular Geometry-aware Transformer for accurate 3D Atomic System modeling. (arXiv:2302.00855v1 [q-bio.MN]) arxiv.org/abs/2302.00855

Molecular dynamics simulations are important in computational physics, chemistry, materials science, and biology. Machine learning-based methods have shown strong abilities in predicting molecular energy and properties and are much faster than DFT calculations. Molecular energy depends on at least the atoms, bonds, bond angles, torsion angles, and nonbonding atom pairs, yet previous Transformer models use only atoms as inputs and lack explicit modeling of these factors. To alleviate this limitation, we propose Moleformer, a novel Transformer architecture that takes nodes (atoms) and edges (bonds and nonbonding atom pairs) as inputs and models the interactions among them using a rotation- and translation-invariant geometry-aware spatial encoding. The proposed spatial encoding calculates relative position information, including distances and angles, among nodes and edges. We benchmark Moleformer on the OC20 and QM9 datasets. Our model achieves state of the art on the initial-state-to-relaxed-energy prediction task of OC20 and is very competitive on QM9 in predicting quantum chemical properties compared to other Transformer and Graph Neural Network (GNN) methods, which demonstrates the effectiveness of the proposed geometry-aware spatial encoding in Moleformer.
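
For concreteness, here is the kind of rotation- and translation-invariant quantity such an encoding can compute from raw coordinates (a generic sketch, not Moleformer's exact formulation): pairwise distances and the angle at the central atom of a triplet, both unchanged under rigid motions of the molecule.

```python
import numpy as np

def geometry_features(pos, i, j, k):
    """Invariant geometric features for atoms (i, j, k) with j central.
    pos: (N, 3) Cartesian coordinates. Returns the two bond distances
    and the bond angle at j, all invariant to rotation and translation."""
    d_ij = np.linalg.norm(pos[i] - pos[j])
    d_jk = np.linalg.norm(pos[j] - pos[k])
    u, v = pos[i] - pos[j], pos[k] - pos[j]
    cos_angle = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return d_ij, d_jk, np.arccos(np.clip(cos_angle, -1.0, 1.0))
```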

The Challenges and Opportunities in Creating an Early Warning System for Global Pandemics. (arXiv:2302.00863v1 [q-bio.QM]) arxiv.org/abs/2302.00863

The COVID-19 pandemic revealed that global health, social systems, and economies can be surprisingly fragile in an increasingly interconnected and interdependent world. Yet, during the last half of 2022, and quite remarkably, we began dismantling essential infectious disease monitoring programs in several countries. Absent such programs, localized biological risks will transform into global shocks linked directly to our lack of foresight regarding emerging health risks. Additionally, recent studies indicate that more than half of all infectious diseases could be made worse by climate change, complicating pandemic containment. Despite this complexity, the factors leading to pandemics are largely predictable, but this predictability can only be realized through a well-designed global early warning system. Such a system should integrate data from genomics, climate and environment, social dynamics, and healthcare infrastructure. The glue for such a system is community-driven modeling, modern data logistics, and the democratization of AI tools. Using the example of dengue fever in Brazil, we demonstrate how thoughtfully designed technology platforms can build global-scale precision disease detection and response systems that significantly reduce exposure to systemic shocks, accelerate science-informed public health policies, and deliver reliable healthcare and economic opportunities as an intrinsic part of the global sustainable development agenda.

Quantifying optimal resource allocation strategies for controlling epidemics. (arXiv:2302.00960v1 [q-bio.PE]) arxiv.org/abs/2302.00960

The frequent emergence of communicable diseases has been a major concern worldwide. A lack of sufficient resources to mitigate the disease burden makes the situation even more challenging for lower-income countries. Hence, strategy development towards disease eradication and optimal management of the social and economic burden has garnered a lot of attention in recent years. In this context, we quantify the optimal fraction of resources that can be allocated to two major intervention measures, namely reduction of disease transmission and improvement of healthcare infrastructure. Our results demonstrate that the effectiveness of each intervention has a significant impact on the optimal resource allocation in both long-term disease dynamics and outbreak scenarios. Often, allocating resources to both strategies is optimal. For long-term dynamics, a non-monotonic behavior of optimal resource allocation with intervention effectiveness is observed, which differs from the more intuitive strategy recommended in the case of outbreaks. Further, our results indicate that the relationship between investment in interventions and the corresponding outcomes plays a decisive role in determining optimal strategies: intervention programs with decreasing returns promote the need for resource sharing. Our study provides fundamental insight into determining the best response strategy for controlling epidemics under resource constraints.
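
To make the allocation question concrete, here is a toy version on a plain SIR model: a fraction q of the budget reduces transmission and the remainder speeds recovery, and q is optimized numerically. The functional forms and parameter values are illustrative assumptions, not the paper's model.

```python
from scipy.integrate import solve_ivp
from scipy.optimize import minimize_scalar

# Illustrative parameters: baseline transmission/recovery rates and the
# effectiveness of each intervention (all hypothetical).
beta0, gamma0, eps_b, eps_g = 0.4, 0.1, 0.6, 0.6

def final_size(q):
    """Total fraction ever infected when a share q of resources lowers
    transmission and the share (1 - q) strengthens healthcare."""
    beta = beta0 * (1 - eps_b * q)            # transmission reduction
    gamma = gamma0 * (1 + eps_g * (1 - q))    # healthcare improvement
    def sir(t, y):
        S, I = y
        return [-beta * S * I, beta * S * I - gamma * I]
    sol = solve_ivp(sir, (0, 500), [0.999, 0.001], rtol=1e-8)
    return 1 - sol.y[0, -1]

opt = minimize_scalar(final_size, bounds=(0, 1), method="bounded")
# opt.x is the optimal share for transmission reduction; sweeping eps_b
# and eps_g shows how intervention effectiveness reshapes the optimum,
# the kind of dependence the abstract describes.
```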

Biophysical aspects of neurocognitive modeling with long-term sustained temperature variations. (arXiv:2302.01019v1 [q-bio.NC]) arxiv.org/abs/2302.01019

Long-term focused attention with visualization and breathing exercises is at the core of various Eastern traditions. Neurocognitive and psychosomatic phenomena demonstrated during such exercises have been instrumentally explored with EEG and other sensors. Neurocognitive modeling in the form of meditative visualization produced persistent temperature effects in the body long after the exercise finished; this raises the question of their psychosomatic or biophysical origin. This work explores the question by comparing experiments with attention focused inside and outside the body. EEG, temperature, heart, and breathing sensors monitor internal body conditions; high-resolution differential calorimetric sensors are used to detect thermal effects outside the body. Experiments with 159 attempts (2427 operator-sensor sessions) were carried out over five months, with control measurements run under the same conditions in parallel with the experimental series. Increases in body temperature up to the moderate-fever zone of 38.5 °C and intentional control of upward and downward trends in core temperature by 1.6 °C are demonstrated. Persistent temperature variations last >60 min. The experiments also demonstrated induced thermal fluctuations at the 10^-3 °C level in external calorimetric systems with 15 ml of water for 60-90 min. The repeatability of these attempts is over 90%; statistical Chi-square and Mann-Whitney tests reject the null hypothesis that the outcomes are random. Thus, the obtained data confirm the persistent thermal effects reported in previous publications and indicate their biophysical dimension. To explain these results, we refer to a new model in neuroscience that involves spin phenomena in biochemical and physical systems. These experiments demonstrate complex biophysical mechanisms of altered states of consciousness; their function in the body's neurohumoral regulation and non-classical brain functions is discussed.
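
For reference, the reported Mann-Whitney comparison can be set up in a few lines; the arrays below are synthetic placeholders, not the paper's data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Compare calorimeter temperature deviations from operator sessions
# against parallel control runs (synthetic placeholder data).
rng = np.random.default_rng(1)
experimental = rng.normal(1.0, 0.5, size=120)   # session deviations, mK
control = rng.normal(0.0, 0.5, size=120)        # control-run deviations, mK
stat, p = mannwhitneyu(experimental, control, alternative="two-sided")
# A small p rejects the null of identical distributions; the paper pairs
# this with a Chi-square test over session outcome counts.
```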
