Brain Age Group Classification Based on Resting State Functional Connectivity Metrics arxiv.org/abs/2503.21414

This study investigated age-related changes in functional connectivity using resting-state fMRI and explored the efficacy of traditional deep learning for classifying brain developmental stages (BDS). Functional connectivity was assessed using Seed-Based Phase Synchronization (SBPS) and Pearson correlation across 160 ROIs. Clustering was performed using t-SNE, and network topology was analyzed through graph-theoretic metrics. Adaptive learning was implemented to classify the age group by extracting bottleneck features through MobileNetV2. These deep features were embedded and classified using Random Forest and PCA. Results showed a shift in phase synchronization patterns from sensory-driven networks in youth to more distributed networks with aging. t-SNE revealed that SBPS provided the most distinct clustering of BDS. Global efficiency and participation coefficient followed an inverted U-shaped trajectory, while clustering coefficient and modularity exhibited a U-shaped pattern. MobileNet outperformed other models, achieving the highest classification accuracy for BDS. Aging was associated with reduced global integration and increased local connectivity, indicating functional network reorganization. While this study focused solely on functional connectivity from resting-state fMRI and a limited set of connectivity features, deep learning demonstrated superior classification performance, highlighting its potential for characterizing age-related brain changes.
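
As a rough illustration of the Pearson-correlation connectivity mentioned in the abstract, the sketch below builds a correlation matrix from per-ROI time series. It is a minimal example under stated assumptions, not the authors' pipeline: the function names and the toy signals are hypothetical, and real data would have 160 ROIs and many more time points.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def connectivity_matrix(roi_series):
    """Pairwise correlation between every pair of ROI time series."""
    n = len(roi_series)
    return [[pearson(roi_series[i], roi_series[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical toy signals for three ROIs (illustration only).
toy_rois = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with the first ROI
    [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with the first ROI
]
fc = connectivity_matrix(toy_rois)
```

The resulting matrix is symmetric with ones on the diagonal, which is the form graph-theoretic metrics are usually computed from.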

arXiv.org

consexpressionR: an R package for consensus differential gene expression analysis arxiv.org/abs/2503.21546

Motivation: Bulk RNA-Seq is a widely used method for studying gene expression across a variety of contexts. The significance of RNA-Seq studies has grown with the advent of high-throughput sequencing technologies. Computational methods have been developed for each stage of the identification of differentially expressed genes. Nevertheless, there are few studies exploring the association between different types of methods. In this study, we evaluated the impact of combining methodologies on the results of differential expression analysis. Using two data sets with qPCR data as a gold-standard reference, seven methods implemented in R packages (EBSeq, edgeR, DESeq2, limma, SAMseq, NOISeq, and KnowSeq) were run and assessed separately and in combination. The results were evaluated against the adopted qPCR data. Results: Here, we introduce consexpressionR, an R package that automates differential expression analysis using consensus of at least seven methodologies, producing more assertive results with a significant reduction in false positives. Availability: the consexpressionR source code and support are available on GitHub (https://github.com/costasilvati/consexpressionR).
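
The consensus idea can be pictured as a simple vote across methods. The sketch below is in Python rather than R and is not the consexpressionR API: the function name, the vote threshold, and the gene calls are all hypothetical placeholders.

```python
from collections import Counter

def consensus_genes(calls_by_method, min_votes):
    """Return genes called differentially expressed by at least min_votes methods."""
    votes = Counter()
    for genes in calls_by_method.values():
        votes.update(set(genes))  # each method votes at most once per gene
    return sorted(gene for gene, count in votes.items() if count >= min_votes)

# Hypothetical calls from three methods (illustration only).
calls = {
    "edgeR": ["geneA", "geneB", "geneC"],
    "DESeq2": ["geneA", "geneB"],
    "limma": ["geneA", "geneD"],
}
strict = consensus_genes(calls, min_votes=3)
relaxed = consensus_genes(calls, min_votes=2)
```

Raising `min_votes` trades sensitivity for fewer false positives, which is the effect the abstract reports for the consensus approach.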

Synchronization and chaos in complex ecological communities with delayed interactions arxiv.org/abs/2503.21551

Explaining the wide range of dynamics observed in ecological communities is challenging due to the large number of species involved, the complex network of interactions among them, and the influence of multiple environmental variables. Here, we consider a general framework to model the dynamics of species-rich communities under the effects of external environmental factors, showing that it naturally leads to delayed interactions between species, and analyze the impact of such memory effects on population dynamics. Employing the generalized Lotka-Volterra equations with time delays and random interactions, we characterize the resulting dynamical phases in terms of the statistical properties of community interactions. Our findings reveal that memory effects can generate persistent and synchronized oscillations in species abundances in sufficiently competitive communities. This provides an additional explanation for synchronization in large communities, complementing known mechanisms such as predator-prey cycles and environmental periodic variability. Furthermore, we show that when reciprocal interactions are negatively correlated, time delays alone can induce chaotic behavior. This suggests that ecological complexity is not a prerequisite for unpredictable population dynamics, as intrinsic memory effects are sufficient to generate long-term fluctuations in species abundances. The techniques developed in this work are applicable to any high-dimensional random dynamical system with time delays.
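
To make the delayed dynamics concrete, here is a minimal forward-Euler sketch of generalized Lotka-Volterra equations with a single fixed delay. The specific form dx_i/dt = x_i(t) [1 - x_i(t) + sum_j A_ij x_j(t - tau)], the constant initial history, and all parameter values are assumptions made for illustration; the paper's model and numerical scheme may differ.

```python
def simulate_delayed_glv(A, tau, dt=0.01, steps=2000, x0=0.5):
    """Euler integration of delayed gLV dynamics; returns the full trajectory."""
    n = len(A)
    lag = max(1, round(tau / dt))                 # delay expressed in time steps
    history = [[x0] * n for _ in range(lag + 1)]  # constant history on [-tau, 0]
    for _ in range(steps):
        x_now = history[-1]
        x_delayed = history[-1 - lag]
        x_next = []
        for i in range(n):
            # Interactions act through the delayed abundances of the other species.
            interaction = sum(A[i][j] * x_delayed[j] for j in range(n))
            growth = x_now[i] * (1.0 - x_now[i] + interaction)
            x_next.append(max(0.0, x_now[i] + dt * growth))  # keep abundances nonnegative
        history.append(x_next)
    return history

# Sanity check: with no interactions each species follows logistic growth to 1.
no_interactions = [[0.0, 0.0], [0.0, 0.0]]
trajectory = simulate_delayed_glv(no_interactions, tau=0.1)
final_state = trajectory[-1]
```

Filling `A` with random competitive entries and increasing `tau` is one way to explore the oscillatory regimes the abstract describes, though a proper delay-differential-equation solver would be preferable for quantitative work.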

Allostatic Control of Persistent States in Spiking Neural Networks for perception and computation arxiv.org/abs/2503.16085

We introduce a novel model for updating perceptual beliefs about the environment by extending the concept of Allostasis to the control of internal representations. Allostasis is a fundamental regulatory mechanism observed in animal physiology that orchestrates responses to maintain a dynamic equilibrium in bodily needs and internal states. In this paper, we focus on an application in numerical cognition, where a bump of activity in an attractor network is used as a spatial numerical representation. While existing neural networks can maintain persistent states, to date, there is no unified framework for dynamically controlling spatial changes in neuronal activity in response to environmental changes. To address this, we couple a well-known allostatic microcircuit, the Hammel model, with a ring attractor, resulting in a Spiking Neural Network architecture that can modulate the location of the bump as a function of some reference input. This localized activity in turn is used as a perceptual belief in a simulated subitization task, a quick enumeration process without counting. We provide a general procedure to fine-tune the model and demonstrate the successful control of the bump location. We also study the response time in the model with respect to changes in parameters and compare it with biological data. Finally, we analyze the dynamics of the network to understand the selectivity and specificity of different neurons to distinct categories present in the input. The results of this paper, particularly the mechanism for moving persistent states, are not limited to numerical cognition but can be applied to a wide range of tasks involving similar representations.

Targeting Neurodegeneration: Three Machine Learning Methods for G9a Inhibitors Discovery Using PubChem and Scikit-learn arxiv.org/abs/2503.16214

In light of the increasing interest in G9a's role in neuroscience, three machine learning (ML) models that are time-efficient and cost-effective were developed to support researchers in this area. The models are based on data provided by PubChem and built with algorithms implemented in the scikit-learn Python ML library. The first ML model aimed to predict the efficacy magnitude of active G9a inhibitors. The ML models were trained with 3,112 samples and tested with 778. The Gradient Boosting Regressor performed best, achieving 17.81% mean relative error (MRE), 21.48% mean absolute error (MAE), 27.39% root mean squared error (RMSE) and a 0.02 coefficient of determination (R2). The second ML model, called the CID_SID ML model, used PubChem identifiers to predict the G9a inhibition probability of a small biomolecule that was primarily designed for different purposes. The ML models were trained with 58,552 samples and tested with 14,000. The most suitable classifier for this case study was the Extreme Gradient Boosting Classifier, which obtained 78.1% accuracy, 84.3% precision, 69.1% recall, 75.9% F1-score and 8.1% Receiver-operating characteristic (ROC). The third ML model, based on the Random Forest Classifier algorithm, produced a list of functional groups in descending order of their importance to G9a inhibition. The model was trained with 19,455 samples and tested with 14,100. The accuracy of this ranking was 70%.
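
The reported classifier metrics can be cross-checked with the standard F1 definition, the harmonic mean of precision and recall. The short sketch below only does that arithmetic on the figures quoted in the abstract; the function name is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
f1_percent = round(100 * f1_score(0.843, 0.691), 1)
print(f1_percent)  # 75.9
```

The result agrees with the 75.9% F1-score stated for the Extreme Gradient Boosting Classifier.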

Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens arxiv.org/abs/2503.16278

Recent advancements in large language models and their multi-modal extensions have demonstrated the effectiveness of unifying generation and understanding through autoregressive next-token prediction. However, despite the critical role of 3D structural generation and understanding (3D GU) in AI for science, these tasks have largely evolved independently, with autoregressive methods remaining underexplored. To bridge this gap, we introduce Uni-3DAR, a unified framework that seamlessly integrates 3D GU tasks via autoregressive prediction. At its core, Uni-3DAR employs a novel hierarchical tokenization that compresses 3D space using an octree, leveraging the inherent sparsity of 3D structures. It then applies an additional tokenization for fine-grained structural details, capturing key attributes such as atom types and precise spatial coordinates in microscopic 3D structures. We further propose two optimizations to enhance efficiency and effectiveness. The first is a two-level subtree compression strategy, which reduces the octree token sequence by up to 8x. The second is a masked next-token prediction mechanism tailored for dynamically varying token positions, significantly boosting model performance. By combining these strategies, Uni-3DAR successfully unifies diverse 3D GU tasks within a single autoregressive framework. Extensive experiments across multiple microscopic 3D GU tasks, including molecules, proteins, polymers, and crystals, validate its effectiveness and versatility. Notably, Uni-3DAR surpasses previous state-of-the-art diffusion models by a substantial margin, achieving up to 256% relative improvement while delivering inference speeds up to 21.8x faster. The code is publicly available at https://github.com/dptech-corp/Uni-3DAR.
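
The octree compression of sparse 3D space can be sketched generically: each non-empty cell emits one token, an 8-bit mask saying which of its eight children contain points, and empty regions emit nothing. This is a textbook-style illustration under assumed conventions (depth-first order, unit cube, bit layout), not the tokenization implemented in Uni-3DAR.

```python
def octree_tokens(points, depth, origin=(0.0, 0.0, 0.0), size=1.0):
    """Depth-first occupancy tokens for points inside a cube of the given size."""
    if depth == 0 or not points:
        return []
    half = size / 2.0
    children = {}
    for p in points:
        idx = 0
        for axis in range(3):
            if p[axis] >= origin[axis] + half:
                idx |= 1 << axis          # one bit per axis selects the octant
        children.setdefault(idx, []).append(p)
    mask = 0
    for idx in children:
        mask |= 1 << idx                  # bit k is set when octant k is occupied
    tokens = [mask]
    for idx in sorted(children):          # recurse only into occupied octants
        child_origin = tuple(
            origin[a] + (half if (idx >> a) & 1 else 0.0) for a in range(3)
        )
        tokens.extend(octree_tokens(children[idx], depth - 1, child_origin, half))
    return tokens

# Two hypothetical atom positions in opposite corners of the unit cube.
atoms = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)]
tokens = octree_tokens(atoms, depth=2)
```

For these two points the sequence has three tokens, whereas a dense grid at the same depth would have 64 cells, which is the sparsity the abstract refers to.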

Aging and mortality of persons with HIV: a novel Kalman Filtering and DMD framework arxiv.org/abs/2503.16297

Due to the widespread availability of effective antiretroviral therapy (ART) regimens, average lifespans of persons with HIV (PWH) in the United States have increased significantly in recent decades. In turn, the demographic profile of PWH has shifted. Older persons comprise an ever-increasing percentage of PWH, with this percentage expected to further increase in the coming years. This has profound implications for HIV treatment and care, as significant resources are required not only to manage HIV itself, but also associated age-related comorbidities and health conditions that occur in aging PWH. Effective management of these challenges in the coming years requires accurate modeling of the PWH age structure. In the present work, we introduce several novel mathematical approaches related to this problem. We present a workflow combining a PDE model for the PWH population age structure, into which publicly available HIV surveillance data is assimilated using the Ensemble Kalman Inversion (EKI) algorithm. This procedure allows us to rigorously reconstruct the age-dependent mortality trends for PWH over the last several decades. To project future trends, we introduce and analyze a novel variant of the Dynamic Mode Decomposition (DMD), nonnegative DMD. We show that nonnegative DMD provides physically consistent projections of mortality and HIV diagnosis while remaining purely data-driven, and not requiring additional assumptions. We then combine these elements to provide forecasts for future trends in PWH mortality and demographic evolution in the coming years.

Machine learning algorithms to predict stroke in China based on causal inference of time series analysis arxiv.org/abs/2503.14512

Participants: This study employed a combination of Vector Autoregression (VAR) model and Graph Neural Networks (GNN) to systematically construct dynamic causal inference. Multiple classic classification algorithms were compared, including Random Forest, Logistic Regression, XGBoost, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Gradient Boosting, and Multi Layer Perceptron (MLP). The SMOTE algorithm was used to oversample the minority class, and Stratified K-fold Cross Validation was employed. Results: This study included a total of 11,789 participants, including 6,334 females (53.73%) and 5,455 males (46.27%), with an average age of 65 years. Introduction of dynamic causal inference features significantly improved the performance of almost all models. The area under the ROC curve of each model ranged from 0.78 to 0.83, indicating significant difference (P < 0.01). Among all the models, the Gradient Boosting model demonstrated the highest performance and stability. Model explanation and feature importance analysis generated model interpretation that illustrated significant contributors associated with risks of stroke. Conclusions and Relevance: This study proposes a stroke risk prediction method that combines dynamic causal inference with machine learning models, significantly improving prediction accuracy and revealing key health factors that affect stroke. The research results indicate that dynamic causal inference features have important value in predicting stroke risk, especially in capturing the impact of changes in health status over time on stroke risk. By further optimizing the model and introducing more variables, this study provides theoretical basis and practical guidance for future stroke prevention and intervention strategies.
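
Stratified K-fold cross-validation, mentioned in the abstract, keeps the class ratio roughly equal in every fold, which matters when stroke cases are a small minority. The sketch below is a minimal pure-Python version without shuffling; the function name and the toy labels are hypothetical, and a library implementation such as scikit-learn's `StratifiedKFold` would normally be used instead.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Split sample indices into k folds with near-equal class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Hypothetical imbalanced labels: 8 negatives, 4 positives.
toy_labels = [0] * 8 + [1] * 4
folds = stratified_folds(toy_labels, k=4)
positives_per_fold = [sum(toy_labels[i] for i in fold) for fold in folds]
```

Each fold serves once as the validation set while the remaining folds are used for training.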

Longitudinal Impact of Tobacco Use and Social Determinants on Respiratory Health Disparities Among Louisiana Medicaid Enrollees arxiv.org/abs/2503.14528

Tobacco use remains a leading preventable contributor to serious health conditions in the United States, notably chronic obstructive pulmonary disease (COPD) and severe COVID-19 complications. Within Louisiana's Medicaid population, tobacco use prevalence is particularly high compared to privately insured groups, yet its full impact on long-term outcomes is not fully understood. This study aimed to investigate how tobacco use, in conjunction with demographic and clinical risk factors, influences the incidence of COPD and COVID-19 among Medicaid enrollees over time. We analyzed Louisiana Department of Health data from January 2020 to February 2023. Chi-square tests were conducted to provide descriptive statistics, and multivariate logistic regression models were applied across three discrete waves to assess both cross-sectional and longitudinal associations between risk factors and disease outcomes. Enrollees without baseline diagnoses of COPD or COVID-19 were followed to determine new-onset cases in subsequent waves. Adjusted odds ratios (AOR) were calculated after controlling for socio-demographic variables, comorbidities, and healthcare utilization patterns. Tobacco use emerged as a significant independent predictor of both COPD (AOR = 1.12) and COVID-19 (AOR = 1.66). Additional risk factors, such as older age, gender, region, and pre-existing health conditions, also showed significant associations with higher incidence rates of COPD and COVID-19. By linking tobacco use, demographic disparities, and comorbidities to an increased risk of COPD and COVID-19, this study underscores the urgent need for targeted tobacco cessation efforts and prevention strategies within this underserved population.
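
An odds ratio multiplies the odds of an outcome, not its probability, so its practical effect depends on the baseline risk. The sketch below converts a baseline probability to odds, applies an odds ratio, and converts back. The 10% baseline is a hypothetical value chosen only for illustration and does not come from the study; the odds ratio of 1.66 is the figure reported for COVID-19.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability implied by scaling the baseline odds by an odds ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Hypothetical 10% baseline risk combined with the reported odds ratio.
risk_with_tobacco = apply_odds_ratio(0.10, 1.66)
```

Under that assumed baseline the implied probability is a little under 16%, a smaller relative change than the odds ratio alone might suggest.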

Efficient Data Selection for Training Genomic Perturbation Models arxiv.org/abs/2503.14571

Genomic studies, including CRISPR-based PerturbSeq analyses, face a vast hypothesis space, while gene perturbations remain costly and time-consuming. Gene expression models based on graph neural networks are trained to predict the outcomes of gene perturbations to facilitate such experiments. Active learning methods are often employed to train these models due to the cost of the genomic experiments required to build the training set. However, poor model initialization in active learning can result in suboptimal early selections, wasting time and valuable resources. While typical active learning mitigates this issue over many iterations, the limited number of experimental cycles in genomic studies exacerbates the risk. To this end, we propose graph-based one-shot data selection methods for training gene expression models. Unlike active learning, one-shot data selection predefines the gene perturbations before training, hence removing the initialization bias. The data selection is motivated by theoretical studies of graph neural network generalization. The criteria are defined over the input graph and are optimized with submodular maximization. We compare them empirically to baselines and active learning methods that are state-of-the-art on this problem. The results demonstrate that graph-based one-shot data selection achieves comparable accuracy while alleviating the aforementioned risks.
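
Submodular maximization is commonly approached with a greedy rule: repeatedly pick the item with the largest marginal gain. The sketch below uses neighborhood coverage on a graph as a stand-in objective, since the abstract does not spell out the paper's criteria; the function name, the objective, and the toy gene graph are all assumptions.

```python
def greedy_coverage(adjacency, budget):
    """Greedily choose up to `budget` nodes that cover the most new nodes."""
    covered = set()
    chosen = []
    for _ in range(budget):
        best, best_gain = None, 0
        for node in sorted(adjacency):        # sorted for deterministic tie-breaking
            if node in chosen:
                continue
            reach = {node} | set(adjacency[node])
            gain = len(reach - covered)       # marginal gain of adding this node
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:                      # nothing left to gain
            break
        chosen.append(best)
        covered |= {best} | set(adjacency[best])
    return chosen, covered

# Hypothetical gene interaction graph (illustration only).
gene_graph = {
    "a": ["b", "c"], "b": ["a"], "c": ["a"],
    "d": ["e"], "e": ["d"], "f": [],
}
selected, covered_nodes = greedy_coverage(gene_graph, budget=2)
```

Because the whole selection is made up front from the graph alone, no model has to be trained before the perturbations are chosen, which is the property the abstract contrasts with active learning.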

Sequence Analysis Using the Bezier Curve arxiv.org/abs/2503.14574

The analysis of sequences (e.g., protein, DNA, and SMILES string) is essential for disease diagnosis, biomaterial engineering, genetic engineering, and drug discovery domains. Conventional analytical methods focus on transforming sequences into numerical representations for applying machine learning/deep learning-based sequence characterization. However, their efficacy is constrained by the intrinsic nature of deep learning (DL) models, which tend to exhibit suboptimal performance when applied to tabular data. An alternative group of methodologies endeavors to convert biological sequences into image forms by applying the concept of Chaos Game Representation (CGR). However, a noteworthy drawback of these methods lies in their tendency to map individual elements of the sequence onto a relatively small subset of designated pixels within the generated image. The resulting sparse image representation may not adequately encapsulate the comprehensive sequence information, potentially resulting in suboptimal predictions. In this study, we introduce a novel approach to transform sequences into images using the Bézier curve concept for element mapping. Mapping the elements onto a curve enhances the sequence information representation in the respective images, hence yielding better DL-based classification performance. We employed different sequence datasets to validate our system by using different classification tasks, and the results illustrate that our Bézier curve method is able to achieve good performance for all the tasks.
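
For readers unfamiliar with Bézier curves, the sketch below evaluates a curve from its control points with de Casteljau's algorithm, which repeatedly interpolates between neighboring points. How the paper turns sequence elements into control points is not described in the abstract, so the control points here are hypothetical.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1]."""
    points = [tuple(p) for p in control_points]
    while len(points) > 1:
        # Linearly interpolate between each pair of neighboring points.
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]

# Hypothetical control points for a quadratic curve.
controls = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
midpoint = de_casteljau(controls, 0.5)
curve = [de_casteljau(controls, i / 10) for i in range(11)]
```

Sampling many values of `t` yields a dense set of pixel positions, in contrast to the sparse pixel usage the abstract attributes to Chaos Game Representation images.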

Core-Periphery Principle Guided State Space Model for Functional Connectome Classification arxiv.org/abs/2503.14655

Understanding the organization of human brain networks has become a central focus in neuroscience, particularly in the study of functional connectivity, which plays a crucial role in diagnosing neurological disorders. Advances in functional magnetic resonance imaging and machine learning techniques have significantly improved brain network analysis. However, traditional machine learning approaches struggle to capture the complex relationships between brain regions, while deep learning methods, particularly Transformer-based models, face computational challenges due to their quadratic complexity in long-sequence modeling. To address these limitations, we propose a Core-Periphery State-Space Model (CP-SSM), an innovative framework for functional connectome classification. Specifically, we introduce Mamba, a selective state-space model with linear complexity, to effectively capture long-range dependencies in functional brain networks. Furthermore, inspired by the core-periphery (CP) organization, a fundamental characteristic of brain networks that enhances efficient information transmission, we design CP-MoE, a CP-guided Mixture-of-Experts that improves the representation learning of brain connectivity patterns. We evaluate CP-SSM on two benchmark fMRI datasets: ABIDE and ADNI. Experimental results demonstrate that CP-SSM surpasses Transformer-based models in classification performance while significantly reducing computational complexity. These findings highlight the effectiveness and efficiency of CP-SSM in modeling brain functional connectivity, offering a promising direction for neuroimaging-based neurological disease diagnosis.
