Identification of Cognitive Workload during Surgical Tasks with Multimodal Deep Learning. (arXiv:2209.06208v1 [cs.LG]) arxiv.org/abs/2209.06208


In Operating Rooms (ORs), activities usually differ from those in other typical working environments. In particular, surgeons are frequently exposed to multiple psycho-organizational constraints that may have negative repercussions on their health and performance. This is commonly attributed to an increase in the associated Cognitive Workload (CWL) that results from dealing with unexpected and repetitive tasks, large amounts of information, and potentially risky cognitive overload. In this paper, a cascade of two machine learning approaches is suggested for the multimodal recognition of CWL across four different surgical tasks. First, a model based on the concept of transfer learning is used to identify whether a surgeon is experiencing any CWL. Second, a Convolutional Neural Network (CNN) uses this information to identify the type of CWL associated with each surgical task. The suggested multimodal approach considers adjacent signals from electroencephalogram (EEG), functional near-infrared spectroscopy (fNIRS) and pupil eye diameter. The concatenation of signals allows complex correlations to be captured in terms of both time (temporal) and channel location (spatial). Data collection is performed with the Multi-sensing AI Environment for Surgical Task & Role Optimisation (MAESTRO) platform developed at the HARMS Lab. To benchmark the proposed methodology, several state-of-the-art machine learning techniques have been implemented. The tests show that the proposed model has a precision of 93%.
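The signal-concatenation step can be sketched as follows; the channel counts, sampling rate, window length, and function names are illustrative assumptions rather than values reported in the paper:

```python
import numpy as np

# Assumed parameters for illustration only (not from the paper).
FS = 128          # common sampling rate (Hz) after resampling
WINDOW_S = 4      # analysis window length in seconds

def resample(signal: np.ndarray, n_target: int) -> np.ndarray:
    """Linearly interpolate a 1-D signal to n_target samples."""
    x_old = np.linspace(0.0, 1.0, num=signal.shape[-1])
    x_new = np.linspace(0.0, 1.0, num=n_target)
    return np.interp(x_new, x_old, signal)

def build_multimodal_window(eeg, fnirs, pupil):
    """Concatenate EEG, fNIRS and pupil-diameter signals along the
    channel axis so one 2-D (channels x time) input exposes both
    temporal and spatial (channel-location) correlations to a CNN."""
    n = FS * WINDOW_S
    rows = [resample(ch, n) for ch in eeg]
    rows += [resample(ch, n) for ch in fnirs]
    rows.append(resample(pupil, n))  # pupil diameter is one channel
    return np.stack(rows)            # shape: (n_channels, n)

eeg = np.random.randn(8, 512)      # 8 EEG channels (assumed)
fnirs = np.random.randn(4, 40)     # 4 fNIRS channels at a lower rate
pupil = np.random.randn(240)       # pupil diameter from an eye tracker
window = build_multimodal_window(eeg, fnirs, pupil)
print(window.shape)                # (13, 512)
```

The resulting (channels, time) array is the kind of input a 2-D or 1-D CNN can convolve over to pick up cross-channel and cross-time structure.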

Look Before You Leap: Improving Text-based Person Retrieval by Learning A Consistent Cross-modal Common Manifold. (arXiv:2209.06209v1 [cs.CV]) arxiv.org/abs/2209.06209


The core problem of text-based person retrieval is how to bridge the heterogeneous gap between multi-modal data. Many previous approaches attempt to learn a latent common manifold mapping paradigm following a cross-modal distribution consensus prediction (CDCP) manner. When mapping features from the distribution of one modality into the common manifold, the feature distribution of the opposite modality is completely invisible. That is to say, achieving a cross-modal distribution consensus, so as to embed and align the multi-modal features in a constructed cross-modal common manifold, depends entirely on the experience of the model itself rather than on the actual situation. With such methods, it is inevitable that the multi-modal data cannot be well aligned in the common manifold, which finally leads to sub-optimal retrieval performance. To overcome this CDCP dilemma, we propose a novel algorithm termed LBUL to learn a Consistent Cross-modal Common Manifold (C^3M) for text-based person retrieval. The core idea of our method, as a Chinese saying goes, is "san si er hou xing", namely, to Look Before yoU Leap (LBUL). The common manifold mapping mechanism of LBUL contains a looking step and a leaping step. Compared to CDCP-based methods, LBUL considers the distribution characteristics of both the visual and textual modalities before embedding data from either modality into C^3M, achieving a more solid cross-modal distribution consensus and hence superior retrieval accuracy. We evaluate our proposed method on two text-based person retrieval datasets, CUHK-PEDES and RSTPReid. Experimental results demonstrate that the proposed LBUL outperforms previous methods and achieves state-of-the-art performance.

Scheduling Algorithms for Federated Learning with Minimal Energy Consumption. (arXiv:2209.06210v1 [cs.LG]) arxiv.org/abs/2209.06210


Federated Learning (FL) has opened the opportunity for collaboratively training machine learning models on heterogeneous mobile or Edge devices while keeping local data private. With an increase in its adoption, a growing concern relates to its economic and environmental cost (as is also the case for other machine learning techniques). Unfortunately, little work has been done to optimize its energy consumption or its emissions of carbon dioxide or equivalents, as energy minimization is usually left as a secondary objective. In this paper, we investigate the problem of minimizing the energy consumption of FL training on heterogeneous devices by controlling the workload distribution. We model this as the Minimal Cost FL Schedule problem, a total cost minimization problem with identical, independent, and atomic tasks that have to be assigned to heterogeneous resources with arbitrary cost functions. We propose a pseudo-polynomial optimal solution to the problem based on the previously unexplored Multiple-Choice Minimum-Cost Maximal Knapsack Packing Problem. We also provide four algorithms for scenarios where cost functions are monotonically increasing and follow the same behavior. These solutions are likewise applicable to the minimization of other kinds of costs, and to other one-dimensional data partition problems.
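The underlying scheduling problem (identical, independent, atomic tasks assigned to heterogeneous devices with arbitrary per-device cost functions) admits a pseudo-polynomial dynamic program. The sketch below is a generic total-cost DP for intuition, not the paper's knapsack-based algorithm, and the function name and cost-table shape are assumptions:

```python
def min_cost_schedule(costs, total_tasks):
    """costs[i][k] = energy for device i to process k tasks (k = 0..max_i).
    Returns (min_total_cost, per-device task counts).
    Assumes at least one feasible assignment exists."""
    INF = float("inf")
    n = len(costs)
    # dp[t] = min cost to assign exactly t tasks over the devices seen so far
    dp = [0.0] + [INF] * total_tasks
    choice = []
    for i in range(n):
        new = [INF] * (total_tasks + 1)
        ch = [0] * (total_tasks + 1)
        max_k = len(costs[i]) - 1          # device i's capacity
        for t in range(total_tasks + 1):
            if dp[t] == INF:
                continue
            for k in range(min(max_k, total_tasks - t) + 1):
                c = dp[t] + costs[i][k]
                if c < new[t + k]:
                    new[t + k] = c
                    ch[t + k] = k          # remember k chosen to reach t+k
        dp = new
        choice.append(ch)
    # Backtrack to recover how many tasks each device runs.
    assignment = [0] * n
    t = total_tasks
    for i in range(n - 1, -1, -1):
        assignment[i] = choice[i][t]
        t -= assignment[i]
    return dp[total_tasks], assignment

cost, assignment = min_cost_schedule([[0, 3, 7], [0, 2, 5]], total_tasks=3)
# device 0 runs 1 task (energy 3), device 1 runs 2 tasks (energy 5): total 8
```

The DP runs in O(devices x total_tasks x capacity) time, which is pseudo-polynomial in the number of tasks, matching the complexity class the abstract mentions.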

Quantifying the Online Long-Term Interest in Research. (arXiv:2209.06212v1 [cs.DL]) arxiv.org/abs/2209.06212


Research articles are being shared in increasing numbers on multiple online platforms. Although the scholarly impact of these articles has been widely studied, the online interest, determined by how long the articles continue to be shared online, remains unclear. Knowing how long a research article is mentioned online could be valuable information for researchers. In this paper, we analyzed multiple social media platforms on which users share and/or discuss scholarly articles. We built three clusters of papers, based on the number of yearly online mentions, with publication dates ranging from 1920 to 2016. Using the online social media metrics for each of these three clusters, we built machine learning models to predict the long-term online interest in research articles. We addressed the prediction task with two different approaches: regression and classification. For the regression approach, the Multi-Layer Perceptron model performed best, and for the classification approach, the tree-based models performed better than other models. We found that old articles are most evident in the contexts of economics and industry (i.e., patents). In contrast, recently published articles are most evident in research platforms (i.e., Mendeley), followed by social media platforms (i.e., Twitter).

Leveraging Language Foundation Models for Human Mobility Forecasting. (arXiv:2209.05479v1 [cs.LG]) arxiv.org/abs/2209.05479


In this paper, we propose a novel pipeline that leverages language foundation models for temporal sequential pattern mining, such as human mobility forecasting tasks. For example, in the task of predicting Place-of-Interest (POI) customer flows, typically the number of visits is extracted from historical logs, and only the numerical data are used to predict visitor flows. In this research, we perform the forecasting task directly on the natural language input that includes all kinds of information, such as numerical values and contextual semantic information. Specific prompts are introduced to transform numerical temporal sequences into sentences so that existing language models can be directly applied. We design an AuxMobLCast pipeline for predicting the number of visitors in each POI, integrating an auxiliary POI category classification task with the encoder-decoder architecture. This research provides empirical evidence of the effectiveness of the proposed AuxMobLCast pipeline in discovering sequential patterns in mobility forecasting tasks. The results, evaluated on three real-world datasets, demonstrate that pre-trained language foundation models also perform well in forecasting temporal sequences. This study could provide visionary insights and lead to new research directions for predicting human mobility.
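A minimal version of such a prompt transformation might look like this; the template wording and function name are assumptions for illustration, not the paper's exact prompts:

```python
def mobility_prompt(poi_name, visits, dates):
    """Turn a numeric visit-count sequence into a natural-language
    prompt, so a pretrained language model can be asked to forecast
    the next value directly from text."""
    history = ", ".join(
        f"{d}: {v} visitors" for d, v in zip(dates, visits)
    )
    return (
        f"Place of interest: {poi_name}. "
        f"Daily visitor counts so far: {history}. "
        f"How many visitors will come tomorrow?"
    )

prompt = mobility_prompt("Central Library",
                         [120, 95, 143],
                         ["Mon", "Tue", "Wed"])
```

The payoff of this encoding is that contextual information (the POI name, weekday labels) rides along with the numbers, which a purely numerical forecaster would discard.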

Systems-theoretic Hazard Analysis of Digital Human-System Interface Relevant to Reactor Trip. (arXiv:2209.05480v1 [cs.HC]) arxiv.org/abs/2209.05480


The human-system interface (HSI) is one of the key advanced design features applied to modern digital instrumentation and control systems of nuclear power plants. The conventional design is based on a compact workstation-based system within the control room. The compact workstation provides both a strategic operating environment and a convenient display of the plant status information necessary to the operator. The control environment is further enhanced through display panels, visual and auditory alarms, and procedure systems. However, just like legacy controls, the HSI should incorporate diversity to demonstrate sufficient defense-in-depth protection against common cause failures of the safety system. Furthermore, the vulnerability of the HSI is affected by a plethora of factors, such as human error, cyberattacks, and software common cause failures, that complicate the design and analysis. Therefore, this work aims to identify and evaluate existing system vulnerabilities to support the licensing, deployment and operation of HSI designs, especially the functions relevant to a reactor trip. We performed a systematic hazard analysis to investigate potential vulnerabilities within the HSI design using the novel redundancy-guided systems-theoretic hazard analysis. This method was developed and demonstrated by Idaho National Laboratory under a project initiated by the Risk-Informed Systems Analysis Pathway of the U.S. Department of Energy's Light Water Reactor Sustainability Program. The goal of the project is to develop a strong technical basis for risk assessment strategies to support effective, reliable, and licensable digital instrumentation and control technologies.

A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language. (arXiv:2209.05481v1 [cs.LG]) arxiv.org/abs/2209.05481


Although artificial intelligence (AI) has made significant progress in understanding molecules in a wide range of fields, existing models generally acquire a single cognitive ability from a single molecular modality. Since the hierarchy of molecular knowledge is profound, even humans learn from different modalities, including both intuitive diagrams and professional texts, to aid their understanding. Inspired by this, we propose a molecular multimodal foundation model that is pretrained on molecular graphs and their semantically related textual data (crawled from published Scientific Citation Index papers) via contrastive learning. This AI model represents a critical attempt to directly bridge molecular graphs and natural language. Importantly, by capturing the specific and complementary information of the two modalities, our proposed model can better grasp molecular expertise. Experimental results show that our model not only exhibits promising performance in cross-modal tasks such as cross-modal retrieval and molecule captioning, but also enhances molecular property prediction and possesses the capability to generate meaningful molecular graphs from natural language descriptions. We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine, among others.
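The graph-text contrastive pretraining objective can be illustrated with a standard symmetric InfoNCE loss over paired embeddings; the function name and temperature value are assumptions, and the paper's exact loss may differ:

```python
import numpy as np

def info_nce(graph_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired
    (molecule-graph, text) embeddings. Matched pairs sit on the
    diagonal of the similarity matrix; the loss pulls them together
    and pushes mismatched pairs apart."""
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = g @ t.T / temperature        # cosine similarities, scaled
    labels = np.arange(len(g))

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the graph-to-text and text-to-graph directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With correctly paired batches the loss is near zero; shuffling one modality's batch order breaks the pairing and drives the loss up, which is exactly the signal contrastive pretraining exploits.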

Self-Supervised Coordinate Projection Network for Sparse-View Computed Tomography. (arXiv:2209.05483v1 [eess.IV]) arxiv.org/abs/2209.05483


In the present work, we propose a Self-supervised COordinate Projection nEtwork (SCOPE) to reconstruct an artifact-free CT image from a single sparse-view (SV) sinogram by solving the inverse tomography imaging problem. Compared with recent related works that solve similar problems using an implicit neural representation (INR) network, our essential contribution is an effective and simple re-projection strategy that pushes tomography image reconstruction quality above that of supervised deep learning CT reconstruction works. The proposed strategy is inspired by the simple relationship between linear algebra and inverse problems. To solve the under-determined linear equation system, we first introduce an INR to constrain the solution space via an image continuity prior and obtain a rough solution. Second, we propose to generate a dense-view sinogram that improves the rank of the linear equation system and produces a more stable CT image solution space. Our experimental results demonstrate that the re-projection strategy significantly improves image reconstruction quality (by at least 3 dB in PSNR). Besides, we integrate recent hash encoding into our SCOPE model, which greatly accelerates model training. Finally, we evaluate SCOPE on parallel- and fan-beam X-ray SVCT reconstruction tasks. Experimental results indicate that the proposed SCOPE model outperforms two recent INR-based methods and two popular supervised DL methods, both quantitatively and qualitatively.
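The two-step linear-algebra intuition can be illustrated on a toy underdetermined system. In this hedged sketch, a Tikhonov smoothness penalty stands in for the INR's image-continuity prior and random matrices stand in for the Radon transform, so this shows the shape of the idea rather than SCOPE itself:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                    # unknowns (flattened toy "image")
x_true = np.sin(np.linspace(0, 3, n))     # smooth ground truth

A_sv = rng.standard_normal((16, n))       # sparse-view forward operator
b_sv = A_sv @ x_true                      # measured sparse sinogram (16 < 64)

# Step 1: rough solution under a smoothness prior (Tikhonov on the
# finite-difference gradient, standing in for the INR continuity prior).
D = np.diff(np.eye(n), axis=0)            # (n-1, n) difference operator
lam = 1.0                                 # regularization weight, arbitrary
x_rough = np.linalg.solve(A_sv.T @ A_sv + lam * D.T @ D, A_sv.T @ b_sv)

# Step 2: re-projection. Synthesize a dense-view sinogram from the rough
# solution; the augmented system has higher rank, so its least-squares
# solution lives in a better-constrained solution space.
A_dense = rng.standard_normal((128, n))
b_dense = A_dense @ x_rough
A_aug = np.vstack([A_sv, A_dense])
b_aug = np.concatenate([b_sv, b_dense])
x_final, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

err_rough = np.linalg.norm(x_rough - x_true)
err_final = np.linalg.norm(x_final - x_true)
```

The point of the augmentation is conditioning: the 16-row system has a large null space, while the 144-row augmented system pins the solution down.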

CustOmics: A versatile deep-learning based strategy for multi-omics integration. (arXiv:2209.05485v1 [q-bio.GN]) arxiv.org/abs/2209.05485


Recent advances in high-throughput sequencing technologies have enabled the extraction of multiple features that depict patient samples at diverse and complementary molecular levels. The generation of such data has led to new challenges in computational biology regarding the integration of high-dimensional and heterogeneous datasets that capture the interrelationships between multiple genes and their functions. Thanks to their versatility and ability to learn synthetic latent representations of complex data, deep learning methods offer promising perspectives for integrating multi-omics data. These methods have led to the conception of many original architectures that are primarily based on autoencoder models. However, due to the difficulty of the task, the integration strategy is fundamental to taking full advantage of each source's particularities without losing the global trends. This paper presents a novel strategy for building a customizable autoencoder model that adapts to the dataset used, in the case of high-dimensional multi-source integration. We assess the impact of integration strategies on the latent representation and combine the best strategies to propose a new method, CustOmics (https://github.com/HakimBenkirane/CustOmics). We focus here on the integration of data from multiple omics sources and demonstrate the performance of the proposed method on test cases for several tasks, such as classification and survival analysis.
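The multi-source autoencoder idea can be sketched as follows; the source names, dimensions, and mean-fusion strategy are illustrative assumptions, not CustOmics' actual configuration, and no training loop is shown:

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(d_in, d_out):
    """One dense layer's weights (untrained; architecture sketch only)."""
    return rng.standard_normal((d_in, d_out)) * 0.1

class MultiSourceAutoencoder:
    """Per-omics-source encoders feeding a shared latent space, with
    per-source decoders. The fusion step (here a simple mean) is the
    'integration strategy' slot that a customizable model would swap."""
    def __init__(self, source_dims, latent_dim=16):
        self.enc = {s: layer(d, latent_dim) for s, d in source_dims.items()}
        self.dec = {s: layer(latent_dim, d) for s, d in source_dims.items()}

    def encode(self, batch):
        # Encode each omics source separately, then fuse by averaging.
        zs = [np.tanh(x @ self.enc[s]) for s, x in batch.items()]
        return np.mean(zs, axis=0)

    def decode(self, z):
        # Reconstruct every source from the shared latent code.
        return {s: z @ w for s, w in self.dec.items()}

sources = {"rna": 200, "methylation": 300, "cnv": 100}
model = MultiSourceAutoencoder(sources)
batch = {s: rng.standard_normal((8, d)) for s, d in sources.items()}
z = model.encode(batch)
recon = model.decode(z)
```

Swapping the fusion step (early concatenation, per-source sub-latents, hierarchical mixing) without touching the encoders is what makes such an architecture "customizable" per dataset.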

Active Learning and Approximate Model Calibration for Automated Visual Inspection in Manufacturing. (arXiv:2209.05486v1 [cs.LG]) arxiv.org/abs/2209.05486


Quality control is a crucial activity performed by manufacturing enterprises to ensure that their products meet quality standards and avoid potential damage to the brand's reputation. The decreased cost of sensors and connectivity has enabled increasing digitalization of manufacturing. In addition, artificial intelligence enables higher degrees of automation, reducing the overall cost and time required for defect inspection. This research compares three active learning approaches (with single and multiple oracles) for visual inspection. We propose a novel approach to the probability calibration of classification models, along with two new metrics to assess the performance of the calibration without the need for ground truth. We performed experiments on real-world data provided by Philips Consumer Lifestyle BV. Our results show that the explored active learning settings can reduce the data labeling effort by between three and four percent without detriment to the overall quality goals, considering a threshold of p=0.95. Furthermore, we show that the proposed metrics successfully capture relevant information that was otherwise available only through ground truth data. Therefore, the proposed metrics can be used to estimate the quality of models' probability calibration without committing to a labeling effort to obtain ground truth data.
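As one concrete instance of an active learning round with a single oracle, entropy-based uncertainty sampling can be sketched as follows; this is a standard query strategy shown for illustration, not necessarily the criterion used in the paper:

```python
import numpy as np

def select_for_labeling(probs, budget):
    """Pick the `budget` samples whose predicted class distributions
    have the highest entropy (least confident) and route them to a
    human oracle; confident predictions are accepted automatically."""
    eps = 1e-12                                    # avoid log(0)
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    # Indices of the most uncertain samples, most uncertain first.
    return np.argsort(entropy)[-budget:][::-1]

probs = np.array([
    [0.98, 0.02],   # confident defect-free: auto-accept
    [0.55, 0.45],   # uncertain: send to oracle
    [0.90, 0.10],   # fairly confident: auto-accept
    [0.51, 0.49],   # most uncertain: send to oracle first
])
picked = select_for_labeling(probs, budget=2)
```

Reducing labeling effort then amounts to choosing the budget (or a confidence threshold such as p=0.95) so that only the genuinely ambiguous inspections reach the human oracle.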
