arXiv Computer Science @arxiv_cs@qoto.org

1.12K Followers

Bot

I toot the arXiv feed for topics in Computer Science.

#ComputerScience #CS #Programming #SoftwareEngineering #Software #SoftwareDevelopment #Computers #Science #arXiv #News #PeerReview

Joined Jul 2018

2 Following 1.12K Followers

Posts Posts and replies Media

arXiv Computer Science @arxiv_cs@qoto.org

On Symmetric Pseudo-Boolean Functions: Factorization, Kernels and Applications. (arXiv:2209.15009v1 [math.CO]) http://arxiv.org/abs/2209.15009

On Symmetric Pseudo-Boolean Functions: Factorization, Kernels and Applications

A symmetric pseudo-Boolean function is a map from Boolean tuples to real numbers which is invariant under input variable interchange. We prove that any such function can be equivalently expressed as a power series or factorized. The kernel of a pseudo-Boolean function is the set of all inputs that cause the function to vanish identically. Any $n$-variable symmetric pseudo-Boolean function $f(x_1, x_2, \dots, x_n)$ has a kernel corresponding to at least one $n$-affine hyperplane, each hyperplane is given by a constraint $\sum_{l=1}^n x_l = λ$ for $λ\in \mathbb{C}$ constant. We use these results to analyze symmetric pseudo-Boolean functions appearing in the literature of spin glass energy functions (Ising models), quantum information and tensor networks.

arXiv Computer Science @arxiv_cs@qoto.org

A deep learning approach to the probabilistic numerical solution of path-dependent partial differential equations. (arXiv:2209.15010v1 [cs.LG]) http://arxiv.org/abs/2209.15010

A deep learning approach to the probabilistic numerical solution of path-dependent partial differential equations

Recent work on Path-Dependent Partial Differential Equations (PPDEs) has shown that PPDE solutions can be approximated by a probabilistic representation, implemented in the literature by the estimation of conditional expectations using regression. However, a limitation of this approach is to require the selection of a basis in a function space. In this paper, we overcome this limitation by the use of deep learning methods, and we show that this setting allows for the derivation of error bounds on the approximation of conditional expectations. Numerical examples based on a two-person zero-sum game, as well as on Asian and barrier option pricing, are presented. In comparison with other deep learning approaches, our algorithm appears to be more accurate, especially in large dimensions.

arXiv Computer Science @arxiv_cs@qoto.org

Does Collaborative Editing Help Mitigate Security Vulnerabilities in Crowd-Shared IoT Code Examples?. (arXiv:2209.15011v1 [cs.SE]) http://arxiv.org/abs/2209.15011

Does Collaborative Editing Help Mitigate Security Vulnerabilities in Crowd-Shared IoT Code Examples?

Background: With the proliferation of crowd-sourced developer forums, software developers are increasingly sharing more coding solutions to programming problems with others in forums. The decentralized nature of knowledge sharing on sites has raised the concern of sharing security vulnerable code, which then can be reused into mission critical software systems - making those systems vulnerable in the process. Collaborative editing has been introduced in forums like Stack Overflow to improve the quality of the shared contents. Aim: In this paper, we investigate whether code editing can mitigate shared vulnerable code examples by analyzing IoT code snippets and their revisions in three Stack Exchange sites: Stack Overflow, Arduino, and Raspberry Pi. Method:We analyze the vulnerabilities present in shared IoT C/C++ code snippets, as C/C++ is one of the most widely used languages in mission-critical devices and low-powered IoT devices. We further analyse the revisions made to these code snippets, and their effects. Results: We find several vulnerabilities such as CWE 788 - Access of Memory Location After End of Buffer, in 740 code snippets . However, we find the vast majority of posts are not revised, or revisions are not made to the code snippets themselves (598 out of 740). We also find that revisions are most likely to result in no change to the number of vulnerabilities in a code snippet rather than deteriorating or improving the snippet. Conclusions: We conclude that the current collaborative editing system in the forums may be insufficient to help mitigate vulnerabilities in the shared code.

arXiv Computer Science @arxiv_cs@qoto.org

Multimodal analogs to infer humanities visualization requirements. (arXiv:2209.15029v1 [cs.HC]) http://arxiv.org/abs/2209.15029

Multimodal analogs to infer humanities visualization requirements

Gaps and requirements for multi-modal interfaces for humanities can be explored by observing the configuration of real-world environments and the tasks of visitors within them compared to digital environments. Examples include stores, museums, galleries, and stages with tasks similar to visualization tasks such as overview, zoom and detail; multi-dimensional reduction; collaboration; and comparison; with real-world environments offering much richer interactions. Some of these capabilities exist with the technology and visualization research, but not routinely available in implementations.

arXiv Computer Science @arxiv_cs@qoto.org

Automatic Data Augmentation via Invariance-Constrained Learning. (arXiv:2209.15031v1 [cs.LG]) http://arxiv.org/abs/2209.15031

Automatic Data Augmentation via Invariance-Constrained Learning

Underlying data structures, such as symmetries or invariances to transformations, are often exploited to improve the solution of learning tasks. However, embedding these properties in models or learning algorithms can be challenging and computationally intensive. Data augmentation, on the other hand, induces these symmetries during training by applying multiple transformations to the input data. Despite its ubiquity, its effectiveness depends on the choices of which transformations to apply, when to do so, and how often. In fact, there is both empirical and theoretical evidence that the indiscriminate use of data augmentation can introduce biases that outweigh its benefits. This work tackles these issues by automatically adapting the data augmentation while solving the learning task. To do so, it formulates data augmentation as an invariance-constrained learning problem and leverages Monte Carlo Markov Chain (MCMC) sampling to solve it. The result is a practical algorithm that not only does away with a priori searches for augmentation distributions, but also dynamically controls if and when data augmentation is applied. Our experiments illustrate the performance of this method, which achieves state-of-the-art results in automatic data augmentation benchmarks for CIFAR datasets. Furthermore, this approach can be used to gather insights on the actual symmetries underlying a learning task.

arXiv Computer Science @arxiv_cs@qoto.org

Guided Unsupervised Learning by Subaperture Decomposition for Ocean SAR Image Retrieval. (arXiv:2209.15034v1 [cs.CV]) http://arxiv.org/abs/2209.15034

Guided Unsupervised Learning by Subaperture Decomposition for Ocean SAR Image Retrieval

Spaceborne synthetic aperture radar (SAR) can provide accurate images of the ocean surface roughness day-or-night in nearly all weather conditions, being an unique asset for many geophysical applications. Considering the huge amount of data daily acquired by satellites, automated techniques for physical features extraction are needed. Even if supervised deep learning methods attain state-of-the-art results, they require great amount of labeled data, which are difficult and excessively expensive to acquire for ocean SAR imagery. To this end, we use the subaperture decomposition (SD) algorithm to enhance the unsupervised learning retrieval on the ocean surface, empowering ocean researchers to search into large ocean databases. We empirically prove that SD improve the retrieval precision with over 20% for an unsupervised transformer auto-encoder network. Moreover, we show that SD brings important performance boost when Doppler centroid images are used as input data, leading the way to new unsupervised physics guided retrieval algorithms.

arXiv Computer Science @arxiv_cs@qoto.org

Double negation stable h-propositions in cubical sets. (arXiv:2209.15035v1 [math.LO]) http://arxiv.org/abs/2209.15035

Double negation stable h-propositions in cubical sets

We give a construction of classifiers for double negation stable h-propositions in a variety of cubical set models of homotopy type theory and cubical type theory. This is used to give some relative consistency results: classifiers for double negation stable propositions exist in cubical sets whenever they exist in the metatheory; the Dedekind real numbers can be added to homotopy type theory without changing the consistency strength; we construct a model of homotopy type theory with extended Church's thesis, which states that all partial functions with double negation stable domain are computable.

arXiv Computer Science @arxiv_cs@qoto.org

Large-Scale Spatial Cross-Calibration of Hinode/SOT-SP and SDO/HMI. (arXiv:2209.15036v1 [astro-ph.SR]) http://arxiv.org/abs/2209.15036

Large-Scale Spatial Cross-Calibration of Hinode/SOT-SP and SDO/HMI

We investigate the cross-calibration of the Hinode/SOT-SP and SDO/HMI instrument meta-data, specifically the correspondence of the scaling and pointing information. Accurate calibration of these datasets gives the correspondence needed by inter-instrument studies and learning-based magnetogram systems, and is required for physically-meaningful photospheric magnetic field vectors. We approach the problem by robustly fitting geometric models on correspondences between images from each instrument's pipeline. This technique is common in computer vision, but several critical details are required when using scanning slit spectrograph data like Hinode/SOT-SP. We apply this technique to data spanning a decade of the Hinode mission. Our results suggest corrections to the published Level 2 Hinode/SOT-SP data. First, an analysis on approximately 2,700 scans suggests that the reported pixel size in Hinode/SOT-SP Level 2 data is incorrect by around 1%. Second, analysis of over 12,000 scans show that the pointing information is often incorrect by dozens of arcseconds with a strong bias. Regression of these corrections indicates that thermal effects have caused secular and cyclic drift in Hinode/SOT-SP pointing data over its mission. We offer two solutions. First, direct co-alignment with SDO/HMI data via our procedure can improve alignments for many Hinode/SOT-SP scans. Second, since the pointing errors are predictable, simple post-hoc corrections can substantially improve the pointing. We conclude by illustrating the impact of this updated calibration on derived physical data products needed for research and interpretation. Among other things, our results suggest that the pointing errors induce a hemispheric bias in estimates of radial current density.

arXiv Computer Science @arxiv_cs@qoto.org

Wafer-Scale Fast Fourier Transforms. (arXiv:2209.15040v1 [cs.DC]) http://arxiv.org/abs/2209.15040

Wafer-Scale Fast Fourier Transforms

We have implemented fast Fourier transforms for one, two, and three-dimensional arrays on the Cerebras CS-2, a system whose memory and processing elements reside on a single silicon wafer. The wafer-scale engine (WSE) encompasses a two-dimensional mesh of roughly 850,000 processing elements (PEs) with fast local memory and equally fast nearest-neighbor interconnections. Our wafer-scale FFT (wsFFT) parallelizes a $n^3$ problem with up to $n^2$ PEs. At this point a PE processes only a single vector of the 3D domain (known as a pencil) per superstep, where each of the three supersteps performs FFT along one of the three axes of the input array. Between supersteps, wsFFT redistributes (transposes) the data to bring all elements of each one-dimensional pencil being transformed into the memory of a single PE. Each redistribution causes an all-to-all communication along one of the mesh dimensions. Given the level of parallelism, the size of the messages transmitted between pairs of PEs can be as small as a single word. In theory, a mesh is not ideal for all-to-all communication due to its limited bisection bandwidth. However, the mesh interconnecting PEs on the WSE lies entirely on-wafer and achieves nearly peak bandwidth even with tiny messages. This high efficiency on fine-grain communication allow wsFFT to achieve unprecedented levels of parallelism and performance. We analyse in detail computation and communication time, as well as the weak and strong scaling, using both FP16 and FP32 precision. With 32-bit arithmetic on the CS-2, we achieve 959 microseconds for 3D FFT of a $512^3$ complex input array using a 512x512 subgrid of the on-wafer PEs. This is the largest ever parallelization for this problem size and the first implementation that breaks the millisecond barrier.

arXiv Computer Science @arxiv_cs@qoto.org

Summarizing text to embed qualitative data into visualizations. (arXiv:2209.15041v1 [cs.HC]) http://arxiv.org/abs/2209.15041

Summarizing text to embed qualitative data into visualizations

Qualitative data can be conveyed with strings of text. Fitting longer text into visualizations requires a) space to place the text inside the visualization; and b) appropriate text to fit the space available. For quantitative visualizations, space is available in area marks; or within visualization layouts where the marks have an implied space (e.g. bar charts). For qualitative visualizations, space is defined in common text layouts such as prose paragraphs. To fit text within these layouts is a function for emerging NLP capabilities such as summarization.

arXiv Computer Science @arxiv_cs@qoto.org

Using Multivariate Linear Regression for Biochemical Oxygen Demand Prediction in Waste Water. (arXiv:2209.14297v1 [q-bio.OT]) http://arxiv.org/abs/2209.14297

Using Multivariate Linear Regression for Biochemical Oxygen Demand Prediction in Waste Water

There exist opportunities for Multivariate Linear Regression (MLR) in the prediction of Biochemical Oxygen Demand (BOD) in waste water, using the diverse water quality parameters as the input variables. The goal of this work is to examine the capability of MLR in prediction of BOD in waste water through four input variables: Dissolved Oxygen (DO), Nitrogen, Fecal Coliform and Total Coliform. The four input variables have higher correlation strength to BOD out of the seven parameters examined for the strength of correlation. Machine Learning (ML) was done with both 80% and 90% of the data as the training set and 20% and 10% as the test set respectively. MLR performance was evaluated through the coefficient of correlation (r), Root Mean Square Error (RMSE) and the percentage accuracy in prediction of BOD. The performance indices for the input variables of Dissolved Oxygen, Nitrogen, Fecal Coliform and Total Coliform in prediction of BOD are: RMSE=6.77mg/L, r=0.60 and accuracy 70.3% for training dataset of 80% and RMSE=6.74mg/L, r=0.60 and accuracy of 87.5% for training set of 90% of the dataset. It was found that increasing the percentage of the training set above 80% of the dataset improved the accuracy of the model only but did not have a significant impact on the prediction capacity of the model. The results showed that MLR model could be successfully employed in the estimation of BOD in waste water using appropriately selected input parameters.

arXiv Computer Science @arxiv_cs@qoto.org

Software Defect Prediction Using Support Vector Machine. (arXiv:2209.14299v1 [cs.SE]) http://arxiv.org/abs/2209.14299

Software Defect Prediction Using Support Vector Machine

Software defect prediction is an essential task during the software development Lifecycle as it can help managers to identify the most defect-proneness modules. Thus, it can reduce the test cost and assign testing resources efficiently. Many classification methods can be used to determine if the software is defective or not. Support Vector Machine (SVM) has not been used extensively for such problems because of its instability when applied on different datasets and parameter settings. The main parameter that influences the accuracy is the choice of the kernel function. The use of kernel functions has not been studied thoroughly in previous papers. Therefore, this research examines the performance and accuracy of SVM with six different kernel functions. Various public datasets from the PROMISE project empirically validate our hypothesis. The results demonstrate that no kernel function can give stable performance across different experimental settings. In addition, the use of PCA as a feature reduction algorithm shows slight accuracy improvement over some datasets.

arXiv Computer Science @arxiv_cs@qoto.org

Locally Weighted Regression with different Kernel Smoothers for Software Effort Estimation. (arXiv:2209.14300v1 [cs.SE]) http://arxiv.org/abs/2209.14300

Locally Weighted Regression with different Kernel Smoothers for Software Effort Estimation

Estimating software effort has been a largely unsolved problem for decades. One of the main reasons that hinders building accurate estimation models is the often heterogeneous nature of software data with a complex structure. Typically, building effort estimation models from local data tends to be more accurate than using the entire data. Previous studies have focused on the use of clustering techniques and decision trees to generate local and coherent data that can help in building local prediction models. However, these approaches may fall short in some aspect due to limitations in finding optimal clusters and processing noisy data. In this paper we used a more sophisticated locality approach that can mitigate these shortcomings that is Locally Weighted Regression (LWR). This method provides an efficient solution to learn from local data by building an estimation model that combines multiple local regression models in k-nearest-neighbor based model. The main factor affecting the accuracy of this method is the choice of the kernel function used to derive the weights for local regression models. This paper investigates the effects of choosing different kernels on the performance of Locally Weighted Regression of a software effort estimation problem. After comprehensive experiments with 7 datasets, 10 kernels, 3 polynomial degrees and 4 bandwidth values with a total of 840 Locally Weighted Regression variants, we found that: 1) Uniform kernel functions cannot outperform non-uniform kernel functions, and 2) kernel type, polynomial degrees and bandwidth parameters have no specific effect on the estimation accuracy.

arXiv Computer Science @arxiv_cs@qoto.org

Scalably learning quantum many-body Hamiltonians from dynamical data. (arXiv:2209.14328v1 [quant-ph]) http://arxiv.org/abs/2209.14328

Scalably learning quantum many-body Hamiltonians from dynamical data

The physics of a closed quantum mechanical system is governed by its Hamiltonian. However, in most practical situations, this Hamiltonian is not precisely known, and ultimately all there is are data obtained from measurements on the system. In this work, we introduce a highly scalable, data-driven approach to learning families of interacting many-body Hamiltonians from dynamical data, by bringing together techniques from gradient-based optimization from machine learning with efficient quantum state representations in terms of tensor networks. Our approach is highly practical, experimentally friendly, and intrinsically scalable to allow for system sizes of above 100 spins. In particular, we demonstrate on synthetic data that the algorithm works even if one is restricted to one simple initial state, a small number of single-qubit observables, and time evolution up to relatively short times. For the concrete example of the one-dimensional Heisenberg model our algorithm exhibits an error constant in the system size and scaling as the inverse square root of the size of the data set.

arXiv Computer Science @arxiv_cs@qoto.org

Text Independent Speaker Identification System for Access Control. (arXiv:2209.14335v1 [eess.AS]) http://arxiv.org/abs/2209.14335

Text Independent Speaker Identification System for Access Control

Even human intelligence system fails to offer 100% accuracy in identifying speeches from a specific individual. Machine intelligence is trying to mimic humans in speaker identification problems through various approaches to speech feature extraction and speech modeling techniques. This paper presents a text-independent speaker identification system that employs Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and k-Nearest Neighbor (kNN) for classification. The maximum cross-validation accuracy obtained was 60%. This will be improved upon in subsequent research.

arXiv Computer Science @arxiv_cs@qoto.org

Who is GPT-3? An Exploration of Personality, Values and Demographics. (arXiv:2209.14338v1 [cs.CL]) http://arxiv.org/abs/2209.14338

Who is GPT-3? An Exploration of Personality, Values and Demographics

Language models such as GPT-3 have caused a furore in the research community. Some studies found that GPT-3 has some creative abilities and makes mistakes that are on par with human behaviour. This paper answers a related question: who is GPT-3? We administered two validated measurement tools to GPT-3 to assess its personality, the values it holds and its self-reported demographics. Our results show that GPT-3 scores similarly to human samples in terms of personality and - when provided with a model response memory - in terms of the values it holds. We provide the first evidence of psychological assessment of the GPT-3 model and thereby add to our understanding of the GPT-3 model. We close with suggestions for future research that moves social science closer to language models and vice versa.

arXiv Computer Science @arxiv_cs@qoto.org

Using Processing Fluency as a Metric of Trust in Scatterplot Visualizations. (arXiv:2209.14340v1 [cs.HC]) http://arxiv.org/abs/2209.14340

Using Processing Fluency as a Metric of Trust in Scatterplot Visualizations

Establishing trust with readers is an important first step in visual data communication. But what makes a visualization trustworthy? Psychology and behavioral economics research has found processing fluency (i.e., speed and accuracy of perceiving and processing a stimulus) is central to perceived trust. We examine the association between processing fluency and trust in visualizations through two empirical studies. In Experiment 1, we tested the effect of camouflaging a visualization on processing fluency. Participants estimated the proportion of data values within a specified range for six camouflaged visualizations and one non-camouflaged control; they also reported their perceived difficulty for each of the visualizations. Camouflaged visualizations produced less accurate estimations compared to the control. In Experiment 2, we created a decision task based on trust games adapted from behavioral economics. We asked participants to invest money in two hypothetical companies and report how much they trust each company. One company communicates its strategy with a camouflaged visualization, the other with a controlled visualization. Participants tended to invest less money in the company presenting a camouflaged visualization. Hence, we found support for the hypothesis that processing fluency is key to the perception of trust in visual data communication.

arXiv Computer Science @arxiv_cs@qoto.org

The Change You Want to See. (arXiv:2209.14341v1 [cs.CV]) http://arxiv.org/abs/2209.14341

The Change You Want to See

We live in a dynamic world where things change all the time. Given two images of the same scene, being able to automatically detect the changes in them has practical applications in a variety of domains. In this paper, we tackle the change detection problem with the goal of detecting "object-level" changes in an image pair despite differences in their viewpoint and illumination. To this end, we make the following four contributions: (i) we propose a scalable methodology for obtaining a large-scale change detection training dataset by leveraging existing object segmentation benchmarks; (ii) we introduce a co-attention based novel architecture that is able to implicitly determine correspondences between an image pair and find changes in the form of bounding box predictions; (iii) we contribute four evaluation datasets that cover a variety of domains and transformations, including synthetic image changes, real surveillance images of a 3D scene, and synthetic 3D scenes with camera motion; (iv) we evaluate our model on these four datasets and demonstrate zero-shot and beyond training transformation generalization.

arXiv Computer Science @arxiv_cs@qoto.org

Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning. (arXiv:2209.14344v1 [cs.LG]) http://arxiv.org/abs/2209.14344

Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning

Equilibrium selection in multi-agent games refers to the problem of selecting a Pareto-optimal equilibrium. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policy of the other agents during training. To address suboptimal equilibrium selection, we propose Pareto-AC (PAC), an actor-critic algorithm that utilises a simple principle of no-conflict games (a superset of cooperative games with identical rewards): each agent can assume the others will choose actions that will lead to a Pareto-optimal equilibrium. We evaluate PAC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to alternative MARL algorithms, as well as successfully converging to a Pareto-optimal equilibrium in a range of matrix games. Finally, we propose a graph neural network extension which is shown to efficiently scale in games with up to 15 agents.

arXiv Computer Science @arxiv_cs@qoto.org

Audio Barlow Twins: Self-Supervised Audio Representation Learning. (arXiv:2209.14345v1 [cs.SD]) http://arxiv.org/abs/2209.14345

Audio Barlow Twins: Self-Supervised Audio Representation Learning

The Barlow Twins self-supervised learning objective requires neither negative samples or asymmetric learning updates, achieving results on a par with the current state-of-the-art within Computer Vision. As such, we present Audio Barlow Twins, a novel self-supervised audio representation learning approach, adapting Barlow Twins to the audio domain. We pre-train on the large-scale audio dataset AudioSet, and evaluate the quality of the learnt representations on 18 tasks from the HEAR 2021 Challenge, achieving results which outperform, or otherwise are on a par with, the current state-of-the-art for instance discrimination self-supervised learning approaches to audio representation learning. Code at https://github.com/jonahanton/SSL_audio.

Bot

I toot the arXiv feed for topics in Computer Science.

#ComputerScience #CS #Programming #SoftwareEngineering #Software #SoftwareDevelopment #Computers #Science #arXiv #News #PeerReview

Joined Jul 2018