Deep Bidirectional Language-Knowledge Graph Pretraining. (arXiv:2210.09338v1 [cs.CL]) arxiv.org/abs/2210.09338

Deep Bidirectional Language-Knowledge Graph Pretraining

Pretraining a language model (LM) on text has been shown to help various downstream NLP tasks. Recent works show that a knowledge graph (KG) can complement text data, offering structured background knowledge that provides a useful scaffold for reasoning. However, these works are not pretrained to learn a deep fusion of the two modalities at scale, limiting the potential to acquire fully joint representations of text and KG. Here we propose DRAGON (Deep Bidirectional Language-Knowledge Graph Pretraining), a self-supervised approach to pretraining a deeply joint language-knowledge foundation model from text and KG at scale. Specifically, our model takes pairs of text segments and relevant KG subgraphs as input and bidirectionally fuses information from both modalities. We pretrain this model by unifying two self-supervised reasoning tasks, masked language modeling and KG link prediction. DRAGON outperforms existing LM and LM+KG models on diverse downstream tasks including question answering across general and biomedical domains, with +5% absolute gain on average. In particular, DRAGON achieves notable performance on complex reasoning about language and knowledge (+10% on questions involving long contexts or multi-step reasoning) and low-resource QA (+8% on OBQA and RiddleSense), and new state-of-the-art results on various BioNLP tasks. Our code and trained models are available at https://github.com/michiyasunaga/dragon.
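
For concreteness, here is a minimal sketch of the joint self-supervised objective the abstract describes (masked language modeling plus KG link prediction). The equal loss weighting and the DistMult-style triple scoring are illustrative assumptions, not the actual DRAGON implementation.

```python
# Minimal sketch of a joint MLM + KG link prediction loss (assumptions noted above).
import torch
import torch.nn.functional as F

def joint_pretraining_loss(mlm_logits, mlm_labels, head_emb, rel_emb, tail_emb, link_labels):
    # Masked language modeling: cross-entropy over token positions
    # (label -100 marks unmasked positions to ignore, as in common MLM setups).
    mlm_loss = F.cross_entropy(
        mlm_logits.view(-1, mlm_logits.size(-1)), mlm_labels.view(-1), ignore_index=-100
    )
    # KG link prediction: score (head, relation, tail) triples; a simple
    # DistMult-style bilinear score is assumed here for illustration.
    scores = (head_emb * rel_emb * tail_emb).sum(dim=-1)
    link_loss = F.binary_cross_entropy_with_logits(scores, link_labels.float())
    # Equal weighting of the two self-supervised tasks is an assumption.
    return mlm_loss + link_loss

# Toy usage: vocab of 100, 4 token positions, 8 candidate triples in 64-d.
mlm_logits = torch.randn(1, 4, 100)
mlm_labels = torch.randint(0, 100, (1, 4))
h, r, t = (torch.randn(8, 64) for _ in range(3))
print(joint_pretraining_loss(mlm_logits, mlm_labels, h, r, t, torch.randint(0, 2, (8,))))
```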

Transferring Knowledge via Neighborhood-Aware Optimal Transport for Low-Resource Hate Speech Detection. (arXiv:2210.09340v1 [cs.CL]) arxiv.org/abs/2210.09340

Transferring Knowledge via Neighborhood-Aware Optimal Transport for Low-Resource Hate Speech Detection

The concerning rise of hateful content on online platforms has increased the attention towards automatic hate speech detection, commonly formulated as a supervised classification task. State-of-the-art deep learning-based approaches usually require a substantial amount of labeled resources for training. However, annotating hate speech resources is expensive, time-consuming, and often harmful to the annotators. This creates a pressing need to transfer knowledge from the existing labeled resources to low-resource hate speech corpora with the goal of improving system performance. For this, neighborhood-based frameworks have been shown to be effective. However, they have limited flexibility. In our paper, we propose a novel training strategy that allows flexible modeling of the relative proximity of neighbors retrieved from a resource-rich corpus to learn the amount of transfer. In particular, we incorporate neighborhood information with Optimal Transport, which permits exploiting the geometry of the data embedding space. By aligning the joint embedding and label distributions of neighbors, we demonstrate substantial improvements over strong baselines, in low-resource scenarios, on different publicly available hate speech corpora.
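
A minimal numpy sketch of the optimal-transport ingredient mentioned above: entropy-regularized Sinkhorn iterations that couple retrieved neighbor embeddings from a resource-rich corpus with target-corpus embeddings. The uniform marginals, the plain squared-Euclidean cost, and the omission of how labels enter the cost are assumptions for illustration.

```python
# Minimal Sinkhorn coupling between neighbor and target embeddings (a sketch, not the paper's method).
import numpy as np

def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropy-regularized OT plan between two uniform empirical distributions."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # transport plan

# Toy usage: 5 retrieved neighbor embeddings vs. 3 target embeddings in a 16-d space.
rng = np.random.default_rng(0)
src, tgt = rng.normal(size=(5, 16)), rng.normal(size=(3, 16))
cost = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
cost = cost / cost.max()                 # normalize the cost for numerical stability
plan = sinkhorn(cost)
print(plan.sum())                        # ~1.0: a valid coupling of the two distributions
```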

Data-Driven Observability Decomposition with Koopman Operators for Optimization of Output Functions of Nonlinear Systems. (arXiv:2210.09343v1 [math.OC]) arxiv.org/abs/2210.09343

Data-Driven Observability Decomposition with Koopman Operators for Optimization of Output Functions of Nonlinear Systems

When complex systems with nonlinear dynamics achieve an output performance objective, only a fraction of the state dynamics significantly impacts that output. Those minimal state dynamics can be identified using the differential geometric approach to the observability of nonlinear systems, but the theory is limited to only analytical systems. In this paper, we extend the notion of nonlinear observable decomposition to the more general class of data-informed systems. We employ Koopman operator theory, which encapsulates nonlinear dynamics in linear models, allowing us to bridge the gap between linear and nonlinear observability notions. We propose a new algorithm to learn Koopman operator representations that capture the system dynamics while ensuring that the output performance measure is in the span of its observables. We show that a transformation of this linear, output-inclusive Koopman model renders a new minimum Koopman representation. This representation embodies only the observable portion of the nonlinear observable decomposition of the original system. A prime application of this theory is to identify genes in biological systems that correspond to specific phenotypes, the performance measure. We simulate two biological gene networks and demonstrate that the observability of Koopman operators can successfully identify genes that drive each phenotype. We anticipate our novel system identification tool will effectively discover reduced gene networks that drive complex behaviors in biological systems.
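
A minimal numpy sketch of one way to realize the idea above: an EDMD-style least-squares Koopman approximation in which the output function is appended to the observable dictionary so that the performance measure lies in the span of the observables. The dictionary choice and the toy dynamics are assumptions, not the paper's algorithm.

```python
# EDMD-style Koopman fit with an output-inclusive observable dictionary (illustrative sketch).
import numpy as np

def observables(x, output_fn):
    # Dictionary: the states, simple quadratic lifts, and the output function itself.
    return np.concatenate([x, x**2, [output_fn(x)]])

def fit_koopman(X, Y, output_fn):
    """Least-squares Koopman approximation K with Psi(Y) ~= K Psi(X)."""
    PsiX = np.stack([observables(x, output_fn) for x in X], axis=1)
    PsiY = np.stack([observables(y, output_fn) for y in Y], axis=1)
    return PsiY @ np.linalg.pinv(PsiX)

# Toy usage: snapshot pairs from a simple nonlinear map; the output is the first state.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
Y = np.stack([np.array([0.9 * x[0], 0.5 * x[1] + 0.1 * x[0] ** 2]) for x in X])
K = fit_koopman(X, Y, output_fn=lambda x: x[0])
print(K.shape)  # (5, 5) linear model acting on the lifted observables
```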

Cloth Funnels: Canonicalized-Alignment for Multi-Purpose Garment Manipulation. (arXiv:2210.09347v1 [cs.RO]) arxiv.org/abs/2210.09347

Cloth Funnels: Canonicalized-Alignment for Multi-Purpose Garment Manipulation

Automating garment manipulation is challenging due to extremely high variability in object configurations. To reduce this intrinsic variation, we introduce the task of "canonicalized-alignment", which simplifies downstream applications by reducing the space of possible garment configurations. This task can be considered a "cloth state funnel" that manipulates arbitrarily configured clothing items into a predefined deformable configuration (i.e. canonicalization) at an appropriate rigid pose (i.e. alignment). In the end, the cloth items land in a compact set of structured and highly visible configurations, which are desirable for downstream manipulation skills. To enable this task, we propose a novel canonicalized-alignment objective that effectively guides learning away from adverse local minima. Using this objective, we learn a multi-arm, multi-primitive policy that strategically chooses between dynamic flings and quasi-static pick-and-place actions to achieve efficient canonicalized-alignment. We evaluate this approach on a real-world ironing and folding system that relies on the learned policy as a common first step. Empirically, we demonstrate that our task-agnostic canonicalized-alignment enables even simple manually designed policies to work well where they were previously inadequate, thus bridging the gap between automated non-deformable manufacturing and deformable manipulation. Code and qualitative visualizations are available at https://clothfunnels.cs.columbia.edu/. A video can be found at https://www.youtube.com/watch?v=TkUn0b7mbj0.

Hierarchical Decentralized Deep Reinforcement Learning Architecture for a Simulated Four-Legged Agent. (arXiv:2210.08003v1 [cs.AI]) arxiv.org/abs/2210.08003

Hierarchical Decentralized Deep Reinforcement Learning Architecture for a Simulated Four-Legged Agent

Legged locomotion is widespread in nature and has inspired the design of current robots. The controller of these legged robots is often realized as one centralized instance. However, in nature, control of movement happens in a hierarchical and decentralized fashion. Introducing these biological design principles into robotic control systems motivates this work. We tackle the question of whether decentralized and hierarchical control is beneficial for legged robots and present a novel decentralized, hierarchical architecture to control a simulated legged agent. Three tasks of varying complexity are designed to benchmark five architectures (centralized, decentralized, hierarchical, and two different combinations of hierarchical decentralized architectures). The results demonstrate that decentralizing the different levels of the hierarchical architectures facilitates learning, yields more energy-efficient movements, and improves robustness to new, unseen environments. Furthermore, this comparison sheds light on the importance of modularity in hierarchical architectures for solving complex goal-directed tasks. We provide an open-source code implementation of our architecture (https://github.com/wzaielamri/hddrl).
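
A minimal PyTorch sketch of the decentralization idea being benchmarked: instead of one centralized policy mapping the full observation to all joint commands, each leg gets its own small policy that sees only local observations. The observation split and network sizes are illustrative assumptions, not the paper's architecture.

```python
# Decentralized per-leg policies for a simulated quadruped (illustrative sketch).
import torch
import torch.nn as nn

class DecentralizedQuadrupedPolicy(nn.Module):
    def __init__(self, obs_per_leg=8, act_per_leg=3, hidden=32, n_legs=4):
        super().__init__()
        self.obs_per_leg = obs_per_leg
        # One small policy network per leg, seeing only that leg's local observation.
        self.leg_policies = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_per_leg, hidden), nn.Tanh(), nn.Linear(hidden, act_per_leg))
            for _ in range(n_legs)
        )

    def forward(self, obs):                                   # obs: (batch, n_legs * obs_per_leg)
        chunks = obs.split(self.obs_per_leg, dim=-1)          # local observation per leg
        return torch.cat([p(c) for p, c in zip(self.leg_policies, chunks)], dim=-1)

# Toy usage: batch of 5 observations, 4 legs x 8 local features -> 12 joint commands.
policy = DecentralizedQuadrupedPolicy()
print(policy(torch.randn(5, 32)).shape)                       # torch.Size([5, 12])
```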

Misaligned orientations of 4f optical neural network for image classification accuracy on various datasets. (arXiv:2210.08004v1 [physics.optics]) arxiv.org/abs/2210.08004

Misaligned orientations of 4f optical neural network for image classification accuracy on various datasets

In recent years, the optical 4f system has drawn much attention for building high-speed and ultra-low-power optical neural networks (ONNs). Most optical systems suffer from misalignment of their optical devices during installation, and the performance of an ONN based on the optical 4f system (4f-ONN) is considered sensitive to misalignment introduced in the optical path. To comprehensively investigate this influence, we propose a method for estimating the performance of a 4f-ONN in response to various misalignments in the context of image classification. Misalignment is emulated in numerical simulation by manipulating the optical intensity distributions in the fourth focal plane of the 4f system, followed by a series of physical experiments to validate the simulation results. We use our method to test the impact of 4f-system misalignment on classification accuracy for two popular image classification datasets, MNIST and Quickdraw16. On both datasets, we find that 4f-ONN performance generally degrades dramatically as the positioning error increases. The two datasets exhibit different positioning-error tolerances across misalignment orientations, and classification performance can be preserved under positioning errors of up to 200 microns in a specific direction.
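
A minimal numpy sketch of how such a positioning error can be emulated numerically: compute the 4f output and shift the intensity distribution at the fourth focal (output) plane. The pixel-shift parameterization (standing in for microns) and the random phase mask are illustrative assumptions, not the paper's exact simulation.

```python
# FFT-based 4f forward pass with an emulated output-plane misalignment (illustrative sketch).
import numpy as np

def fourf_forward(image, phase_mask):
    field = np.fft.fftshift(np.fft.fft2(image))     # first lens: to the Fourier (filter) plane
    field = field * np.exp(1j * phase_mask)         # modulation by the trained mask
    out = np.fft.ifft2(np.fft.ifftshift(field))     # second lens: to the output (fourth focal) plane
    return np.abs(out) ** 2                         # detected intensity

def misaligned_output(image, phase_mask, shift_px=(0, 0)):
    # Emulate a positioning error by shifting the intensity distribution at the
    # fourth focal plane; pixel units stand in for physical displacement.
    return np.roll(fourf_forward(image, phase_mask), shift_px, axis=(0, 1))

# Toy usage: a 28x28 input, a random phase mask, and a 2-pixel positioning error.
rng = np.random.default_rng(0)
img = rng.random((28, 28))
mask = rng.uniform(0, 2 * np.pi, (28, 28))
aligned = misaligned_output(img, mask)
shifted = misaligned_output(img, mask, shift_px=(2, 0))
print(np.abs(aligned - shifted).mean())             # the mismatch grows with the shift
```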

A MIP-Based Approach for Multi-Robot Geometric Task-and-Motion Planning. (arXiv:2210.08005v1 [cs.RO]) arxiv.org/abs/2210.08005

A MIP-Based Approach for Multi-Robot Geometric Task-and-Motion Planning

We address multi-robot geometric task-and-motion planning (MR-GTAMP) problems in synchronous, monotone setups. The goal of the MR-GTAMP problem is to move objects with multiple robots to goal regions in the presence of other movable objects. To perform the tasks successfully and effectively, the robots have to adopt intelligent collaboration strategies, i.e., decide which robot should move which objects to which positions, and perform collaborative actions, such as handovers. To endow robots with these collaboration capabilities, we propose to first collect occlusion and reachability information for each robot as well as information about whether two robots can perform a handover action by calling motion-planning algorithms. We then propose a method that uses the collected information to build a graph structure which captures the precedence of the manipulations of different objects and supports the implementation of a mixed-integer program to guide the search for highly effective collaborative task-and-motion plans. The search process for collaborative task-and-motion plans is based on a Monte-Carlo Tree Search (MCTS) exploration strategy to achieve exploration-exploitation balance. We evaluate our framework in two challenging GTAMP domains and show that, compared to two state-of-the-art baselines, it generates high-quality task-and-motion plans in terms of planning time, resulting plan length, and number of objects moved.
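
As a concrete illustration of the mixed-integer component, the sketch below formulates a toy robot-to-object assignment with PuLP: binary variables decide which robot moves which object, each object is handled exactly once, and reachability is respected. The cost matrix and reachability set are invented toy data; the paper's program additionally encodes precedence and handover information, so this is not the authors' formulation. Requires `pip install pulp`.

```python
# Toy mixed-integer assignment of objects to robots (illustrative sketch, not the paper's MIP).
from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum, value

robots, objects = ["r1", "r2"], ["o1", "o2", "o3"]
cost = {("r1", "o1"): 2, ("r1", "o2"): 5, ("r1", "o3"): 4,
        ("r2", "o1"): 3, ("r2", "o2"): 1, ("r2", "o3"): 6}
reachable = {("r1", "o1"), ("r1", "o3"), ("r2", "o1"), ("r2", "o2"), ("r2", "o3")}

prob = LpProblem("mr_gtamp_assignment", LpMinimize)
x = {(r, o): LpVariable(f"x_{r}_{o}", cat=LpBinary) for r in robots for o in objects}
prob += lpSum(cost[r, o] * x[r, o] for r in robots for o in objects)   # total manipulation cost
for o in objects:
    prob += lpSum(x[r, o] for r in robots) == 1                        # each object moved exactly once
for (r, o), var in x.items():
    if (r, o) not in reachable:
        prob += var == 0                                               # respect reachability information
prob.solve()
print({(r, o): int(value(var)) for (r, o), var in x.items() if value(var) > 0.5})
```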

Inductive Logical Query Answering in Knowledge Graphs. (arXiv:2210.08008v1 [cs.AI]) arxiv.org/abs/2210.08008

Inductive Logical Query Answering in Knowledge Graphs

Formulating and answering logical queries is a standard communication interface for knowledge graphs (KGs). Alleviating the notorious incompleteness of real-world KGs, neural methods achieved impressive results in link prediction and complex query answering tasks by learning representations of entities, relations, and queries. Still, most existing query answering methods rely on transductive entity embeddings and cannot generalize to KGs containing new entities without retraining the entity embeddings. In this work, we study the inductive query answering task where inference is performed on a graph containing new entities with queries over both seen and unseen entities. To this end, we devise two mechanisms leveraging inductive node and relational structure representations powered by graph neural networks (GNNs). Experimentally, we show that inductive models are able to perform logical reasoning at inference time over unseen nodes generalizing to graphs up to 500% larger than training ones. Exploring the efficiency--effectiveness trade-off, we find the inductive relational structure representation method generally achieves higher performance, while the inductive node representation method is able to answer complex queries in the inference-only regime without any training on queries and scales to graphs of millions of nodes. Code is available at https://github.com/DeepGraphLearning/InductiveQE.
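
A minimal numpy sketch of the inductive intuition above: node representations are derived from local relational structure (here, relation-type counts propagated over neighbors) rather than looked up in a per-entity embedding table, so unseen entities can be encoded at inference time. The featurization and simple mean aggregation are illustrative assumptions, not the paper's GNN.

```python
# Structure-derived node features that require no learned entity embeddings (illustrative sketch).
import numpy as np

def inductive_node_features(edges, num_nodes, num_relations, hops=2):
    # Start from relation-aware degree features instead of entity IDs.
    feats = np.zeros((num_nodes, num_relations))
    for h, r, t in edges:
        feats[h, r] += 1.0
        feats[t, r] += 1.0
    # Undirected adjacency lists for simple mean aggregation over a few hops.
    adj = {i: [] for i in range(num_nodes)}
    for h, r, t in edges:
        adj[h].append(t)
        adj[t].append(h)
    for _ in range(hops):
        feats = np.stack([
            (feats[i] + feats[adj[i]].mean(axis=0)) / 2 if adj[i] else feats[i]
            for i in range(num_nodes)
        ])
    return feats

# Toy KG with 5 nodes and 2 relation types; node 4 could be "new" at inference time.
edges = [(0, 0, 1), (1, 1, 2), (2, 0, 3), (3, 1, 4)]
print(inductive_node_features(edges, num_nodes=5, num_relations=2).round(2))
```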

Trajectory Prediction for Vehicle Conflict Identification at Intersections Using Sequence-to-Sequence Recurrent Neural Networks. (arXiv:2210.08009v1 [cs.AI]) arxiv.org/abs/2210.08009

Trajectory Prediction for Vehicle Conflict Identification at Intersections Using Sequence-to-Sequence Recurrent Neural Networks

Surrogate safety measures in the form of conflict indicators are indispensable components of the proactive traffic safety toolbox. Conflict indicators can be classified into past-trajectory-based conflicts and predicted-trajectory-based conflicts. While the calculation of the former class of conflicts is deterministic and unambiguous, the latter category is computed using predicted vehicle trajectories and is thus more stochastic. Consequently, the accuracy of prediction-based conflicts is contingent on the accuracy of the utilized trajectory prediction algorithm. Trajectory prediction can be a challenging task, particularly at intersections where vehicle maneuvers are diverse. Furthermore, due to limitations relating to the road user trajectory extraction pipelines, accurate geometric representation of vehicles during conflict analysis is a challenging task. Misrepresented geometries distort the real distances between vehicles under observation. In this research, a prediction-based conflict identification methodology was proposed. A sequence-to-sequence Recurrent Neural Network was developed to sequentially predict future vehicle trajectories for up to 3 seconds ahead. Furthermore, the proposed network was trained using the CitySim Dataset to forecast both future vehicle positions and headings to facilitate the prediction of future bounding boxes, thus maintaining accurate vehicle geometric representations. It was experimentally determined that the proposed method outperformed frequently used trajectory prediction models for conflict analysis at intersections. A comparison between Time-to-Collision (TTC) conflict identification using vehicle bounding boxes versus the commonly used vehicle center points for geometric representation was conducted. Compared to the bounding box method, the center point approach often failed to identify TTC conflicts or underestimated their severity.
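
A minimal PyTorch sketch of a sequence-to-sequence recurrent predictor for future vehicle states (x, y, heading), in the spirit of the model described above. The GRU encoder/decoder, hidden size, and 30-step horizon (roughly 3 seconds at 10 Hz) are illustrative assumptions, not the paper's exact architecture.

```python
# Seq2seq recurrent rollout of future positions and headings (illustrative sketch).
import torch
import torch.nn as nn

class Seq2SeqTrajectory(nn.Module):
    def __init__(self, state_dim=3, hidden=64, horizon=30):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.GRU(state_dim, hidden, batch_first=True)
        self.decoder = nn.GRUCell(state_dim, hidden)
        self.head = nn.Linear(hidden, state_dim)

    def forward(self, history):                       # history: (batch, T, 3) observed states
        _, h = self.encoder(history)                  # summarize the observed trajectory
        h = h.squeeze(0)
        step = history[:, -1, :]                      # start the rollout from the last observed state
        outputs = []
        for _ in range(self.horizon):                 # sequentially predict future (x, y, heading)
            h = self.decoder(step, h)
            step = self.head(h)
            outputs.append(step)
        return torch.stack(outputs, dim=1)            # (batch, horizon, 3)

# Toy usage: 2 vehicles with 20 observed steps of (x, y, heading).
model = Seq2SeqTrajectory()
print(model(torch.randn(2, 20, 3)).shape)             # torch.Size([2, 30, 3])
```

Predicted headings together with known vehicle dimensions are what allow future bounding boxes, rather than only center points, to be reconstructed for conflict checks.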

Autoencoder based Anomaly Detection and Explained Fault Localization in Industrial Cooling Systems. (arXiv:2210.08011v1 [cs.LG]) arxiv.org/abs/2210.08011

Autoencoder based Anomaly Detection and Explained Fault Localization in Industrial Cooling Systems

Anomaly detection in large industrial cooling systems is very challenging due to the high data dimensionality, inconsistent sensor recordings, and lack of labels. The state of the art for automated anomaly detection in these systems typically relies on expert knowledge and thresholds. However, data are viewed in isolation, and complex multivariate relationships are neglected. In this work, we present an autoencoder-based end-to-end workflow for anomaly detection suitable for multivariate time series data in large industrial cooling systems, including explained fault localization and root cause analysis based on expert knowledge. We identify system failures using a threshold on the total reconstruction error (autoencoder reconstruction error over all sensor signals). For fault localization, we compute the individual reconstruction error (autoencoder reconstruction error for each sensor signal), allowing us to identify the signals that contribute most to the total reconstruction error. Expert knowledge is provided via a look-up table, enabling root-cause analysis and assignment to the affected subsystem. We demonstrate our findings on a cooling system unit comprising 34 sensors over an 8-month period, using 4-fold cross-validation and labels created automatically from thresholds provided by domain experts. Using 4-fold cross-validation, we reach an F1-score of 0.56, while the autoencoder results show a higher consistency score (CS of 0.92) than the automatically created labels (CS of 0.62) -- indicating that the anomaly is recognized in a very stable manner. The main anomaly was found both by the autoencoder and by the automatically created labels, and was also recorded in the log files. Furthermore, the explained fault localization consistently highlighted the component most affected by the main anomaly.
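
The detection and localization logic described above can be summarized in a few lines: a threshold on the total reconstruction error flags an anomaly, and the individual (per-sensor) reconstruction errors rank which signals contribute most. In this sketch the reconstruction is mocked; in the actual workflow it would come from the trained autoencoder, and the threshold and top-k values are placeholders.

```python
# Total vs. per-sensor reconstruction error for detection and fault localization (illustrative sketch).
import numpy as np

def detect_and_localize(x, x_hat, threshold, top_k=3):
    per_sensor_error = (x - x_hat) ** 2                   # individual reconstruction errors
    total_error = per_sensor_error.sum()                  # total reconstruction error
    is_anomaly = total_error > threshold                  # system-failure flag
    ranked_sensors = np.argsort(per_sensor_error)[::-1][:top_k]  # signals contributing most
    return is_anomaly, total_error, ranked_sensors

# Toy usage: 34 sensors, one signal reconstructed poorly.
rng = np.random.default_rng(0)
x = rng.normal(size=34)
x_hat = x + rng.normal(scale=0.05, size=34)
x_hat[7] += 2.0                                           # sensor 7 deviates strongly
print(detect_and_localize(x, x_hat, threshold=1.0))       # flags an anomaly, sensor 7 ranked first
```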

Topics in Deep Learning and Optimization Algorithms for IoT Applications in Smart Transportation. (arXiv:2210.07246v1 [cs.LG]) arxiv.org/abs/2210.07246

Topics in Deep Learning and Optimization Algorithms for IoT Applications in Smart Transportation

Nowadays, the Internet of Things (IoT) has become one of the most important technologies enabling a variety of connected and intelligent applications in smart cities. The smart decision-making of IoT devices relies not only on the large volume of data collected from their sensors, but also on advanced optimization theories and novel machine learning techniques that can process and analyse the collected data within specific network structures. It is therefore practically important to investigate how different optimization algorithms and machine learning techniques can be leveraged to improve system performance. As one of the most important vertical domains for IoT applications, smart transportation systems play a key role in providing real-world information and services to citizens by making their access to transport facilities easier, and this is one of the key application areas explored in this thesis. In a nutshell, this thesis covers three topics related to applying mathematical optimization and deep learning methods to IoT networks. In the first topic, we propose an optimal transmission frequency management scheme using a decentralized ADMM-based method in an IoT network and introduce a mechanism to identify anomalies in data transmission frequency using an LSTM-based architecture. In the second topic, we leverage graph neural networks (GNNs) for demand prediction for shared bikes. In particular, we introduce a novel architecture, the attention-based spatial-temporal graph convolutional network (AST-GCN), to improve prediction accuracy on real-world datasets. In the last topic, we consider a highway traffic network scenario in which frequent lane-changing behaviors may occur with some probability. A dedicated GNN-based anomaly detector is devised to reveal this probability, driven by data collected in a dedicated mobility simulator.
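
As a pointer to the second topic, here is a minimal numpy sketch of the spatial graph-convolution building block that architectures such as the AST-GCN extend: station-level demand features are propagated over a normalized station adjacency matrix. The toy adjacency, feature sizes, and single layer are assumptions and omit the attention and temporal components described in the thesis.

```python
# One GCN-style propagation step over a bike-station graph (illustrative sketch).
import numpy as np

def graph_conv(X, A, W):
    """H = ReLU(D^-1/2 (A + I) D^-1/2 X W), the standard normalized graph convolution."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))          # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

# Toy usage: 4 bike stations, 3 demand features each, 8 hidden units.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
X = rng.random((4, 3))
W = rng.random((3, 8))
print(graph_conv(X, A, W).shape)                    # (4, 8) propagated station representations
```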
