Tim's Arxiv FrontPage Generated on 2024-04-26. This frontpage is generated by scraping new papers on Arxiv and using an embedding model to find papers matching topics I'm interested in. Currently, the false positive rate is fairly high. The repo is here. Forked and customized from this project
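The matching step described above can be sketched as follows. This is a minimal illustration, not the repo's actual code: it assumes each abstract sentence and each topic has already been turned into a vector by some embedding model, and that a sentence counts as a match when its cosine similarity to the topic reaches a threshold. The function names, the toy vectors, and the 0.82 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def match_sentences(sentence_vectors, topic_vector, threshold=0.82):
    """Return (index, score) for every sentence at or above the threshold.

    The threshold value is an assumption chosen for illustration.
    """
    matches = []
    for i, vec in enumerate(sentence_vectors):
        score = cosine_similarity(vec, topic_vector)
        if score >= threshold:
            matches.append((i, round(score, 3)))
    return matches


# Toy vectors standing in for real embeddings (hypothetical data).
topic = [1.0, 0.0, 1.0]
sentences = [[1.0, 0.1, 0.9], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
matches = match_sentences(sentences, topic)
```

A fixed threshold of this kind is one plausible source of the high false positive rate: a sentence that is only loosely related to a topic can still clear the cutoff.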
	Artificial General Intelligence
2024-04-25	CoCoG: Controllable Visual Stimuli Generation based on Human Concept Representations. A central question for cognitive science is to understand how humans process visual objects, i.e., to uncover the human low-dimensional concept representation space from high-dimensional visual stimuli. Generating visual stimuli with controlling concepts is the key. However, there are currently no generative models in AI to solve this problem. Here, we present the Concept based Controllable Generation (CoCoG) framework. CoCoG consists of two components, a simple yet efficient AI agent for extracting interpretable concepts and predicting human decision-making in visual similarity judgment tasks, and a conditional generation model for generating visual stimuli given the concepts. [0.83] We quantify the performance of CoCoG from two aspects, the human behavior prediction accuracy and the controllable generation ability. The experiments with CoCoG indicate that 1) the reliable concept embeddings in CoCoG allow predicting human behavior with 64.07% accuracy in the THINGS-similarity dataset; 2) CoCoG can generate diverse objects through the control of concepts; 3) CoCoG can manipulate human similarity judgment behavior by intervening on key concepts. CoCoG offers visual objects with controlling concepts to advance our understanding of causality in human cognition. The code of CoCoG is available at https://github.com/ncclab-sustech/CoCoG. link
2024-04-25	T-Explainer: A Model-Agnostic Explainability Framework Based on Gradients. The development of machine learning applications has increased significantly in recent years, motivated by the remarkable ability of learning-powered systems to discover and generalize intricate patterns hidden in massive datasets. [0.833] Modern learning models, while powerful, often exhibit a level of complexity that renders them opaque black boxes, resulting in a notable lack of transparency that hinders our ability to decipher their decision-making processes. Opacity challenges the interpretability and practical application of machine learning, especially in critical domains where understanding the underlying reasons is essential for informed decision-making. Explainable Artificial Intelligence (XAI) rises to meet that challenge, unraveling the complexity of black boxes by providing elucidating explanations. [0.849] Among the various XAI approaches, feature attribution/importance XAI stands out for its capacity to delineate the significance of input features in the prediction process. However, most existing attribution methods have limitations, such as instability, when divergent explanations may result from similar or even the same instance. In this work, we introduce T-Explainer, a novel local additive attribution explainer based on Taylor expansion endowed with desirable properties, such as local accuracy and consistency, while stable over multiple runs. We demonstrate T-Explainer's effectiveness through benchmark experiments with well-known attribution methods. In addition, T-Explainer is developed as a comprehensive XAI framework comprising quantitative metrics to assess and visualize attribution explanations. link
2024-04-25	Understanding Privacy Risks of Embeddings Induced by Large Language Models. Large language models (LLMs) show early signs of artificial general intelligence but struggle with hallucinations. [0.834] One promising solution to mitigate these hallucinations is to store external knowledge as embeddings, aiding LLMs in retrieval-augmented generation. However, such a solution risks compromising privacy, as recent studies experimentally showed that the original text can be partially reconstructed from text embeddings by pre-trained language models. The significant advantage of LLMs over traditional pre-trained models may exacerbate these concerns. To this end, we investigate the effectiveness of reconstructing original knowledge and predicting entity attributes from these embeddings when LLMs are employed. Empirical findings indicate that LLMs significantly improve the accuracy of two evaluated tasks over those from pre-trained models, regardless of whether the texts are in-distribution or out-of-distribution. This underscores a heightened potential for LLMs to jeopardize user privacy, highlighting the negative consequences of their widespread use. We further discuss preliminary strategies to mitigate this risk. link
2024-04-25	Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents. In the rapidly evolving field of artificial intelligence, ensuring safe decision-making of Large Language Models (LLMs) is a significant challenge. [0.836] This paper introduces Governance of the Commons Simulation (GovSim), a simulation platform designed to study strategic interactions and cooperative decision-making in LLMs. Through this simulation environment, we explore the dynamics of resource sharing among AI agents, highlighting the importance of ethical considerations, strategic planning, and negotiation skills. GovSim is versatile and supports any text-based agent, including LLM agents. Using the Generative Agent framework, we create a standard agent that facilitates the integration of different LLMs. Our findings reveal that within GovSim, only two out of 15 tested LLMs managed to achieve a sustainable outcome, indicating a significant gap in the ability of models to manage shared resources. Furthermore, we find that by removing the ability of agents to communicate, they overuse the shared resource, highlighting the importance of communication for cooperation. Interestingly, most LLMs lack the ability to make universalized hypotheses, which highlights a significant weakness in their reasoning skills. We open source the full suite of our research results, including the simulation environment, agent prompts, and a comprehensive web interface. link
2024-04-25	RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis. Developing generalist foundation models has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). [0.828] A pivotal insight in developing these models is their reliance on dataset scaling, which emphasizes the requirements on developing open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities. In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. Specifically, we leverage the latest powerful universal segmentation and large language models to extend the original datasets (over 25,692 non-contrast 3D chest CT volumes and reports from 20,000 patients) from the following aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate reasoning visual clues for interpretation; (ii) 665 K multi-granularity grounded reports, where each sentence of the report is linked to the corresponding anatomical region of CT volume in the form of a segmentation mask; (iii) 1.3 M grounded VQA pairs, where questions and answers are all linked with reference segmentation masks, enabling models to associate visual evidence with textual explanations. All grounded reports and VQA pairs in the validation set have gone through manual verification to ensure dataset quality. We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models, by training to generate texts based on given segmentation regions, which is unattainable with previous relevant datasets. We will release all segmentation masks, grounded reports, and VQA pairs to facilitate further research and development in this field. link
	Complex Systems
2024-04-25	T-Explainer: A Model-Agnostic Explainability Framework Based on Gradients. The development of machine learning applications has increased significantly in recent years, motivated by the remarkable ability of learning-powered systems to discover and generalize intricate patterns hidden in massive datasets. Modern learning models, while powerful, often exhibit a level of complexity that renders them opaque black boxes, resulting in a notable lack of transparency that hinders our ability to decipher their decision-making processes. [0.835] Opacity challenges the interpretability and practical application of machine learning, especially in critical domains where understanding the underlying reasons is essential for informed decision-making. Explainable Artificial Intelligence (XAI) rises to meet that challenge, unraveling the complexity of black boxes by providing elucidating explanations. Among the various XAI approaches, feature attribution/importance XAI stands out for its capacity to delineate the significance of input features in the prediction process. However, most existing attribution methods have limitations, such as instability, when divergent explanations may result from similar or even the same instance. In this work, we introduce T-Explainer, a novel local additive attribution explainer based on Taylor expansion endowed with desirable properties, such as local accuracy and consistency, while stable over multiple runs. We demonstrate T-Explainer's effectiveness through benchmark experiments with well-known attribution methods. In addition, T-Explainer is developed as a comprehensive XAI framework comprising quantitative metrics to assess and visualize attribution explanations. link
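To make the idea of a Taylor-expansion-based additive attribution concrete, here is a generic first-order sketch. It is not T-Explainer's implementation, which the abstract does not specify: it assumes a black-box function `f`, estimates the gradient by central finite differences, and assigns each feature the product of its partial derivative and its offset from a baseline. All names (`finite_difference_gradient`, `taylor_attributions`, `linear_model`) are hypothetical.

```python
def finite_difference_gradient(f, x, eps=1e-5):
    """Estimate the gradient of f at x with central differences."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def taylor_attributions(f, x, baseline):
    """First-order Taylor attribution: gradient times offset from baseline."""
    grad = finite_difference_gradient(f, x)
    return [g * (xi - bi) for g, xi, bi in zip(grad, x, baseline)]


def linear_model(x):
    """Toy model used only for illustration."""
    return 2.0 * x[0] - 3.0 * x[1] + 0.5


x = [1.0, 2.0]
baseline = [0.0, 0.0]
attributions = taylor_attributions(linear_model, x, baseline)
# For a linear model the attributions sum to f(x) - f(baseline),
# which is the "local accuracy" property in its simplest form.
```

For nonlinear models the first-order sum only approximates `f(x) - f(baseline)`, and the quality of the approximation depends on how far `x` is from the baseline.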
	Dissipative Adaptation
2024-04-25	Rapid thermalization of dissipative many-body dynamics of commuting Hamiltonians. Quantum systems typically reach thermal equilibrium rather quickly when coupled to a thermal environment. The usual way of bounding the speed of this process is by estimating the spectral gap of the dissipative generator. However, the gap, by itself, does not always yield a reasonable estimate for the thermalization time in many-body systems: without further structure, a uniform lower bound on it only constrains the thermalization time to grow polynomially with system size. Here, instead, we show that for a large class of geometrically-2-local models of Davies generators with commuting Hamiltonians, the thermalization time is much shorter than one would naïvely estimate from the gap: at most logarithmic in the system size. This yields the so-called rapid mixing of dissipative dynamics. [0.829] The result is particularly relevant for 1D systems, for which we prove rapid thermalization with a system size independent decay rate only from a positive gap in the generator. We also prove that systems in hypercubic lattices of any dimension, and exponential graphs, such as trees, have rapid mixing at high enough temperatures. We do this by introducing a novel notion of clustering which we call "strong local indistinguishability" based on a max-relative entropy, and then proving that it implies a lower bound on the modified logarithmic Sobolev inequality (MLSI) for nearest neighbour commuting models. This has consequences for the rate of thermalization towards Gibbs states, and also for their relevant Wasserstein distances and transportation cost inequalities. Along the way, we show that several measures of decay of correlations on Gibbs states of commuting Hamiltonians are equivalent, a result of independent interest. At the technical level, we also show a direct relation between properties of Davies and Schmidt dynamics, which allows thermalization results to be transferred between the two. link
	Reinforcement Learning
2024-04-25	Offline Reinforcement Learning with Behavioral Supervisor Tuning. Offline reinforcement learning (RL) algorithms are applied to learn performant, well-generalizing policies when provided with a static dataset of interactions. [0.857] Many recent approaches to offline RL have seen substantial success, but with one key caveat: they demand substantial per-dataset hyperparameter tuning to achieve reported performance, which requires policy rollouts in the environment to evaluate; this can rapidly become cumbersome. Furthermore, substantial tuning requirements can hamper the adoption of these algorithms in practical domains. In this paper, we present TD3 with Behavioral Supervisor Tuning (TD3-BST), an algorithm that trains an uncertainty model and uses it to guide the policy to select actions within the dataset support. TD3-BST can learn more effective policies from offline datasets compared to previous methods and achieves the best performance across challenging benchmarks without requiring per-dataset tuning. link
2024-04-25	A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints. Model-free reinforcement learning methods lack an inherent mechanism to impose behavioural constraints on the trained policies. [0.824] While certain extensions exist, they remain limited to specific types of constraints, such as value constraints with additional reward signals or visitation density constraints. In this work we try to unify these existing techniques and bridge the gap with classical optimization and control theory, using a generic primal-dual framework for value-based and actor-critic reinforcement learning methods. [0.836] The obtained dual formulations turn out to be especially useful for imposing additional constraints on the learned policy, as an intrinsic relationship between such dual constraints (or regularization terms) and reward modifications in the primal is revealed. Furthermore, using this framework, we are able to introduce some novel types of constraints, which make it possible to impose bounds on the policy's action density or on costs associated with transitions between consecutive states and actions. From the adjusted primal-dual optimization problems, a practical algorithm is derived that supports various combinations of policy constraints that are automatically handled throughout training using trainable reward modifications. [0.822] The resulting DualCRL method is examined in more detail and evaluated under different (combinations of) constraints on two interpretable environments. The results highlight the efficacy of the method, which ultimately provides the designer of such systems with a versatile toolbox of possible policy constraints. link
2024-04-25	T-Explainer: A Model-Agnostic Explainability Framework Based on Gradients. The development of machine learning applications has increased significantly in recent years, motivated by the remarkable ability of learning-powered systems to discover and generalize intricate patterns hidden in massive datasets. [0.825] Modern learning models, while powerful, often exhibit a level of complexity that renders them opaque black boxes, resulting in a notable lack of transparency that hinders our ability to decipher their decision-making processes. Opacity challenges the interpretability and practical application of machine learning, especially in critical domains where understanding the underlying reasons is essential for informed decision-making. Explainable Artificial Intelligence (XAI) rises to meet that challenge, unraveling the complexity of black boxes by providing elucidating explanations. Among the various XAI approaches, feature attribution/importance XAI stands out for its capacity to delineate the significance of input features in the prediction process. However, most existing attribution methods have limitations, such as instability, when divergent explanations may result from similar or even the same instance. In this work, we introduce T-Explainer, a novel local additive attribution explainer based on Taylor expansion endowed with desirable properties, such as local accuracy and consistency, while stable over multiple runs. We demonstrate T-Explainer's effectiveness through benchmark experiments with well-known attribution methods. In addition, T-Explainer is developed as a comprehensive XAI framework comprising quantitative metrics to assess and visualize attribution explanations. link
2024-04-25	Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods. This paper presents a novel learning approach for Dubins Traveling Salesman Problems (DTSP) with Neighborhoods (DTSPN) to quickly produce a tour of a non-holonomic vehicle passing through neighborhoods of given task points. The method involves two learning phases: initially, a model-free reinforcement learning approach leverages privileged information to distill knowledge from expert trajectories generated by the Lin-Kernighan heuristic (LKH) algorithm. [0.845] Subsequently, a supervised learning phase trains an adaptation network to solve problems independently of privileged information. Before the first learning phase, a parameter initialization technique using the demonstration data was also devised to enhance training efficiency. The proposed learning method produces a solution about 50 times faster than LKH and substantially outperforms other imitation learning and RL with demonstration schemes, most of which fail to sense all the task points. link
2024-04-25	REBEL: Reinforcement Learning via Regressing Relative Rewards. While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications including the fine-tuning of generative models. [0.83] Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g. value networks, clipping) and is notorious for its sensitivity to the precise implementation of these components. In response, we take a step back and ask what a minimalist RL algorithm for the era of generative models would look like. We propose REBEL, an algorithm that cleanly reduces the problem of policy optimization to regressing the relative rewards via a direct policy parameterization between two completions to a prompt, enabling strikingly lightweight implementation. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL, which allows us to match the strongest known theoretical guarantees in terms of convergence and sample complexity in the RL literature. REBEL can also cleanly incorporate offline data and handle the intransitive preferences we frequently see in practice. Empirically, we find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO, all while being simpler to implement and more computationally tractable than PPO. link
2024-04-25	DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks. The success of many RL techniques heavily relies on human-engineered dense rewards, which typically demand substantial domain expertise and extensive trial and error. [0.827] In our work, we propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks in a data-driven manner. By leveraging the stage structures of the task, DrS learns a high-quality dense reward from sparse rewards and demonstrations if given. The learned rewards can be reused in unseen tasks, thus reducing the human effort for reward engineering. Extensive experiments on three physical robot manipulation task families with 1000+ task variants demonstrate that our learned rewards can be reused in unseen tasks, resulting in improved performance and sample efficiency of RL algorithms. [0.828] The learned rewards even achieve comparable performance to human-engineered rewards on some tasks. See our project page (https://sites.google.com/view/iclr24drs) for more details. link
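The REBEL entry in this section describes reducing policy optimization to "regressing the relative rewards" between two completions to a prompt. One way to read that sentence is as a squared-error loss: the difference in log-probability ratios (new policy versus previous policy) for the two completions is regressed onto the difference in their rewards. The sketch below follows that reading for a single pair; the function name, argument names, and the scaling parameter `eta` are assumptions made for illustration, and the paper's exact objective may differ.

```python
def rebel_style_loss(logp_new_a, logp_old_a, logp_new_b, logp_old_b,
                     reward_a, reward_b, eta=1.0):
    """Squared error between a scaled log-ratio gap and a reward gap.

    logp_new_* are log-probabilities of completions a and b under the
    policy being trained; logp_old_* are the same quantities under the
    previous policy. eta is a hypothetical scaling parameter.
    """
    predicted_gap = ((logp_new_a - logp_old_a)
                     - (logp_new_b - logp_old_b)) / eta
    reward_gap = reward_a - reward_b
    return (predicted_gap - reward_gap) ** 2


# The policy has shifted probability toward completion a by exactly the
# amount the reward gap asks for, so the regression error is zero.
loss_matched = rebel_style_loss(-1.0, -2.0, -3.0, -2.5, 2.0, 0.5)

# An unchanged policy predicts a gap of zero and is penalized by the
# squared reward gap.
loss_unchanged = rebel_style_loss(-2.0, -2.0, -2.5, -2.5, 2.0, 0.5)
```

Under this reading no value network or clipping term appears, which is consistent with the abstract's claim of a lightweight implementation.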
	Trajectory Optimization
2024-04-25	A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints. Model-free reinforcement learning methods lack an inherent mechanism to impose behavioural constraints on the trained policies. While certain extensions exist, they remain limited to specific types of constraints, such as value constraints with additional reward signals or visitation density constraints. In this work we try to unify these existing techniques and bridge the gap with classical optimization and control theory, using a generic primal-dual framework for value-based and actor-critic reinforcement learning methods. The obtained dual formulations turn out to be especially useful for imposing additional constraints on the learned policy, as an intrinsic relationship between such dual constraints (or regularization terms) and reward modifications in the primal is revealed. Furthermore, using this framework, we are able to introduce some novel types of constraints, which make it possible to impose bounds on the policy's action density or on costs associated with transitions between consecutive states and actions. From the adjusted primal-dual optimization problems, a practical algorithm is derived that supports various combinations of policy constraints that are automatically handled throughout training using trainable reward modifications. [0.822] The resulting DualCRL method is examined in more detail and evaluated under different (combinations of) constraints on two interpretable environments. The results highlight the efficacy of the method, which ultimately provides the designer of such systems with a versatile toolbox of possible policy constraints. link
2024-04-25	SHINE: Social Homology Identification for Navigation in Crowded Environments. Navigating mobile robots in social environments remains a challenging task due to the intricacies of human-robot interactions. Most of the motion planners designed for crowded and dynamic environments focus on choosing the best velocity to reach the goal while avoiding collisions, but do not explicitly consider the high-level navigation behavior (avoiding through the left or right side, letting others pass or passing before others, etc.). [0.826] In this work, we present a novel motion planner that incorporates topologically distinct paths representing diverse navigation strategies around humans. [0.822] The planner selects the topology class that imitates human behavior the best using a deep neural network model trained on real-world human motion data, ensuring socially intelligent and contextually aware navigation. Our system refines the chosen path through an optimization-based local planner in real time, ensuring seamless adherence to desired social behaviors. In this way, we decouple perception and local planning from the decision-making process. We evaluate the prediction accuracy of the network with real-world data. In addition, we assess the navigation capabilities in both simulation and a real-world platform, comparing it with other state-of-the-art planners. We demonstrate that our planner exhibits socially desirable behaviors and shows a smooth and remarkable performance. link
2024-04-25	Successive Convexification for Trajectory Optimization with Continuous-Time Constraint Satisfaction. We present successive convexification, a real-time-capable solution method for nonconvex trajectory optimization, with continuous-time constraint satisfaction and guaranteed convergence, that only requires first-order information. [0.846] The proposed framework combines several key methods to solve a large class of nonlinear optimal control problems: (i) exterior penalty-based reformulation of the path constraints; (ii) generalized time-dilation; (iii) multiple-shooting discretization; (iv) $\ell_1$ exact penalization of the nonconvex constraints; and (v) the prox-linear method, a sequential convex programming (SCP) algorithm for convex-composite minimization. The reformulation of the path constraints enables continuous-time constraint satisfaction even on sparse discretization grids and obviates the need for mesh refinement heuristics. Through the prox-linear method, we guarantee convergence of the solution method to stationary points of the penalized problem and guarantee that the converged solutions that are feasible with respect to the discretized and control-parameterized optimal control problem are also Karush-Kuhn-Tucker (KKT) points. Furthermore, we highlight the specialization of this property to global minimizers of convex optimal control problems, wherein the reformulated path constraints cannot be represented by canonical cones, i.e., in the form required by existing convex optimization solvers. In addition to theoretical analysis, we demonstrate the effectiveness and real-time capability of the proposed framework with numerical examples based on popular optimal control applications: dynamic obstacle avoidance and rocket landing. [0.82] link
